[postgis-tickets] [PostGIS] #2260: Benchmarking speed between built-in tiger normalizer and pagc_address_parser
PostGIS
trac at osgeo.org
Thu Apr 25 04:59:31 PDT 2013
#2260: Benchmarking speed between built-in tiger normalizer and
pagc_address_parser
----------------------------------+-----------------------------------------
Reporter: robe | Owner: robe
Type: task | Status: closed
Priority: medium | Milestone: PostGIS 2.1.0
Component: pagc_address_parser | Version: trunk
Resolution: fixed | Keywords:
----------------------------------+-----------------------------------------
Comment(by robe):
I'm going to keep this closed, but I'm still seeing an issue with
performance. I suspect it might be the way I'm wrapping your functions
rather than anything wrong in your functions.
I have a table of about 500,000 records of boston residents and I pulled a
random 100 (well not that random but).
and if I run this:
{{{ -- takes 60 ms on hot, 120 ms on cold
SELECT parse_address(COALESCE(streetno,'') || ' ' ||
COALESCE(streetname,'') || ' ' || city || ', ' || state || ' ' || zip)
FROM
electionstime_2012
limit 100;
-- takes 100 ms cold, 40 ms hot --
-- why your standardize_address seems faster
-- than parse_address is a mystery as I would expect parse to be faster
-- I have to say this is after changing your function from set returning
-- but I recall speed being about the same
SELECT (std).*
FROM (SELECT standardize_address('pagc_lex','pagc_gaz', 'pagc_rules',
COALESCE(streetno,'') || ' ' || COALESCE(streetname,''), city || ', ' ||
state || ' ' || zip) As std from electionstime_2012
limit 100) As a;
-- but my function which granted its doing some stuff
-- should be spending most of its time parsing and then standardizing
-- takes a whopping 2714 ms
SELECT tiger.pagc_normalize_address(COALESCE(streetno,'') || ' ' ||
COALESCE(streetname,'') || ' ' || city || ', ' || state || ' ' || zip)
from electionstime_2012
limit 100;
-- versus - takes 618 ms
SELECT tiger.normalize_address(COALESCE(streetno,'') || ' ' ||
COALESCE(streetname,'') || ' ' || city || ', ' || state || ' ' || zip)
from electionstime_2012
limit 100;
}}}
So possibly the cost of copying to norm_addy structure I am
underestimating or something else I am doing wrong.
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2260#comment:30>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list