[postgis-tickets] [PostGIS] #2260: Benchmarking speed between built-in tiger normalizer and pagc_address_parser

PostGIS trac at osgeo.org
Thu Apr 25 04:59:31 PDT 2013


#2260: Benchmarking speed between built-in tiger normalizer and
pagc_address_parser
----------------------------------+-----------------------------------------
  Reporter:  robe                 |       Owner:  robe         
      Type:  task                 |      Status:  closed       
  Priority:  medium               |   Milestone:  PostGIS 2.1.0
 Component:  pagc_address_parser  |     Version:  trunk        
Resolution:  fixed                |    Keywords:               
----------------------------------+-----------------------------------------

Comment(by robe):

 I'm going to keep this closed, but I'm still seeing an issue with
 performance.  I suspect it might be the way I'm wrapping your functions
 rather than anything wrong in your functions.

 I have a table of about 500,000 records of boston residents and I pulled a
 random 100 (well not that random but).

 and if I run this:

 {{{ -- takes 60 ms on hot, 120 ms on cold
 SELECT parse_address(COALESCE(streetno,'') || ' ' ||
 COALESCE(streetname,'') || ' ' || city || ', ' || state || ' ' || zip)
 FROM
 electionstime_2012
 limit 100;

 -- takes 100 ms cold, 40 ms hot --
 -- why your standardize_address seems faster
 --  than parse_address is a mystery as I would expect parse to be faster
 -- I have to say this is after changing your function from set returning
 -- but I recall speed being about the same
 SELECT (std).*
 FROM (SELECT standardize_address('pagc_lex','pagc_gaz', 'pagc_rules',
 COALESCE(streetno,'') || ' ' || COALESCE(streetname,''), city || ', ' ||
 state || ' ' || zip) As std from electionstime_2012
 limit 100) As a;


 -- but my function which granted its doing some stuff
 -- should be spending most of its time parsing and then standardizing
 -- takes a whopping 2714 ms
 SELECT tiger.pagc_normalize_address(COALESCE(streetno,'') || ' ' ||
 COALESCE(streetname,'') || ' ' || city || ', ' || state || ' ' || zip)
 from electionstime_2012
 limit 100;

 -- versus - takes 618 ms
 SELECT tiger.normalize_address(COALESCE(streetno,'') || ' ' ||
 COALESCE(streetname,'') || ' ' || city || ', ' || state || ' ' || zip)
 from electionstime_2012
 limit 100;
 }}}

 So possibly the cost of copying to norm_addy structure I am
 underestimating or something else I am doing wrong.

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2260#comment:30>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list