[postgis-devel] [PostGIS] #1382: Some addresses take a long time to geocode or normalize

PostGIS trac at osgeo.org
Sat Dec 17 08:43:41 PST 2011


#1382: Some addresses take a long time to geocode or normalize
----------------------------+-----------------------------------------------
 Reporter:  raviada         |       Owner:  robe         
     Type:  defect          |      Status:  assigned     
 Priority:  medium          |   Milestone:  PostGIS 2.0.0
Component:  tiger geocoder  |     Version:  trunk        
 Keywords:                  |  
----------------------------+-----------------------------------------------

Comment(by robe):

 Ravi,
 Thanks for the examples.  I'm still analyzing these.
 The ones I've tested the normalize_address function on returned fairly
 fast, all under 7ms, so I'm not seeing a normalize speed issue here.  If
 you are, you might be running an older version of the geocoder.  In the
 normalize_address code you should see this:

 {{{
 normalize_address.sql 8252 2011-11-29 08:49:06Z robe
 }}}

 That is the revision number of the latest normalizer.  If you don't have
 it at all, or have an older number, yours is out of date.
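
 To time the normalizer on its own from psql, something like the following
 should work.  This is only a sketch: it assumes the tiger geocoder
 functions are on your search_path, and the address is simply one taken
 from this ticket.

 {{{
 -- psql meta-command: print the elapsed time after each statement
 \timing on

 -- Normalize only, no geocoding
 SELECT * FROM normalize_address('509 METTACAHONTS ROAD, ACCORD, NY 12404');
 }}}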

 Here is what I think is wrong with some of them:

 1) 179TH ST, ADDISLEIGH PARK, NY 114341413 is just normalizing
 incorrectly: it's putting 179 in the street number and ST as the street
 name, because this address has no street number.  That is why that one is
 so slow and comes back with the wrong answer.  I think I can improve on
 the normalizing logic, and I might have a ticket for it already.

 It's still a bit slow on my test box (about 15 secs) when I do this, for
 the reasons listed after the query:

 {{{
 select pprint_addy(addy), rating, ST_AsText(geomout)
  from geocode('0 179TH ST, ADDISLEIGH PARK, NY 114341413',1);
 -- which returns this --
 0 179th St, New York, NY 11434  11      POINT(-73.7666465 40.663195)
 }}}

 a) the street name is short
 b) I didn't specify a valid street range
 c) ADDISLEIGH PARK doesn't match anything in tiger.
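
 To see how the normalizer splits the original address (the one without a
 street number), a query along these lines can be used.  It is a sketch
 that assumes the norm_addy field names of the trunk geocoder and that the
 geocoder functions are on your search_path.

 {{{
 -- Show the individual parts the normalizer extracted
 SELECT na.address, na.streetname, na.streettypeabbrev,
        na.location, na.stateabbrev, na.zip
   FROM normalize_address('179TH ST, ADDISLEIGH PARK, NY 114341413') AS na;
 }}}

 If address comes back as 179 and streetname as ST, you are hitting the
 mis-normalization described in 1).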

 2) This one, 509 METTACAHONTS ROAD, ACCORD, NY 12404, runs fairly fast
 on my box.  It takes 90ms to geocode, returning:


 {{{
 SELECT pprint_addy(addy), rating, ST_AsText(geomout)
  from geocode('509 METTACAHONTS ROAD, ACCORD, NY 12404', 1);

 23 Mettacahonts Rd, Accord, NY 12404    10      POINT(-74.2487999333536 41.7948737121302)

 -- this took 78 ms but probably faster because of caching effects
 select pprint_addy(addy), rating, ST_AsText(geomout)
  from geocode('509 METTACAHONTS ROAD, ACCORD, NY 12404');

 108 Mettacahonts Rd, Accord, NY 12404   9       POINT(-74.2469427078796 41.795613863851)
 }}}

 So the speeds are pretty decent, though the address doesn't match.  I
 suspect this is more of a tiger data issue than a logic issue.  The reason
 it gives different addresses with a limit of 1 and with no limit is that,
 to improve speed, I apply an inner limit as well.  If there is no perfect
 or close-to-perfect match, you run into the issue of the "give me one
 answer" call returning a slightly worse result than the full call.  I'm
 not sure there is much I can do about that without compromising speed, and
 the benefit is low.
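
 If accuracy matters more than speed for a given address, one option is to
 ask for the full result set and keep the best-rated row yourself.  A
 minimal sketch, assuming a lower rating means a better match (as in the
 two results above, where the 9 comes from the call without a limit):

 {{{
 -- No max_results argument, so the inner limit for the one-answer case
 -- does not apply; pick the best-rated row on the outside instead.
 SELECT pprint_addy(addy), rating, ST_AsText(geomout)
   FROM geocode('509 METTACAHONTS ROAD, ACCORD, NY 12404')
  ORDER BY rating
  LIMIT 1;
 }}}

 This pays the cost of the unlimited call, so it is probably only worth it
 where the limit 1 answer looks off.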

 3) This one:

 select pprint_addy(addy), rating, ST_AsText(geomout)
  from geocode('17330 113TH AVE, ADDISLEIGH PARK, NY 114334003',1);

 did take 20,483 ms and came back with only the street.

 The reason is that the street number for this address is really 173-30,
 and our geocoder doesn't support that kind of street number yet.  It would
 require the same structural changes as #886.  I'll see what I can do about
 it though, as a lot of NY addresses will have this issue.  But it wouldn't
 help you much, since you don't have the - in your address.

 -- Note to Steve Woodbridge:  Would your C normalizer help in this case
 if we were to embed it?

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1382#comment:2>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

