[postgis-devel] [PostGIS] #1382: Some addresses take a long time to geocode or normalize
PostGIS
trac at osgeo.org
Sat Dec 17 08:43:41 PST 2011
#1382: Some addresses take a long time to geocode or normalize
----------------------------+-----------------------------------------------
Reporter: raviada | Owner: robe
Type: defect | Status: assigned
Priority: medium | Milestone: PostGIS 2.0.0
Component: tiger geocoder | Version: trunk
Keywords: |
----------------------------+-----------------------------------------------
Comment(by robe):
Ravi,
Thanks for the examples. I'm still analyzing these.
The ones I've tested the normalize_address function on returned fairly
fast, all under 7 ms, so I'm not seeing a normalize speed issue here. If
you are, you might be running an older version of the geocoder. In the
normalize_address code you should see this:
{{{
normalize_address.sql 8252 2011-11-29 08:49:06Z robe
}}}
That is the revision number of the latest normalizer. If you don't have
that line at all, or you have an older revision number, yours is out of
date.
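One way to check which revision is installed without opening the source
file is to pull that line out of the function body in the catalog. This is
only a sketch: it assumes the Id comment is stored as part of the function
body in pg_proc.prosrc, which may not be true of every install. If the
revision column comes back NULL, that line was not found.
{{{
-- Sketch: look for the svn Id line inside the installed function body.
-- Assumption: the Id comment is part of the body kept in pg_proc.prosrc.
SELECT n.nspname AS schema_name,
       substring(p.prosrc from 'normalize_address[.]sql [0-9]+ [0-9-]+') AS revision
FROM pg_proc AS p
JOIN pg_namespace AS n ON n.oid = p.pronamespace
WHERE p.proname = 'normalize_address';
}}}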
Here is what I think is wrong with some of them:
1) 179TH ST, ADDISLEIGH PARK, NY 114341413: this one is just normalizing
incorrectly, so it's putting 179 in the street number and ST as the street
name. This is because this address has no street number. That is why this
one is so slow and comes back with the wrong answer. I think I can improve
on the normalizing logic, and I might have a ticket for it already.
Even with a street number of 0 supplied it's still a bit slow on my test
box. It took about 15 secs, for the 2 reasons listed after the query, when
I do this:
{{{
select pprint_addy(addy), rating, ST_AsText(geomout)
from geocode('0 179TH ST, ADDISLEIGH PARK, NY 114341413',1);
-- which returns this:
-- 0 179th St, New York, NY 11434 11 POINT(-73.7666465 40.663195)
}}}
a) the street name is short
b) I didn't specify a valid street range, and ADDISLEIGH PARK doesn't
match anything in tiger.
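To see the misparse from 1) directly, the normalizer can be called on its
own with the original address (the one with no street number). This is a
sketch; the column names are the fields of the norm_addy type, and no
output is shown because it depends on which normalizer revision is
installed.
{{{
-- Sketch: show how the normalizer splits the address that has no street number.
-- With the behavior described above, 179 would land in address and ST in streetname.
SELECT na.address, na.streetname, na.streettypeabbrev,
       na.location, na.stateabbrev, na.zip
FROM normalize_address('179TH ST, ADDISLEIGH PARK, NY 114341413') AS na;
}}}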
2) This one, 509 METTACAHONTS ROAD, ACCORD, NY 12404, runs fairly fast
on my box. It takes 90 ms to geocode, returning:
{{{
SELECT pprint_addy(addy), rating, ST_AsText(geomout)
from geocode('509 METTACAHONTS ROAD, ACCORD, NY 12404', 1);
-- returns: 23 Mettacahonts Rd, Accord, NY 12404 10 POINT(-74.2487999333536 41.7948737121302)
-- without the limit, this took 78 ms, but it is probably faster because of caching effects
select pprint_addy(addy), rating, ST_AsText(geomout)
from geocode('509 METTACAHONTS ROAD, ACCORD, NY 12404');
-- returns: 108 Mettacahonts Rd, Accord, NY 12404 9 POINT(-74.2469427078796 41.795613863851)
}}}
So the speeds are pretty decent, though the address doesn't match. I
suspect this is more of a tiger data issue than a logic issue. It gives
different addresses with limit 1 and with no limit because, to improve
speed, I apply the limit in the inner queries as well. If there is no
perfect or close to perfect match, you run into the issue of the
give-me-one-answer call returning a slightly worse result than the full
one. I'm not sure there is much I can do about that without compromising
speed, and the benefit is low.
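If the best available match matters more than speed for a given address, a
possible workaround on the calling side is to ask for a few candidates and
keep the one with the lowest rating (lower is better). This is a sketch
only: the 5 is an arbitrary illustrative value, and it will be slower than
passing 1.
{{{
-- Sketch: request several candidates, then keep the best rated one.
-- Lower rating means a better match; 5 is an arbitrary illustrative choice.
SELECT pprint_addy(addy), rating, ST_AsText(geomout)
FROM geocode('509 METTACAHONTS ROAD, ACCORD, NY 12404', 5)
ORDER BY rating
LIMIT 1;
}}}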
3) This one:
select pprint_addy(addy), rating, ST_AsText(geomout)
from geocode('17330 113TH AVE, ADDISLEIGH PARK, NY 114334003',1);
did take 20,483 ms and came back with only the street.
The reason is that the street number for this address is really 173-30,
and our geocoder doesn't support that kind of street number yet.
Supporting it would require the same structural changes as #886. I'll see
what I can do about it though, as a lot of NY addresses will have this
issue. But it wouldn't help you much, since you don't have the - in your
address.
-- Note to Steve Woodbridge: Would your C normalizer help in this case if
we were to embed it?
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1382#comment:2>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.