[postgis-users] Can we limit the search range of geocode() function in Tiger Geocoder?

Paragon Corporation lr at pcorp.us
Fri Nov 27 21:32:46 PST 2015


Dracodoc,

Nice writeup.  I'll respond on your stack exchange question.

That might be a good enhancement to focus on for PostGIS 2.3. I haven't
thought through how that filtering would work yet. Perhaps another geocode
function that takes in a filter of desired states.

At any rate, yes, the geometry filter is not really optimized. It was
designed more for arbitrary polygon areas where you know for certain an
address has to fall, and it probably needs some performance work since I
didn't spend much time creating it.

Thanks,
Regina
http://postgis.net
http://www.postgis.us


--- ORIGINAL MESSAGE ---
Hi all, I already asked this question on GIS Stack Exchange here:
<http://gis.stackexchange.com/questions/171817/can-we-limit-the-search-range-of-geocode-function-in-postgis-tiger-geocoder>.
I'm also posting it to the mailing list to make sure the experts see it.

I found that a server with only 2 states' data loaded is much faster than a
server with all states loaded. My theory is that a badly formatted address
that doesn't get an exact hit at first costs much more time when the
geocoder checks all states. With only 2 states loaded, the search is limited
and stops much earlier.

The restrict_region parameter of the geocode function looks promising if it
can limit the search range, since I have good reason to believe the state
information in my input addresses is correct.

I wrote a query trying to use one state's geometry as the limiting
parameter:

SELECT geocode('501 Fairmount DR , Annapolis, MD 20137', 1, the_geom)
    FROM tiger.state WHERE statefp = '24';


and compared its performance with the simple version:

SELECT geocode('501 Fairmount DR , Annapolis, MD 20137',1);


I didn't see any performance gain from the parameter. Instead, it lost the
gain from caching that usually comes from running the same query again
immediately, when all the needed data is already cached in RAM.
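
A sketch of how the two calls could be compared more carefully, assuming a
PostgreSQL version that supports LATERAL (9.3 or later) and a search_path
that includes the tiger schema. The column aliases are illustrative only.
EXPLAIN (ANALYZE, BUFFERS) reports the execution time together with
shared-buffer hits and reads, which may help separate the cost of the
geocoder itself from the effect of a cold cache.

```sql
-- Run each statement twice; the second run reflects a warm cache.

-- Plain call, no region restriction.
EXPLAIN (ANALYZE, BUFFERS)
SELECT g.rating,
       pprint_addy(g.addy)   AS address,
       ST_AsText(g.geomout)  AS point_wkt
FROM geocode('501 Fairmount DR , Annapolis, MD 20137', 1) AS g;

-- Same address, with Maryland's geometry (statefp = '24') passed
-- as restrict_region.
EXPLAIN (ANALYZE, BUFFERS)
SELECT g.rating,
       pprint_addy(g.addy)   AS address,
       ST_AsText(g.geomout)  AS point_wkt
FROM tiger.state AS s
CROSS JOIN LATERAL
     geocode('501 Fairmount DR , Annapolis, MD 20137', 1, s.the_geom) AS g
WHERE s.statefp = '24';
```

Dropping the EXPLAIN (ANALYZE, BUFFERS) line from either statement returns
the geocoded row itself, expanded into separate columns rather than a single
composite value.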

Maybe I'm not using it properly, or this parameter is not intended to work
the way I expected.

However, if the search range could be limited, the performance gain could be
substantial: badly formatted addresses take the most time to geocode, and
they also often push already-cached data out of RAM because the geocoder
needs to search across all states, even though all my input is in one state
and all of its data can be cached in RAM.

Thanks!

By the way, I wrote about my system setup
<http://dracodoc.github.io/2015/11/17/Geocoding/> and workflow
<http://dracodoc.github.io/2015/11/19/Script-workflow/> on my blog. I hope
it helps other novices in geocoding.



