[postgis-users] Geocoder (from extras)

Stephen Woodbridge woodbri at swoodbridge.com
Sat May 7 14:00:15 PDT 2011


Stephen,

In a geocoder I wrote in 2000, I mapped city, state and/or zipcode into
a list of counties, then only searched those counties. This greatly
reduced the 3300 counties in Tiger to the small handful that actually
needed to be searched. The FIPS-4 document has a mapping of all place
names to counties. Zipcodes are easier to handle because you can just
index the records by zipcode, but if you want to widen the search, map
the zipcode to a place name and then to counties.
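
To make that concrete, here is a rough sketch of the kind of lookup I
mean, against a hypothetical zip_county table keyed on state and county
FIPS codes (the table and the sample values are just for illustration,
not part of the Tiger schema; the edges columns are the usual
statefp/countyfp/fullname):

   -- hypothetical lookup table: one row per zipcode/county pair
   CREATE TABLE zip_county (
       zip      char(5),
       statefp  char(2),
       countyfp char(3)
   );

   -- restrict a street search to the counties implied by the zipcode
   SELECT e.*
     FROM edges e
     JOIN zip_county z
       ON z.statefp = e.statefp AND z.countyfp = e.countyfp
    WHERE z.zip = '02139'
      AND e.fullname ILIKE '%main%';

A place-name-to-county table would widen the search the same way when
the zipcode is missing or wrong.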

My first thought was to use inherited tables where each state or state
equivalent is in a separate table, and to filter the tables by the
state name or abbreviation. It sounds like you might already be doing
something like this.
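
For example, something along these lines with PostgreSQL table
inheritance (the names here are illustrative, not the actual
tiger_geocoder layout):

   -- parent table holds no rows; one child table per state
   CREATE TABLE edges_all (
       statefp  char(2),
       fullname text,
       the_geom geometry
   );

   CREATE TABLE edges_ma (
       CHECK (statefp = '25')   -- Massachusetts FIPS code
   ) INHERITS (edges_all);

   -- with constraint exclusion enabled, a query filtered on statefp
   -- only scans the matching child table(s)
   SET constraint_exclusion = partition;
   SELECT * FROM edges_all
    WHERE statefp = '25' AND fullname ILIKE '%main%';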

I think trigrams would provide some benefits. I have also used double
metaphone, but only the primary code, for fuzzy searches; in my opinion
it is a better phonetic search key than soundex because it matches
things that sound alike even if they do not start with the same letter.
Since I used it as a key, I only kept the first 4 characters of the
metaphone code.
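
Both are available as contrib modules, so they are easy to experiment
with. A minimal sketch, against a street_names(name) table that is only
for illustration (on pre-9.1 servers, load the contrib .sql scripts
instead of CREATE EXTENSION):

   -- pg_trgm: trigram index, match on similarity
   CREATE EXTENSION pg_trgm;
   CREATE INDEX street_names_trgm_idx
       ON street_names USING gist (name gist_trgm_ops);
   SELECT name, similarity(name, 'Massachusets Ave') AS sim
     FROM street_names
    WHERE name % 'Massachusets Ave'
    ORDER BY sim DESC;

   -- fuzzystrmatch: key on the first 4 characters of the primary
   -- double metaphone code
   CREATE EXTENSION fuzzystrmatch;
   CREATE INDEX street_names_dmeta_idx
       ON street_names (substr(dmetaphone(name), 1, 4));
   SELECT name
     FROM street_names
    WHERE substr(dmetaphone(name), 1, 4)
        = substr(dmetaphone('Masachusets Ave'), 1, 4);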

I wrote my own Tiger geocoder in C and have since abandoned that in
favor of PAGC. PAGC has just recently refactored its code to support
SQLite as a datastore, and some work has been done to support
PostgreSQL in addition to the previous support for Berkeley DB. One
idea I had was that, with appropriate support for database backing
stores, the PAGC library might be wrapped in a stored procedure, which
would then provide a high-quality, high-performance geocoder in the
database.
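
Just to illustrate the shape of that idea (this is purely hypothetical,
no such wrapper exists in PAGC today), the SQL-level interface might be
nothing more than:

   -- hypothetical C-language wrapper around the PAGC library
   CREATE FUNCTION pagc_geocode(address text)
   RETURNS TABLE (score int, lon double precision, lat double precision)
   AS '$libdir/pagc', 'pagc_geocode'
   LANGUAGE C STRICT;

   SELECT * FROM pagc_geocode('77 Massachusetts Ave, Cambridge, MA 02139');

with the library doing the parsing, matching and interpolation against
whatever backing store it was built with.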

Here is an instance that I created using SQLite as the backing store and 
Tiger 2010 data:

http://imaptools.com:8080/geocode/

The single-line parsing is broken at the moment; I have a ticket to
look into that and hope to have it resolved shortly.

The PAGC project can be found here:

http://www.pagcgeo.org/

Regards,
   -Steve


On 5/6/2011 10:36 PM, Stephen Frost wrote:
> * Johnathan Leppert (johnathan.leppert at gmail.com) wrote:
>> Ok, I have it working, but it's very very slow. It takes about two seconds
>> to geocode a single record. This isn't very realistic for large datasets.
>> Anyone have any ideas on possible optimization? It appears to have created
>> proper indexes on all the tables.
>
> Yeah, there's a few things I've been working on.  For one, a lot of the
> time is spent planning that massive query at the heart of the geocoding,
> when it really doesn't need to be re-planned for every query.  It's done
> the way it is currently because the tables are partitioned up by state.
> I've got someone working on simply splitting that query up to be 53 (or
> however many...) queries, one for each state/territory, which will allow
> them to be planned once and then the plans re-used inside the same
> session.  That should improve things.
>
> I've also been looking into using trigrams for things instead of the
> current indexes..  I think they'll perform better with regard to speed,
> just need to make sure it still returns good results, etc..
>
> I'm very interested in anyone else working on this and any suggestions
> people have for how to improve it..  I've not been able to work on it
> much recently and while I've tried to delegate it to others, it's a
> pretty complex system which is, in particular, hard to test well..
>
> 	Thanks,
>
> 		Stephen
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users