[postgis-users] Geocoder (from extras)

Johnathan Leppert johnathan.leppert at gmail.com
Sat May 7 06:28:13 PDT 2011


Hi Regina,

I have all 50 states loaded, as well as Puerto Rico. I think it would definitely help to support bounding box style searching, that way I could sort by e.g. state and do one state at a time. It's usually pretty easy (easier) to tease a state out of an address string, at least with the data I am working with.

I am trying to geocode about 20 million addresses and want to incorporate the geocoder in our batch jobs.

I'll try your optimization suggestions with increasing shared memory etc.

Thanks much for your response and help!

Johnathan

On May 7, 2011, at 2:12 AM, "Paragon Corporation" <lr at pcorp.us> wrote:

> I was thinking about incorporating trigrams as well.  I have been impressed
> with their speed and accuracy for some types of checks. Look at our write-up
> 
> http://www.postgresonline.com/journal/archives/169-Fuzzy-string-matching-wit
> h-Trigram-and-Trigraphs.html
> 
> Also improvements are supposed to have been made in PostgreSQL 9.1 about how
> inherited table statistics and plans are constructed which should help this.
> We are itching to try that out.
> 
> I was also toying with the idea of  having an argument for level of effort
> or an extra regional argument -- similar to what I have seen in nominatum.
> So for example if I am geocoding Boston, I really don't need the system
> searching CA.  So If I can pass in a polygon or bounding box region
> (particularly for batch geocodes), it would probably speed things up.
> 
> In most of these cases that take a long time, it's because the address is
> so bad, you won't get a good answer anyway.  So having such a thing would
> speed things up quite a bit.  I have something similar to that customized
> for geocoding I need for parcel level precision.
> 
> Jonathan,
> 
> Off-hand how many states do you have loaded.  For my current project, I've
> really only had a need for one and the speed varies significantly depending
> on how badly spelled the data is we are dealing with and how much memory and
> how powerful our processors are.  So we get speeds anywhere from 60ms to 5
> seconds.
> 
> I think as Steve said trigrams may help a lot in this regard. We can still
> keep the soundex because that often catches things soundex doesn't and vice
> versa.
> 
> I'm not sure how much preplanning will help.  Because plans are cached in
> PostgreSQL for these dynamic like queries and I think they are actually
> cached across sessions using shared memory.   So upping shared memory helps
> quite a bit.
> 
> Thanks,
> Regina and Leo
> http://www.postgis.us
> 
> 
> 
> 
> 
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Stephen
> Frost
> Sent: Friday, May 06, 2011 10:36 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Geocoder (from extras)
> 
> * Johnathan Leppert (johnathan.leppert at gmail.com) wrote:
>> Ok, I have it working, but it's very very slow. It takes about two 
>> seconds to geocode a single record. This isn't very realistic for large
> datasets.
>> Anyone have any ideas on possible optimization? It appears to have 
>> created proper indexes on all the tables.
> 
> Yeah, there's a few things I've been working on.  For one, a lot of the time
> is spent planning that massive query at the heart of the geocoding, when it
> really doesn't need to be re-planned for every query.  It's done the way it
> is currently because the tables are partitioned up by state.
> I've got someone working on simply splitting that query up to be 53 (or
> however many...) queries, one for each state/territory, which will allow
> them to be planned once and then the plans re-used inside the same session.
> That should improve things.
> 
> I've also been looking into using trigrams for things instead of the current
> indexes..  I think they're perform better with regard to speed, just need to
> make sure it still returns good results, etc..
> 
> I'm very interested in anyone else working on this and any suggestions
> people have for how to improve it..  I've not been able to work on it much
> recently and while I've tried to delegate it to others, it's a pretty
> complex system which is, in particular, hard to test well..
> 
>    Thanks,
> 
>        Stephen
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users



More information about the postgis-users mailing list