[postgis-users] Geocoder (from extras)

Paragon Corporation lr at pcorp.us
Fri May 6 23:12:24 PDT 2011


I was thinking about incorporating trigrams as well.  I have been impressed
with their speed and accuracy for some types of checks. Look at our write-up

http://www.postgresonline.com/journal/archives/169-Fuzzy-string-matching-wit
h-Trigram-and-Trigraphs.html

Also improvements are supposed to have been made in PostgreSQL 9.1 about how
inherited table statistics and plans are constructed which should help this.
We are itching to try that out.

I was also toying with the idea of  having an argument for level of effort
or an extra regional argument -- similar to what I have seen in nominatum.
So for example if I am geocoding Boston, I really don't need the system
searching CA.  So If I can pass in a polygon or bounding box region
(particularly for batch geocodes), it would probably speed things up.

 In most of these cases that take a long time, it's because the address is
so bad, you won't get a good answer anyway.  So having such a thing would
speed things up quite a bit.  I have something similar to that customized
for geocoding I need for parcel level precision.

Jonathan,

Off-hand how many states do you have loaded.  For my current project, I've
really only had a need for one and the speed varies significantly depending
on how badly spelled the data is we are dealing with and how much memory and
how powerful our processors are.  So we get speeds anywhere from 60ms to 5
seconds.

I think as Steve said trigrams may help a lot in this regard. We can still
keep the soundex because that often catches things soundex doesn't and vice
versa.

 I'm not sure how much preplanning will help.  Because plans are cached in
PostgreSQL for these dynamic like queries and I think they are actually
cached across sessions using shared memory.   So upping shared memory helps
quite a bit.

Thanks,
Regina and Leo
http://www.postgis.us



  

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Stephen
Frost
Sent: Friday, May 06, 2011 10:36 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Geocoder (from extras)

* Johnathan Leppert (johnathan.leppert at gmail.com) wrote:
> Ok, I have it working, but it's very very slow. It takes about two 
> seconds to geocode a single record. This isn't very realistic for large
datasets.
> Anyone have any ideas on possible optimization? It appears to have 
> created proper indexes on all the tables.

Yeah, there's a few things I've been working on.  For one, a lot of the time
is spent planning that massive query at the heart of the geocoding, when it
really doesn't need to be re-planned for every query.  It's done the way it
is currently because the tables are partitioned up by state.
I've got someone working on simply splitting that query up to be 53 (or
however many...) queries, one for each state/territory, which will allow
them to be planned once and then the plans re-used inside the same session.
That should improve things.

I've also been looking into using trigrams for things instead of the current
indexes..  I think they're perform better with regard to speed, just need to
make sure it still returns good results, etc..

I'm very interested in anyone else working on this and any suggestions
people have for how to improve it..  I've not been able to work on it much
recently and while I've tried to delegate it to others, it's a pretty
complex system which is, in particular, hard to test well..

	Thanks,

		Stephen





More information about the postgis-users mailing list