[postgis-users] RE: geocoding
David Bitner
osgis.lists at gmail.com
Mon May 23 07:14:03 PDT 2005
I meant to clean up my PL/PGSQL / PL/Perl geocoder and write some
documentation a while ago, but since I've gotten swamped at work and
haven't had the time. In the shape it's in write now it works great,
but I'm still tweaking it every time I get anomalies (ie north north
street or a street with even addresses on both sides) it's pretty darn
forgiving right now and very fast. The problem right now is it's in
chicken scratch code that's been cobbled specifically around one
dataset and modified over the course of months. I'm still pretty
swamped, but perhaps I'll try to put together the code and some
documentation before the OSGIS/Mapserver users meeting. Again, once
in the database, the options are really endless and the development of
a geocoder is a great exercise for both working with the database
itself as well as using some basic postgis functionality. The code I
have will never be "plug and play" since it requires dataset specific
mappings of street type (and directions and states, both those are
more likely consistent) with regular expressions that match allowable
representations -- st can be represented by st|str|street or
str?(eet)?. I store all this information in separate tables that are
used by my address parser. I believe I posted some pseudo code for my
approach on this list a while ago.
Thanks for the hint on the indices on functions. I knew about all of
that, but the little birdie in my head kept saying that it would still
have to use the soundex for every row (I actually created a function
that uses metaphone, but leaves any numbers in place so I can use it
for things such as 22nd). That can simplify some of the data prep
that would be necessary for each new dataset (incidently, the next
tweak to my geocoder will be to allow multiple dataset geocoding with
parcel and street datasets, if it can't find the parcel, look for the
road).
On 5/20/05, bill at binko.net <bill at binko.net> wrote:
> > I'm probably the person that you saw in threads working on a plpgsql
> > geocoder. Ralph is right, it is much more of a text parsing problem
> > than a spatial problem. PL/PGSQL and PL/Perl (great for splitting the
> > text into component parts with regular expressions -- ie
> > streetnumber, streetname, street prefix direction etc.) together with
> > good indexes and precalculated fields (like metaphones of the
> > streetnames) can make a very effective text parser and matcher just
> > utilizing what's good about the PostgreSQL database itself. Throwing
> > the line_interpolate_point or centroid (if a polygon based geocoder --
> > from parcels or zipcodes etc.) from PostGIS into the mix is just
> > iceing on the cake.
>
> David,
>
> What is the status on this? I would also like to replace my geocoder.us
> based geocoder (which I have to shell out to) with something in my
> database.
>
> Also, for performance reasons, remember that PostgreSQL has the ability to
> index on things not in columns. So you can do stuff like:
>
> create index road_sdx on roads (soundex(roadname));
>
> That doesn't create a column for the soundex, but does calculate it on
> each update/insert and keeps it in the index (only). It's like have a
> column that only lives in the index. You can also do stuff like:
>
> create index road_sdx_lookup on roads(soundex(roadname), roadname, road_id);
>
> If you do it in that order, queries like this will be very fast since they
> never have to read the row from the table at all:
>
> select roadname, road_id from roads where soundex(roadname) = 'B520';
>
> Just some off-topic database --sorry for the digression.
>
> Bill
>
More information about the postgis-users
mailing list