[postgis-devel] PAGC Address Standardizer some thoughts on how to organize

Paragon Corporation lr at pcorp.us
Wed Jul 2 22:25:17 PDT 2014


Thanks for the feedback.

> details aside, being able to build this separately     BIG +1 !
> I have definitely used this since Denver 2011 (where I attended the PAGC
> talk and met various principals). The Address Standardizer by itself,
> without any TIGER geocoder, is quite valuable.
> I do appreciate the effort in this library and have said so to Steve in
> the past

Agreed, and given that a lot of people building geocoders in PostgreSQL are
using PostGIS, it seems like an appropriate one-stop shop.

>> 2) Beef up the documentation -- right now all we have is how to
>> install it in our install section of manual (and that of course needs
>> to be updated with new link now that it's part of our repo)
>> http://postgis.net/docs/manual-dev/postgis_installation.html#installing_pagc_address_standardizer
>> So I'm going to add an additional .xml
>> (separate from tiger and install, explaining all the nuances of the
>> lexer / rule / parser files)

>  well, this depends a lot on how the decomposition of libs turns out as
> referenced in the next sections below

I think for starters we'll try to keep it as self-contained as possible,
except for the installation, which is already alongside installation of
other PostGIS pieces.
All installation stuff, I think, should stick together.

>  no matter what happens, PCRE and perl Regexp::Assemble are definitely
> required for this

Good point. Forgot about Regexp::Assemble, but luckily that's only required
for building, not shipping, and given our PostGIS tool chain requires Perl
anyway to build, it doesn't seem like too much of a requirement for package
maintainers to have that.

>> 4) Build separate extensions for the custom gaz/lex/rules currently 
>> present and add more. Right now to run the packaged dictionaries you 
>> need to run the lex,gaz,rules.sql files which is cumbersome for a newbie.
>> This one I'm actually thinking just rolling the current one in the 
>> base extension and then having extensions for custom ones. Since at 
>> least US people will just use the base one or if they are using tiger 
>> geocoder the tiger geocoder one already packaged with tiger geocoder

>  this is where things get muddy ...
> Like so many software projects, a broad generalized architecture ends up
> covering a common use case, and the rest is then in the way or collects dust
> as focus narrows.
> It *is* great to have a generalized address parsing engine.. but how this
> lib got here is, it's been difficult to modernize and put sufficient time
> into a small niche utility - Steve told me so..

> A "pragmatic" move would be to tightly configure the lex/gaz/etc to the
> TIGER Geocoder and ship it..

Already done, actually: tiger geocoder uses a variant of the one Steve
developed because our requirements and how we standardize things are a bit
different.  I don't think this will change.

> but, not using the capacity of the lib. On the other hand, if the
> generalized, multinational promise is pursued, who is going to build it out?
> Where are the OSM people ?
> I am interested sure but this is dense going.. Steve and Regina but are
> there enough hands ?
> no clear answers here...

There are never enough hands, but I think if we make each set of
rules/lex/gaz its own extension, it's very workable and extendable, and easy
to divvy up among people working in similar geographic regions.

> 5) this one I'm still thinking about because it'll be a major breaking 
> change -- and that would be just to have current tiger geocoder 
> require address_standardizer and swap out the norm_addy object with 
> the address_standardize std_address one. But that requires a bit of 
> rework and assurance that package maintainers can build 
> address_standardizer without too much fuss.

> The TIGER Geocoder treads the line between super-useful and

> Shoot the messenger if you like, but at least I have the oomph to say it.. I
> am constantly amazed at robe2's relentless productivity and the TIGER
> Geocoder project is a fruit of that, warts and all.. I use it and I have
> written about it..

Agree -- it was like that when I got there, but I admittedly did not help
much in unspaghettifying it; my funded focus was making it usable for Windows
users.  That said, I think a good chunk of that was the normalizer, so if that
is swapped out and replaced with more of a rule-based system it's a lot more

Then all you have to put up with is my insane spaghetti geocoder queries.

> My personal take is -- change the TIGER Geocoder for 2.2+ and break
> compatibility .. Whatever is convenient.. very much unlike the overall
> PostGIS project, there are few if any 'production systems' depending on the
> and damn them anyway if they whine

Hmm, have to think about that one a bit still.

