[postgis-devel] PAGC Address Standardizer some thoughts on how to organize
Paragon Corporation
lr at pcorp.us
Wed Jul 2 22:25:17 PDT 2014
Brian,
Thanks for the feedback.
> details aside, being able to build this separately BIG +1 !
> I have definitely used this since Denver 2011 (where I attended the PAGC
> talk and met various principals). The Address Standardizer by itself,
> without any TIGER geocoder, is quite valuable.
> I do appreciate the effort in this library and have said so to Steve in
> the past
Agreed, and given that a lot of people building geocoders in PostgreSQL are
using PostGIS, it seems like an appropriate one-stop shop.
>> 2) Beef up the documentation -- right now all we have is how to
>> install it in the install section of our manual (and that of course needs
>> to be updated with a new link now that it's part of our repo):
>> http://postgis.net/docs/manual-dev/postgis_installation.html#installing_pagc_address_standardizer
>> So I'm going to add an additional .xml (separate from tiger and
>> install) explaining all the nuances of the lexer / rule / parser files.
>>
> well, this depends a lot on how the decomposition of libs turns out as
> referenced in the next sections below
I think for starters we'll try to keep it as self-contained as possible,
except for the installation, which is already alongside the installation of
other PostGIS pieces.
All installation stuff, I think, should stick together.
> no matter what happens, PCRE and perl Regexp::Assemble are definitely
> required for this
Good point. I forgot about Regexp::Assemble, but luckily that's only required
for building, not shipping, and given that our PostGIS tool chain requires
Perl anyway to build, it doesn't seem like too much of a requirement for
package maintainers to have that.
>
>> 4) Build separate extensions for the custom gaz/lex/rules currently
>> present and add more. Right now to run the packaged dictionaries you
>> need to run the lex,gaz,rules.sql files, which is cumbersome from a
>> newbie standpoint.
>>
>> For this one I'm actually thinking of just rolling the current one into
>> the base extension and then having extensions for custom ones, since at
>> least US people will just use the base one or, if they are using tiger
>> geocoder, the one already packaged with the tiger geocoder extension.
>>
> this is where things get muddy ...
> Like so many software projects, a broad generalized architecture ends up
> covering a common use case, and the rest is then in the way or collects dust
> as focus narrows.
> It *is* great to have a generalized address parsing engine.. but how this
> lib got here is, it's been difficult to modernize and put sufficient time
> into a small niche utility - Steve told me so..
> A "pragmatic" move would be to tightly configure the lex/gaz/etc to the
> TIGER Geocoder and ship it..
Already done, actually. The tiger geocoder uses a variant of the one Steve
developed because our requirements and how we standardize things are a bit
different. I don't think this will change.
> but, not using the capacity of the lib. On the other hand, if the
> generalized, multinational promise is pursued, who is going to build it out?
> Where are the OSM people ?
> I am interested sure but this is dense going.. Steve and Regina, but are
> there enough hands ?
> no clear answers here...
There are never enough hands, but I think if we make each set of
rules/lex/gaz its own extension, it's very workable and extendable, and easy
to divvy up among people working in similar geographic regions.
>
> 5) this one I'm still thinking about because it'll be a major breaking
> change -- and that would be just to have current tiger geocoder
> require address_standardizer and swap out the norm_addy object with
> the address_standardizer std_address one. But that requires a bit of
> rework and assurance that package maintainers can build
> address_standardizer without too much fuss.
>
> The TIGER Geocoder treads the line between super-useful and
> super-spaghetti.
> Shoot the messenger if you like, but at least I have the oomph to say it.. I
> am constantly amazed at robe2's relentless productivity and the TIGER
> Geocoder project is a fruit of that, warts and all.. I use it and I have
> written about it..
Agree -- it was like that when I got there, but I admittedly did not help
much in unspaghettifying it; my funded focus was making it usable for Windows
users. That said, I think a good chunk of that was the normalizer, so if that
is swapped out and replaced with more of a rule-based system, it's a lot more
maintainable.
Then all you have to put up with is my insane spaghetti geocoder queries.
> My personal take is -- change the TIGER Geocoder for 2.2+ and break
> compatibility .. Whatever is convenient.. very much unlike the overall
> PostGIS project, there are few if any 'production systems' depending on the
> details,
> and damn them anyway if they whine
Hmm, I have to think about that one a bit still.
Thanks,
Regina