[postgis-devel] PAGC Address Standardizer some thoughts on how to organize

Paragon Corporation lr at pcorp.us
Fri Jul 4 23:41:27 PDT 2014


>>> 1) Create a folder in extensions of our repo and move the
>>> address_standardizer extension files there.
>>> I'd still like it to be buildable separately if people wish
>>> (similar to how we have liblwgeom, I think), and my only reservation
>>> with breaking it out like this is that it makes it less compact.

> I would defer to strk on this, but can we use a symlink to get the file 
> to appear in two places under svn?

Let's stay away from symlinks - they don't behave well under Windows.  I
think for now we'll just leave the extension files where they are and decide
what to do later.  Perhaps just copy the control files as part of our
extension make scripts.

I was the one who started the extension folder, primarily because when I was
formulating the extension install process, it was easier not to affect the
rest of the code base while I was fleshing things out.

That said, the only issue is the documentation auto-comments generation,
which I would like address_standardizer to have like the other extensions we
have, as well as the versioning plumbing that all PostGIS extensions share.

Let's see how things go keeping things where they are unless someone has a
better idea.

>> 4) Build separate extensions for the custom gaz/lex/rules currently
>> present and add more. Right now, to run the packaged dictionaries you
>> need to run the lex,gaz,rules.sql files, which is cumbersome from a
>> newbie standpoint.
>> For this one I'm actually thinking of just rolling the current ones into
>> the base extension and then having extensions for custom ones, since at
>> least US people will just use the base one or, if they are using the tiger
>> geocoder, the one already packaged with the tiger geocoder extension.
>    This is where things get muddy ... Like so many software projects, a
> broad, generalized architecture ends up covering one common use case, and
> the rest is then in the way or collects dust as focus narrows. It *is*
> great to have a generalized address parsing engine, but how this lib got
> here is that it's been difficult to modernize and put sufficient time into
> a small niche utility - Steve told me so.
> A "pragmatic" move would be to tightly configure the lex/gaz/etc. to the
> TIGER Geocoder and ship it, but that would not use the capacity of the lib.
> On the other hand, if the generalized, multinational promise is pursued,
> who is going to build it out? Where are the OSM people?
> I am interested, sure, but this is dense going. There are Steve and Regina,
> but are there enough hands?
> No clear answers here...

> OK, I have thoughts on this, along the following lines which have to do 
> with the longer term. I think we should have multiple packages for the 
> set of gaz/lex/rules, that can be used for different data sets. 

> I would keep the data you have for the
> Tiger Geocoder with that application and keep the generic files that
> came with the address standardizer as a more generic set that other
> people can use to make custom changes to.

So the question then is: do we just include a sample as part of the
extension, or do we create an example extension that demonstrates the
concept of the files?

I'm thinking just packaging it along with the main extension will be easier,
but maybe call the tables

sample_lex, sample_gaz, sample_rules

or something, so people know these get overwritten if they upgrade and that
they should build their own.
It will also make writing the examples in the doco easier if we have sample
tables people can reference to see how it works.
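
As a rough sketch of what such a doco example might reference: assuming the
tables end up named sample_lex, sample_gaz and sample_rules as floated above,
the standardizer's standardize_address(lextab, gaztab, rultab, address)
function could be pointed at them by name. The helper name below is
hypothetical, and it only builds the SQL text (with a %s placeholder for the
address); it does not connect to a database.

```python
def build_standardize_query(prefix: str = "sample") -> str:
    """Build a SQL call to standardize_address for one table set.

    Hypothetical helper: assumes the lex/gaz/rules tables share a common
    prefix, e.g. sample_lex, sample_gaz, sample_rules.
    """
    # Only accept simple identifier-like prefixes, since the table names
    # are interpolated into the SQL text rather than passed as parameters.
    if not prefix.isidentifier():
        raise ValueError(f"unsafe table prefix: {prefix!r}")

    lex, gaz, rules = (f"{prefix}_{kind}" for kind in ("lex", "gaz", "rules"))

    # The address itself is left as a driver-style placeholder (%s).
    return (
        "SELECT * FROM standardize_address("
        f"'{lex}', '{gaz}', '{rules}', %s)"
    )


if __name__ == "__main__":
    print(build_standardize_query())
```

Someone who builds their own dictionaries would pass their own prefix
instead of the default, leaving the packaged sample tables untouched.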

>> My personal take is - change the TIGER Geocoder for 2.2+ and break
>> compatibility.
>> Whatever is convenient. Very much unlike the overall PostGIS project,
>> there are few if any 'production systems' depending on the details, and
>> damn them anyway if they whine.
> I don't have a strong opinion on this one, but my take would be to leave 
> the current setup as is. We have talked about a total rewrite of the 
> Tiger Geocoder to make it more generic and to follow more of the ideas 
> that I have put into my geocoder. This would be the place to make 
> breaking changes. My current geocoder uses 95+% of its code and I can 
> load Tiger, Navteq, or Canada data into it. Longer term I would like to 
> extend this to be able to load Navteq or TeleAtlas or other data for 
> Western Europe, but we need to make some changes to address standardizer 
> and parser to handle accents and parse input in non-English countries. 
> This would give us a Geocoder capability that would be on par with 
> Oracle Spatial's Geocoder.

I like that idea better - start with a clean slate, and then I won't be
tempted to salvage anything that shouldn't be salvaged.

