[postgis-devel] PAGC Address Standardizer: some thoughts on how to organize

Norman Vine nhv at cape.com
Sat Jul 5 07:55:52 PDT 2014

IMHO, this is a good overview with good recommendations about symlinks in SVN.


On Jul 5, 2014, at 9:52 AM, Stephen Woodbridge <woodbri at swoodbridge.com> wrote:

> On 7/5/2014 2:41 AM, Paragon Corporation wrote:
>>>>> 1) Create a folder in extensions of our repo and move the address_standardizer extension files to their
>>>>> I'd still like it to be able to be built separately if people wish (similar to how we have liblwgeom, I think), and my only reservation with breaking it out like this is that it makes it less compact.
>>> I would defer to strk on this, but can we use a symlink to get the file to appear in two places under SVN?
>> Let's stay away from symlinks; they don't behave well under Windows. I think for now we'll just leave the extension files where they are and decide what to do later.
>> Perhaps just copy the control files as part of our extension make scripts.
> I'm OK with this. The reason I suggested symlinks (knowing that Windows does not handle them) is that I thought I read that SVN handles them internally: when you check out on Windows, it makes a copy of the file and keeps information that it is a symlink, so if you make changes to the file and commit them, it is smart enough to apply the changes to the original file and not the copy. I haven't tried this yet, so I may not have understood it correctly.
>> I was the one who started the extension folder, primarily because when I was formulating the extension install process, it was easier not to affect the rest of the code base while I was fleshing things out.
>> That said, the only issue is the auto-generated documentation comments, which I would like address_standardizer to have like the other extensions we have, as well as the versioning plumbing that all PostGIS extensions share.
> I'm OK with doing whatever is right. There are a lot of details about how PostGIS works in the development and release processes that I'm not yet familiar with.
>> Let's see how things go keeping things where they are, unless someone has a better idea.
> Agreed.
>>>> 4) Build separate extensions for the custom gaz/lex/rules currently present and add more. Right now, to run the packaged dictionaries you need to run the lex,gaz,rules.sql files, which is cumbersome from a newbie standpoint.
>>>> For this one, I'm actually thinking of just rolling the current one into the base extension and then having extensions for custom ones, since at least US users will just use the base one or, if they are using the TIGER geocoder, the one already packaged with the tiger geocoder extension.
>>> This is where things get muddy... Like so many software projects, a broad generalized architecture ends up covering a common use case, and the rest is then in the way or collects dust as focus narrows. It *is* great to have a generalized address parsing engine, but how this lib got here is that it's been difficult to modernize and put sufficient time into a small niche utility; Steve told me so.
>>> A "pragmatic" move would be to tightly configure the lex/gaz/etc. to the TIGER Geocoder and ship it, but that would not use the capacity of the lib. On the other hand, if the generalized, multinational promise is pursued, who is going to build it out? Where are the OSM people?
>>> I am interested, sure, but this is dense going. There are Steve and Regina, but are there enough hands?
>>> No clear answers here...
>>> OK, I have thoughts on this, along the following lines, which have to do with the longer term. I think we should have multiple packages for the set of gaz/lex/rules that can be used for different data sets.
>> Agree
>>> I would keep the data you have for the Tiger Geocoder with that application and keep the generic files that came with the address standardizer as a more generic set that other people can use to make custom changes to.
>> So the question then is: do we just include a sample as part of the extension, or do we create an example extension that demonstrates the concept of the files?
>> I'm thinking just packaging it along with the main one will be easier, but maybe call the tables sample_lex, sample_gaz, sample_rules or something, so people know these get overwritten if they upgrade and that they should build their own.
>> It will also make writing the examples in doco easier if we have sample tables people can reference to see how it works.
> I don't like the idea of including them as part of the address standardizer extension, for the following reasons:
> 1. calling them sample_lex, sample_gaz, sample_rules just loads junk that cannot easily be removed because it is bundled with the extension
> 2. we might have multiple lex, gaz, rules for different countries/data sets like Tiger, Canada, UK, France, Germany, etc., or for other use cases.
> Would it be OK if we created an address_standardizer_sample_data extension that loaded sample_lex, sample_gaz, sample_rules? This would make sense for documentation and testing, and we could extend that to create additional packages in the future if we decide to do that.
> I would bundle the Tiger Geocoder files with that extension.
> The rationale here is that files that are part of an application should get loaded with that application, but if I'm loading just the address standardizer then I should choose which data files I want to load if any because I am likely building my own application and may be loading my own versions of the data files.
>>>> My personal take is: change the TIGER Geocoder for 2.2+ and break compatibility.
>>>> Whatever is convenient. Very much unlike the overall PostGIS project, there are few if any 'production systems' depending on the details, and damn them anyway if they whine.
>>> I don't have a strong opinion on this one, but my take would be to leave the current setup as is. We have talked about a total rewrite of the Tiger Geocoder to make it more generic and to follow more of the ideas that I have put into my geocoder. This would be the place to make breaking changes. My current geocoder uses 95+% of the same code regardless of the data source, and I can load Tiger, Navteq, or Canada data into it. Longer term, I would like to extend this to be able to load Navteq or TeleAtlas or other data for Western Europe, but we need to make some changes to the address standardizer and parser to handle accents and parse input from non-English-speaking countries. This would give us a geocoder capability on par with Oracle Spatial's Geocoder.
>> I like that idea better: start with a clean slate, then I won't be tempted to salvage anything that shouldn't be salvaged.
> I originally looked at the Tiger Geocoder, but it is too tied to the legacy Tiger structure. After moving the data loading to a more abstract table structure that could be used with the address standardizer, it only took a week to implement the core geocoding engine. So a clean-slate rewrite could greatly simplify the code and make it much easier to support and enhance. I'll try to start a white paper that discusses how I did this, and that might be a good starting point for discussing the design and rewrite for a future release.
> Lots of good ideas!
> Thanks,
> -Steve
>> Thanks,
>> Regina
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
