[postgis-devel] PAGC Address Standardizer some thoughts on how to organize

Stephen Woodbridge woodbri at swoodbridge.com
Sat Jul 5 08:54:34 PDT 2014


Thanks Norman,

I reread the svn redbook page and your stackoverflow link and agree we 
should not use symlinks in svn.

-Steve

On 7/5/2014 10:55 AM, Norman Vine wrote:
> IMHO This is a good overview with good recommendations about symlinks in SVN	
>
> http://stackoverflow.com/questions/4056092/what-happens-if-i-add-a-symbolic-link-to-subversion
>
>
> On Jul 5, 2014, at 9:52 AM, Stephen Woodbridge <woodbri at swoodbridge.com> wrote:
>
>> On 7/5/2014 2:41 AM, Paragon Corporation wrote:
>>>
>>>
>>>
>>>
>>>>>>
>>>>>> 1) Create folder in extensions of our repo and move the
>>>>>> address_standardizer
>>>>>> extension files there.
>>>>>> I'd still like it to be able to be built separately if people wish
>>>>>> (similar
>>>>>> to how we have liblwgeom I think) and my only reservation with
>>>>>> breaking out
>>>>>> like this is that it makes it less compact.
>>>
>>>
>>>> I would defer to strk on this, but can we use a symlink to get the file
>>>> to appear in two places under svn?
>>>
>>> Let's stay away from symlinks - they don't behave well under windows.  I
>>> think for now we'll
>>> just leave the extension files where they are and decide what to do later.
>>> Perhaps just copy the control files as part of our extension make scripts.
>>
>> I'm ok with this. The reason I suggested symlinks (knowing that windows does not handle them) is that I thought I read that SVN handles them internally: when you check out on windows it makes a copy of the file and keeps information that it is a symlink, so if you make changes to the file and commit them, it is smart enough to apply the changes to the original file and not the copy. I have not tried this yet so I may not have understood it correctly.
>>
>>> I was the one who started the extension folder primarily because when I was
>>> formulating the extension install
>>> process, it was easier not to affect the rest of the code base while I was
>>> fleshing things out.
>>>
>>> That said the only issue is the documentation auto comments generation that
>>> I would like address_standardizer to
>>> have similar to the other extensions we have.  As well as the versioning
>>> plumbing that all postgis extensions share.
>>
>> I'm ok with doing whatever is right. There are a lot of details about how postgis works on the development and release processes that I'm not yet familiar with.
>>
>>> Let's see how things go keeping things where they are unless someone has a
>>> better idea.
>>
>> Agreed.
>>
>>>>
>>>>>
>>>>> 4) Build separate extensions for the custom gaz/lex/rules currently
>>>>> present
>>>>> and add more. Right now to run the packaged dictionaries you need to
>>>>> run the
>>>>> lex,gaz,rules.sql files which is cumbersome from a newbie stand-point.
>>>>> This one I'm actually thinking just rolling the current one in the base
>>>>> extension and then having extensions for custom ones. Since at least US
>>>>> people will just use the base one or if they are using tiger geocoder the
>>>>> tiger geocoder one already packaged with tiger geocoder extension.
>>>>
>>>>    this is where things get muddy ... Like so many software projects, a
>>>> broad generalized architecture ends up covering a
>>>> common use case, and the rest is then in the way or collects dust as
>>>> focus narrows. It *is* great to have a generalized address parsing
>>>> engine.. but how this lib got here is,
>>>> it's been difficult to modernize and put sufficient time into a small
>>>> niche utility - Steve told me so..
>>>> A "pragmatic" move would be to tightly configure the lex/gaz/etc to the
>>>> TIGER Geocoder
>>>> and ship it.. but, not using the capacity of the lib. On the other hand,
>>>> if the generalized,
>>>> multinational promise is pursued, who is going to build it out? Where
>>>> are the OSM people ?
>>>> I am interested sure but this is dense going.. Steve and Regina but are
>>>> there enough hands ?
>>>> no clear answers here...
>>>
>>>> OK, I have thoughts on this, along the following lines which have to do
>>>> with the longer term. I think we should have multiple packages for the
>>>> set of gaz/lex/rules, that can be used for different data sets.
>>> Agree
>>>
>>>> I would keep data you have for the
>>>> Tiger Geocoder with that application and keep the generic files that
>>>> came with the address standardizer as a more generic set that other
>>>> people can use to make custom changes to.
>>>
>>> So the question then is: do we just include a sample as part of the extension, or
>>> do we create an example
>>> extension that demonstrates the concept of the files?
>>>
>>> I'm thinking just packaging it along with the main will be easier, but maybe
>>> call the tables
>>>
>>> sample_lex, sample_gaz, sample_rules
>>>
>>>
>>> Or something so people know these get overwritten if they upgrade and they
>>> should build their own.
>>> It will also make writing the examples in doco easier if we have sample
>>> tables people can reference to see how it works.
>>
>> I don't like the idea of including them as part of the address standardizer extension for the following reasons:
>>
>> 1. calling them sample_lex, sample_gaz, sample_rules just loads junk that can not easily be removed because it is bundled with the extension
>>
>> 2. we might have multiple lex, gaz, rules for different countries/data sets like, Tiger, Canada, UK, France, Germany, etc or for other use cases.
>>
>> Would it be ok if we created an address_standardizer_sample_data extension that loaded sample_lex, sample_gaz, sample_rules? This would make sense for documentation and testing. And we could extend that to create additional packages in the future if we decide to do that.
>>
>> I would bundle the Tiger Geocoder files with that extension.
>>
>> The rationale here is that files that are part of an application should get loaded with that application, but if I'm loading just the address standardizer then I should choose which data files I want to load if any because I am likely building my own application and may be loading my own versions of the data files.
>>
>>>>> My personal take is -- change the TIGER Geocoder for 2.2+ and break
>>>>> compatibility ..
>>>>> Whatever is convenient.. very much unlike the overall PostGIS project,
>>>>> there are
>>>>> few if any  'production systems' depending on the details, and damn them
>>>>> anyway if they whine
>>>>
>>>
>>>> I don't have a strong opinion on this one, but my take would be to leave
>>>> the current setup as is. We have talked about a total rewrite of the
>>>> Tiger Geocoder to make it more generic and to follow more of the ideas
>>>> that I have put into my geocoder. This would be the place to make
>>>> breaking changes. My current geocoder uses 95+% of its code and I can
>>>> load Tiger, Navteq, or Canada data into it. Longer term I would like to
>>>> extend this to be able to load Navteq or TeleAtlas or other data for
>>>> Western Europe, but we need to make some changes to address standardizer
>>>> and parser to handle accents and parse input in non-English countries.
>>>> This would give us a Geocoder capability that would be on par with
>>>> Oracle Spatial's Geocoder.
>>>
>>> I like that idea better - start with a clean slate then I won't be tempted
>>> to salvage anything that shouldn't be salvaged.
>>
>> I originally looked at the Tiger Geocoder but it is too tied to legacy Tiger structure. After dealing with the data loading to a more abstract table structure that could be used with address standardizer it only took a week to implement the core geocoding engine. So a clean slate rewrite could greatly simplify the code and make it much easier to support and enhance. I'll try to start a white paper that discusses how I did this and that might be a good starting point for discussing the design and rewrite for a future release.
>>
>> Lots of good ideas!
>>
>> Thanks,
>> -Steve
>>
>>> Thanks,
>>> Regina
>>>
>>>
>>>
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at lists.osgeo.org
>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>>>
