[postgis-devel] PAGC Address Standardizer some thoughts on how to organize

Stephen Woodbridge woodbri at swoodbridge.com
Wed Jul 2 22:02:45 PDT 2014

I'll try to respond to both your comments here. First, thank you for 
taking the initiative to get this migration started.

On 7/2/2014 11:22 PM, maplabs at light42.com wrote:
> On Wed, 2 Jul 2014 22:42:48 -0400, Paragon Corporation <lr at pcorp.us> wrote:
>> I just forked the PAGC address standardizer into PostGIS trunk for release
>> as part of PostGIS 2.2
>    svn checkout  just now did get code, so that's good  :-)

Looks like I have a problem during commits:

[] ~/work/postgis/trunk/extras/address_standardizer$ svn info
Path: .
URL: https://woodbri@svn.osgeo.org/postgis/trunk/extras/address_standardizer
Repository Root: https://woodbri@svn.osgeo.org/postgis
Repository UUID: b70326c6-7e19-0410-871a-916f4a2858ee
Revision: 12718
Node Kind: directory
Schedule: normal
Last Changed Author: robe
Last Changed Rev: 12716
Last Changed Date: 2014-07-02 22:14:31 -0400 (Wed, 02 Jul 2014)

[] ~/work/postgis/trunk/extras/address_standardizer$ svn status
M       README.address_standardizer
[] ~/work/postgis/trunk/extras/address_standardizer$ svn commit -m "Test 
to make sure I can commit"
Authentication realm: <https://svn.osgeo.org:443> OSGeo Login
Password for 'woodbri':
svn: Commit failed (details follow):
svn: access to '/postgis/!svn/act/9cd18fa0-cf3f-49be-9129-30dac427717b' 

>> 1) Create folder in extensions of our repo and move the address_standardizer
>> extension files there.
>> I'd still like it to be able to be built separately if people wish (similar
>> to how we have liblwgeom I think) and my only reservation with breaking out
>> like this is that it makes it less compact.
>    details aside, being able to build this separately     BIG +1 !
> I have definitely used this since Denver 2011 (where I attended the PAGC
> talk and met various principals )
> The Address Standardizer by itself, without any TIGER geocoder, is quite
> valuable. I do appreciate the effort in this library and have said so to
> Steve in the past

I would defer to strk on this, but can we use a symlink to get the file 
to appear in two places under svn?
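
For reference, Subversion on Unix-like systems does version symbolic 
links (it records them with the svn:special property), although Windows 
clients check them out as small text files rather than real links. A 
rough sketch of what I have in mind, using a throwaway directory and 
made-up paths (lex.sql and the extensions/ layout here are placeholders, 
not the actual tree):

```shell
# Sketch only: the directory layout and the file name lex.sql are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/extras/address_standardizer" \
         "$work/extensions/address_standardizer"
echo "-- placeholder" > "$work/extras/address_standardizer/lex.sql"

# Use a relative target so the link still resolves in any checkout location.
cd "$work/extensions/address_standardizer" || exit 1
ln -s ../../extras/address_standardizer/lex.sql lex.sql
readlink lex.sql

# Inside a real working copy the link would then be versioned like any file:
#   svn add lex.sql
#   svn commit -m "Link lex.sql into extensions"
```

Whether this is acceptable for Windows builds is the part I would want 
strk's opinion on.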

>> 2) Beef up the documentation -- right now all we have is how to install it
>> in our install section of the manual (and that of course needs to be updated
>> with a new link now that it's part of our repo)
>> http://postgis.net/docs/manual-dev/postgis_installation.html#installing_pagc_address_standardizer
>> So I'm going to add an additional .xml (separate from tiger and install,
>> explaining all the nuances of the lexer / rule / parser files)
>    well, this depends a lot on how the decomposition of libs turns out
> as referenced in the next sections below

I can help write documentation for this. Walter has already written a 
bunch of docs for PAGC, so we can pull the appropriate sections and 
reformat them.

>> 3) Before release, I'd like to put logic in the configure.ac so we do the
>> same checks and build if all dependencies are available and flag for pcre
>> library.  Right now to build I just add to my cppflags and shlib_link.
>> This I imagine I'll need help with since the configure.ac script is pretty
>> alien to me.
>    no matter what happens, PCRE and perl Regexp::Assemble are definitely
> required for this

Correct. To install Regexp::Assemble from CPAN you can use:

perl -MCPAN -e "install Regexp::Assemble"

if you cannot find a system package for it. PCRE is pretty much a 
standard package on all recent systems and should be easy to find.
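
A quick way to see whether both are already present before building. 
This is only a rough sketch: the helper names check_perl_module and 
check_pcre are made up for illustration, and it assumes pcre-config, 
which normally ships with the PCRE development package, is how PCRE 
would be detected.

```shell
# Report whether a Perl module can be loaded.
# Prints "<module>: found" or "<module>: missing".
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Report whether the PCRE development tools are on the PATH.
check_pcre() {
    if command -v pcre-config >/dev/null 2>&1; then
        echo "pcre: found ($(pcre-config --version))"
    else
        echo "pcre: missing"
    fi
}

check_perl_module Regexp::Assemble
check_pcre
```

A proper check in configure.ac would replace this, but it is enough to 
tell you which package to go looking for.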

>> 4) Build separate extensions for the custom gaz/lex/rules currently present
>> and add more. Right now to run the packaged dictionaries you need to run the
>> lex,gaz,rules.sql files which is cumbersome from a newbie stand-point.
>> This one I'm actually thinking just rolling the current one in the base
>> extension and then having extensions for custom ones. Since at least US
>> people will just use the base one or if they are using tiger geocoder the
>> tiger geocoder one already packaged with tiger geocoder extension.
>    this is where things get muddy ... Like so many software projects, a
> broad generalized architecture ends up covering a
> common use case, and the rest is then in the way or collects dust as
> focus narrows. It *is* great to have a generalized address parsing
> engine.. but how this lib got here is,
> its been difficult to modernize and put sufficient time into a small
> niche utility - Steve told me so..
> A "pragmatic" move would be to tightly configure the lex/gaz/etc to the
> TIGER Geocoder
> and ship it.. but, not using the capacity of the lib. On the other hand,
> if the generalized,
> multinational promise is pursued, who is going to build it out? Where
> are the OSM people ?
> I am interested sure but this is dense going.. Steve and Regina but are
> there enough hands ?
> no clear answers here...

OK, I have some thoughts on this, mostly about the longer term. I think 
we should have multiple packages of gaz/lex/rules files that can be used 
for different data sets, because the files need to be customized for 
each data set. For example, the current Tiger Geocoder uses a different 
set of files than I use in my Tiger geocoder, my Navteq geocoder has its 
own set of files, and my Canada geocoder has yet another. I can see this 
growing if we start adding data sets for the UK, Australia, Europe, etc. 
down the road. Since these files are specific to the application and the 
data that application uses, the custom data files should be part of 
those projects. So for now I would keep the data you have for the Tiger 
Geocoder with that application, and keep the files that came with the 
address standardizer as a generic set that other people can customize.

>> 5) this one I'm still thinking about because it'll be a major breaking
>> change -- and that would be just to have current tiger geocoder require
>> address_standardizer and swap out the norm_addy object with the
>> address_standardize std_address one. But that requires a bit of rework and
>> assurance that package maintainers can build address_standardizer without
>> too much fuss.
> The TIGER Geocoder treads the line between super-useful and
> super-spaghetti. Shoot the messenger if you like, but at least I have the
> umph to say it..
> I am constantly amazed at robe2's relentless productivity and the TIGER
> Geocoder project
> is a fruit of that, warts and all.. I use it and I have written about it..
> My personal take is -- change the TIGER Geocoder for 2.2+ and break
> compatibility ..
> Whatever is convenient.. very much unlike the overall PostGIS project,
> there are
> few if any  'production systems' depending on the details, and damn them
> anyway if they whine

I don't have a strong opinion on this one, but my take would be to leave 
the current setup as is. We have talked about a total rewrite of the 
Tiger Geocoder to make it more generic and to follow more of the ideas 
that I have put into my geocoder. That rewrite would be the place to 
make breaking changes. My current geocoder shares 95+% of its code 
across data sets, and I can load Tiger, Navteq, or Canada data into it. 
Longer term I would like to extend this to load Navteq, TeleAtlas, or 
other data for Western Europe, but first we need to make some changes to 
the address standardizer and parser to handle accents and to parse input 
from non-English-speaking countries. That would give us a geocoding 
capability on par with Oracle Spatial's Geocoder.

Good ideas.


>> Thoughts?
>    I honestly try to contribute in small ways.. I hope this email is
> constructive.
>> Thanks,
>> Regina
> --
> Brian M Hamlin
> OSGeo California Chapter
> blog.light42.com
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
