[postgis-devel] PAGC Address Standardizer some thoughts on how to organize

Paragon Corporation lr at pcorp.us
Sat Jul 5 08:09:03 PDT 2014

> I don't like the idea of including them as part of the address
standardizer extension for the following reasons:

> 1. calling them sample_lex, sample_gaz, sample_rules just loads junk that
can not easily be removed because it is bundled with the extension

> 2. we might have multiple lex, gaz, rules for different countries/data
sets like, Tiger, Canada, UK, France, Germany, etc or for other use cases.

> Would it be ok if we created an address_standardizer_sample_data extension
that loaded sample_lex, sample_gaz, sample_rules? This would make sense for
documentation and testing. And we could extend that to create additional
packages in the future if we decide to do that.

Yah, that's fine.  That was originally what I was thinking, but I thought you
were against it.  I just want them wrapped as an extension so users don't
have loose files that they have to hunt down.

> I would bundle the Tiger Geocoder files with that extension.
You mean the rewrite, right? In that case, wouldn't we bundle the files with
the new geocoder extension and not the sample_data extension?
In my mind the sample data extension would be just that -- this is what the
structure of the tables looks like and how you populate them.
The documentation would reference them, describe what each part means
with examples using them, and show how to copy them to use as a template for
your own.
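
As a rough sketch of the "copy them to use as a template" idea: the helper
below only builds the SQL text, it does not run anything. The function name
and the "my_" target prefix are hypothetical; only sample_lex, sample_gaz and
sample_rules come from the discussion above, and I'm assuming they are plain
PostgreSQL tables.

```python
# Hypothetical helper: builds PostgreSQL statements that clone the sample
# tables (sample_lex, sample_gaz, sample_rules) into user-owned copies.
# The "my_" prefix and the function name are illustrative assumptions.

SAMPLE_TABLES = ("sample_lex", "sample_gaz", "sample_rules")


def template_copy_sql(prefix="my_", tables=SAMPLE_TABLES):
    """Return SQL statements that copy each sample table's structure and rows."""
    statements = []
    for table in tables:
        # sample_lex -> my_lex, sample_gaz -> my_gaz, ...
        target = prefix + table.replace("sample_", "", 1)
        # Copy column definitions, defaults and indexes from the sample table.
        statements.append(f"CREATE TABLE {target} (LIKE {table} INCLUDING ALL);")
        # Seed the copy with the sample rows so they can be edited in place.
        statements.append(f"INSERT INTO {target} SELECT * FROM {table};")
    return statements


if __name__ == "__main__":
    for statement in template_copy_sql():
        print(statement)
```

The point is that the user's own tables live outside the extension, so they
can be edited or dropped freely while the sample tables stay untouched.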

> The rationale here is that files that are part of an application should
get loaded with that application, but if I'm loading just the address
standardizer then I should choose which data files I want to load if any
because I am likely building my own application and may be loading my own
versions of the data files.


> I originally looked at the Tiger Geocoder but it is too tied to legacy
Tiger structure. After dealing with the data loading to a more abstract
table structure that could be used with address standardizer it only took a
week to implement the core geocoding engine. So a clean slate rewrite could
greatly simplify the code and make it much easier to support and enhance.
> I'll try to start a white paper that discusses how I did this and that might
be a good starting point for discussing the design and rewrite for a future


