[postgis-users] [slightly off-topic] Question on building a C address parser for an embedded geocoder
Stephen Woodbridge
woodbri at swoodbridge.com
Sat Oct 20 21:12:28 PDT 2012
Hi Devs,
I am interested in writing an address standardizer in C that would be
callable from SQL. I understand the basics of doing this from supporting
pgRouting and writing some additional commands. It would get used
something like:
select * from standardize(address, city, state, country, postcode);
select * from standardize(address_one_line);
and would return a standardized set of fields like: house_num, street,
city, state, country, postcode. These could then be used to create a
standardized reference table, or they could be passed into the geocoder,
which would search the standardized reference table.
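
To make the intended interface concrete, here is a minimal sketch in plain C
of the record such a function might fill in. The names StdAddress, copy_field
and standardize_fields are hypothetical, and the "parsing" is only a toy
stand-in (it peels leading digits off as the house number and passes the
other fields through); a real standardizer would apply the lexicon and rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output record: one member per field named in the post. */
typedef struct {
    char house_num[16];
    char street[64];
    char city[64];
    char state[32];
    char country[32];
    char postcode[16];
} StdAddress;

/* Copy src into dst (capacity cap), truncating if needed, always NUL-terminated. */
static void copy_field(char *dst, size_t cap, const char *src)
{
    size_t n = strlen(src);

    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/*
 * Toy stand-in for the multi-argument standardize(): leading digits of the
 * address become house_num, the remainder becomes street, and the other
 * fields are copied unchanged. No lexicon or rules are applied here.
 */
static void standardize_fields(const char *address, const char *city,
                               const char *state, const char *country,
                               const char *postcode, StdAddress *out)
{
    size_t i = 0, n = 0;

    assert(address != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    /* Leading run of digits is taken as the house number. */
    while (isdigit((unsigned char) address[i]) &&
           n + 1 < sizeof(out->house_num))
        out->house_num[n++] = address[i++];
    out->house_num[n] = '\0';

    /* Skip the whitespace separating house number and street. */
    while (isspace((unsigned char) address[i]))
        i++;

    copy_field(out->street, sizeof(out->street), address + i);
    copy_field(out->city, sizeof(out->city), city);
    copy_field(out->state, sizeof(out->state), state);
    copy_field(out->country, sizeof(out->country), country);
    copy_field(out->postcode, sizeof(out->postcode), postcode);
}
```

In an actual SQL-callable function these members would be returned as the
columns of a composite row rather than through an output pointer.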
What I am struggling with is how best to initialize the address
parser/standardizer. The concept I have in mind is to have some tables
that represent the lexicon, gazetteer, parsing rules, etc. This data
could be specific to country and/or country-state. It could be fairly
small or quite large. For example, there are about 40K unique city names
based on the USPS zipcodes and about 7K of them have duplicate
standardizations based on state.
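
Because the same city name can standardize differently depending on state,
one way to picture the in-memory gazetteer is a sorted array keyed on
(state, name) and searched with bsearch(). This is only a sketch: GazEntry,
gaz_cmp and gaz_lookup are hypothetical names, and the three rows use
made-up state codes and are invented for illustration, not USPS data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gazetteer row: the key is (state, name). */
typedef struct {
    const char *state;
    const char *name;      /* city name as it appears in the input */
    const char *standard;  /* standardized form for that state */
} GazEntry;

/* Order entries by state first, then by city name. */
static int gaz_cmp(const void *a, const void *b)
{
    const GazEntry *x = a;
    const GazEntry *y = b;
    int c = strcmp(x->state, y->state);

    return c != 0 ? c : strcmp(x->name, y->name);
}

/* Invented sample rows; must stay sorted by (state, name) for bsearch(). */
static const GazEntry gazetteer[] = {
    {"AA", "MT PLEASANT", "MOUNT PLEASANT"},
    {"AA", "ST JOHN",     "SAINT JOHN"},
    {"BB", "MT PLEASANT", "MT PLEASANT"},
};

/* Return the standardized city name for this state, or NULL if unknown. */
static const char *gaz_lookup(const char *state, const char *name)
{
    GazEntry key = { state, name, NULL };
    const GazEntry *hit;

    assert(state != NULL && name != NULL);
    hit = bsearch(&key, gazetteer,
                  sizeof(gazetteer) / sizeof(gazetteer[0]),
                  sizeof(gazetteer[0]), gaz_cmp);
    return hit != NULL ? hit->standard : NULL;
}
```

With tens of thousands of rows a hash table may be preferable, but either
structure has to be built from the tables before the first lookup, which is
the cost in question.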
On the one hand I can read these tables on every request and build the
internal structures, parse the request, and throw out the internal
structures.
Basically, once the reference source records have been standardized you
should not be changing the above tables, because you want to standardize
future search requests with the same rules that were used to standardize
the reference road segments.
And ideally you do not want to spend the time to rebuild these internal
structures on every search request.
So is there a mechanism for building some internal data and holding on
to it between requests? I suppose I could store it in a blob, but it
would then need to be de-toasted on every search request.
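
One possibility, sketched here in plain C as an assumption rather than a
tested extension: PostgreSQL serves each connection from its own backend
process, and static variables in a loaded C module live as long as that
process, so a static pointer can hold the built structures between calls in
the same session. Standardizer, build_standardizer and get_standardizer are
hypothetical names, and malloc() stands in for allocation in a long-lived
memory context (per-call palloc memory would not survive between calls).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the built lexicon, gazetteer and rules. */
typedef struct {
    char *country;   /* which rule set this instance was built for */
    int   n_rules;   /* placeholder for the real internal structures */
} Standardizer;

static Standardizer *cached = NULL;  /* survives between calls in one process */
static int build_count = 0;          /* counts how often the expensive build ran */

static void free_standardizer(Standardizer *s)
{
    if (s != NULL) {
        free(s->country);
        free(s);
    }
}

/* Stand-in for the expensive step: a real version would read the tables. */
static Standardizer *build_standardizer(const char *country)
{
    size_t len = strlen(country) + 1;
    Standardizer *s = malloc(sizeof(*s));

    if (s == NULL)
        return NULL;
    s->country = malloc(len);
    if (s->country == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->country, country, len);
    s->n_rules = 0;
    build_count++;
    return s;
}

/* Return the cached standardizer, rebuilding only when the country changes. */
static Standardizer *get_standardizer(const char *country)
{
    assert(country != NULL);
    if (cached != NULL && strcmp(cached->country, country) == 0)
        return cached;

    free_standardizer(cached);
    cached = build_standardizer(country);
    return cached;
}
```

Under this scheme the build cost is paid once per session and country rather
than once per request. Each new connection still rebuilds its own copy, and
the cache would not notice if the underlying tables changed.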
Maybe this is a non-issue, but it seems to impact the design
depending on what options I might have and how they are implemented and
accessed from the code.
Thoughts?
Thanks for any help or suggestions,
-Steve W