[postgis-users] standardize_address: Rule does not work

Stephen Woodbridge woodbri at swoodbridge.com
Wed Oct 28 06:47:05 PDT 2015


Iris,

I build a geocoder which will be used as the model for the next 
generation geocoder in postgis. It is structured like this:


Table or view of raw street segments with name, location, address ranges 
and geometry, etc. I prefer to split records with both right and left 
address and location information into multiple records one for each side 
because it greatly simplifies and streamlines processing later in the 
process.

A standard address table that looks like the output of the 
standard_address with a foreign key pointing to the raw data table

We standardize the raw data into standard address table, add metaphone 
columns for fuzzy search, etc.

Now we geocode by running the request through standardize_address and 
then match that against the standard address table and score the results 
for the best match, then fetch the geometry for that record and compute 
the location based on the address or address range.

The key step in this process is to run the raw data through the 
standardize_address() and to match against that. Think of 
standardize_address as generating a key, we don't really care how the 
key is generated, as long as the key generated from the reference data 
matches the key generated by the incoming request. If we use data that 
is already standardized by a vendor and then try to match and incoming 
request against that, we run the risk that the vendor does not 
standardize the same way that the code does and we get a mismatch.

I have been able to get about 25-30 ms on average geocoding results on a 
80 million record reference set using US Tiger data or Navteq data for 
US and Canada. And this can be improved upon if the data is presorted by 
zip, state, city.

Anyway, hope this give you some ideas.

-Steve

On 10/28/2015 7:34 AM, Iris Rittinger (Terraplan) wrote:
> Hi Regina, hi Steve,
>
> thanks a lot for your answers.
> When I try the parse_address function I do not get any useful result.
> Seperating the macro and micro part in the standardize_address function
> does work and the result looks pretty good.
>
> * The pagc code was not built around parsing UTF8 character strings.
>   -- Yes, that is a problem.  It does not parse german letters
>
> * One needed enhancement to the the pagc parser is that it currently has
> no ability separate inline street types from their street name
>   -- Yes, that is in deed a problem for our use case
>
> Would be nice if others are willing to help out on this effort (either
> via development or monetary support)
> -- I will talk to our developers and my chef if there is a possiblity to
> get involved
>
> and also our upcoming next generation geocoder effort which we plan to
> use the address_standardizer with
> -- sound great, since that is what I am really looking for.  We have a
> reference table for geocoding and we would like to pass the incoming
> address against that table.
>
> Thanks a lot,
> Iris
>
>
>
> Am 27.10.2015 um 23:08 schrieb Paragon Corporation:
>> Hi Iris,
>>
>> There are actually two standard_address functions:
>> http://postgis.net/docs/manual-2.2/standardize_address.html
>>
>>
>>
>> 1) takes a single line (which is what you are using) and internally
>> relies on parse_address which does have a lot of hard-coded
>> assumptions about things (particularly US and Canada assumptions)
>> stdaddr standardize_address(text lextab, text gaztab, text rultab,
>> text address);
>>
>> To check if you are being bitten by internal hard-codings in
>> parse_address, then try the parse_address function and see if what you
>> get back makes sense:
>>
>> http://postgis.net/docs/manual-2.2/parse_address.html
>>
>>
>>
>> 2)  Is a two line standardizer that takes micro and macro
>> (I apologize I neglected to provide an example for the second variant
>> since I use the first mostly.  I'll add some documentation to that
>> effect) and more detail about what is considered micro/ macro
>>
>> standardize_address(text lextab, text gaztab, text rultab, text micro,
>> text macro);
>>
>> micro is like -  house_num street suffix
>> macro is like - city, state, zip country
>>
>>
>> Steve and I had discussed redoing the parse_address function to not
>> use a hard-coded set of data, but that's another thing on our todo.
>> Would be nice if others are willing to help out on this effort (either
>> via development or monetary support) and also our upcoming next
>> generation geocoder effort which we plan to use the
>> address_standardizer with.
>>
>>
>> Thanks,
>> Regina
>> http://www.postgis.us
>> http://postgis.net
>>
>> -----Original Message-----
>> From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On
>> Behalf Of Stephen Woodbridge
>> Sent: Tuesday, October 27, 2015 10:41 AM
>> To: postgis-users at lists.osgeo.org
>> Subject: Re: [postgis-users] standardize_address: Rule does not work
>>
>> Hi Iris,
>>
>> It has been able 3+ years since I did the initial integration of pagc
>> into postgres and I'm afraid I have forgotten a lot in that time. But
>> here are my thoughts.
>>
>> * I do not think the code explicitly hard codes any mapping like str
>> => street so I would check that you lex and gaz tables are truly empty
>> and that there is not some other table in the search_path that it is
>> finding.  I think this is likely the issue.
>>
>> * The pagc code was not built around parsing UTF8 character strings.
>> It might be ok, but it might not handle them well. I don't think
>> anyone has done any testing of this so any feedback as you work on
>> your project would be interesting and please open tickets if you find
>> specific issues.
>>
>> * One needed enhancement to the the pagc parser is that it currently
>> has no ability separate inline street types from their street name. IE:
>> street types that are appended to the street name without a space.
>>
>> * I just pushed rules2txt and txt2rules to
>> https://sourceforge.net/p/pagc/code/HEAD/tree/branches/sew-refactor/postgresql/
>>
>> You might find these scripts helpful because the convert rules to
>> human readable tokens and back to numbers.
>>
>> Regina did the final integration of standardize_address into postgis
>> and she might have some addition thoughts.
>>
>> -Steve
>>
>> On 10/27/2015 6:00 AM, Iris Rititnger (Terraplan) wrote:
>>> Hi,
>>>
>>> I am trying to use the standardize_address function to standardize
>>> german street names.  But I have a problem when the street is
>>> recognized as some kind of special token and then followed by a space.
>>>
>>> When I try
>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein
>>> strasse 11')
>>> I get the result I wish for: name -- HEIN STRASSE; house_num -- 11
>>> When I try select * from
>>> standardize_address('lex_d','gaz_d','rules_d','Hein str 11') I get an
>>> empty row as result.
>>> When I try
>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein
>>> str11') I get the result I wish for: name -- HEIN STR; house_num -- 11
>>>
>>> My lex and gaz tables are empty.  My rule table only has two rules
>>> 1 1 0 -1 5 5 1 -1 1 1
>>> 1 0 -1 5 1 -1 1 1
>>> It seems that str is recognized as something different even though my
>>> gaz and lex tables are empty.
>>>
>>> When I add the rule 1 2 0 -1 5 5 1 -1 1 1 and define
>>> ;1;"STR";"street";2; in my lex table I still have the same problem
>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein
>>> str11') gives me a result of name -- HEIN street and house_num -- 1
>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein str
>>> 11') returns an empty row.
>>>
>>> To double check I installed pagc on my computer.  Here I get the
>>> desired result when I define the rule 1 2 0 -1 5 5 1 -1 1 1, but this
>>> does not seem to work in Postgis.
>>>
>>> Do you know why it is not working.  Can I somehow get postgres to show
>>> me the input and output tokens it tries to use like pagc shows me?
>>>
>>> Thanks,
>>> Iris
>>>
>>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>



More information about the postgis-users mailing list