[postgis-users] standardize_address: Rule does not work

Iris Rittinger (Terraplan) mailinglisten at terraplan.com
Wed Oct 28 23:42:07 PDT 2015


Hi Steve,

that sounds great.  If you need someone to test it with German Navteq 
Data or with data for each housenumber I could try to do that.

Thanks,
Iris

Am 28.10.2015 um 14:47 schrieb Stephen Woodbridge:
> Iris,
>
> I build a geocoder which will be used as the model for the next 
> generation geocoder in postgis. It is structured like this:
>
>
> Table or view of raw street segments with name, location, address 
> ranges and geometry, etc. I prefer to split records with both right 
> and left address and location information into multiple records one 
> for each side because it greatly simplifies and streamlines processing 
> later in the process.
>
> A standard address table that looks like the output of the 
> standard_address with a foreign key pointing to the raw data table
>
> We standardize the raw data into standard address table, add metaphone 
> columns for fuzzy search, etc.
>
> Now we geocode by running the request through standardize_address and 
> then match that against the standard address table and score the 
> results for the best match, then fetch the geometry for that record 
> and compute the location based on the address or address range.
>
> The key step in this process is to run the raw data through the 
> standardize_address() and to match against that. Think of 
> standardize_address as generating a key, we don't really care how the 
> key is generated, as long as the key generated from the reference data 
> matches the key generated by the incoming request. If we use data that 
> is already standardized by a vendor and then try to match and incoming 
> request against that, we run the risk that the vendor does not 
> standardize the same way that the code does and we get a mismatch.
>
> I have been able to get about 25-30 ms on average geocoding results on 
> a 80 million record reference set using US Tiger data or Navteq data 
> for US and Canada. And this can be improved upon if the data is 
> presorted by zip, state, city.
>
> Anyway, hope this give you some ideas.
>
> -Steve
>
> On 10/28/2015 7:34 AM, Iris Rittinger (Terraplan) wrote:
>> Hi Regina, hi Steve,
>>
>> thanks a lot for your answers.
>> When I try the parse_address function I do not get any useful result.
>> Seperating the macro and micro part in the standardize_address function
>> does work and the result looks pretty good.
>>
>> * The pagc code was not built around parsing UTF8 character strings.
>>   -- Yes, that is a problem.  It does not parse german letters
>>
>> * One needed enhancement to the the pagc parser is that it currently has
>> no ability separate inline street types from their street name
>>   -- Yes, that is in deed a problem for our use case
>>
>> Would be nice if others are willing to help out on this effort (either
>> via development or monetary support)
>> -- I will talk to our developers and my chef if there is a possiblity to
>> get involved
>>
>> and also our upcoming next generation geocoder effort which we plan to
>> use the address_standardizer with
>> -- sound great, since that is what I am really looking for.  We have a
>> reference table for geocoding and we would like to pass the incoming
>> address against that table.
>>
>> Thanks a lot,
>> Iris
>>
>>
>>
>> Am 27.10.2015 um 23:08 schrieb Paragon Corporation:
>>> Hi Iris,
>>>
>>> There are actually two standard_address functions:
>>> http://postgis.net/docs/manual-2.2/standardize_address.html
>>>
>>>
>>>
>>> 1) takes a single line (which is what you are using) and internally
>>> relies on parse_address which does have a lot of hard-coded
>>> assumptions about things (particularly US and Canada assumptions)
>>> stdaddr standardize_address(text lextab, text gaztab, text rultab,
>>> text address);
>>>
>>> To check if you are being bitten by internal hard-codings in
>>> parse_address, then try the parse_address function and see if what you
>>> get back makes sense:
>>>
>>> http://postgis.net/docs/manual-2.2/parse_address.html
>>>
>>>
>>>
>>> 2)  Is a two line standardizer that takes micro and macro
>>> (I apologize I neglected to provide an example for the second variant
>>> since I use the first mostly.  I'll add some documentation to that
>>> effect) and more detail about what is considered micro/ macro
>>>
>>> standardize_address(text lextab, text gaztab, text rultab, text micro,
>>> text macro);
>>>
>>> micro is like -  house_num street suffix
>>> macro is like - city, state, zip country
>>>
>>>
>>> Steve and I had discussed redoing the parse_address function to not
>>> use a hard-coded set of data, but that's another thing on our todo.
>>> Would be nice if others are willing to help out on this effort (either
>>> via development or monetary support) and also our upcoming next
>>> generation geocoder effort which we plan to use the
>>> address_standardizer with.
>>>
>>>
>>> Thanks,
>>> Regina
>>> http://www.postgis.us
>>> http://postgis.net
>>>
>>> -----Original Message-----
>>> From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On
>>> Behalf Of Stephen Woodbridge
>>> Sent: Tuesday, October 27, 2015 10:41 AM
>>> To: postgis-users at lists.osgeo.org
>>> Subject: Re: [postgis-users] standardize_address: Rule does not work
>>>
>>> Hi Iris,
>>>
>>> It has been able 3+ years since I did the initial integration of pagc
>>> into postgres and I'm afraid I have forgotten a lot in that time. But
>>> here are my thoughts.
>>>
>>> * I do not think the code explicitly hard codes any mapping like str
>>> => street so I would check that you lex and gaz tables are truly empty
>>> and that there is not some other table in the search_path that it is
>>> finding.  I think this is likely the issue.
>>>
>>> * The pagc code was not built around parsing UTF8 character strings.
>>> It might be ok, but it might not handle them well. I don't think
>>> anyone has done any testing of this so any feedback as you work on
>>> your project would be interesting and please open tickets if you find
>>> specific issues.
>>>
>>> * One needed enhancement to the the pagc parser is that it currently
>>> has no ability separate inline street types from their street name. IE:
>>> street types that are appended to the street name without a space.
>>>
>>> * I just pushed rules2txt and txt2rules to
>>> https://sourceforge.net/p/pagc/code/HEAD/tree/branches/sew-refactor/postgresql/ 
>>>
>>>
>>> You might find these scripts helpful because the convert rules to
>>> human readable tokens and back to numbers.
>>>
>>> Regina did the final integration of standardize_address into postgis
>>> and she might have some addition thoughts.
>>>
>>> -Steve
>>>
>>> On 10/27/2015 6:00 AM, Iris Rititnger (Terraplan) wrote:
>>>> Hi,
>>>>
>>>> I am trying to use the standardize_address function to standardize
>>>> german street names.  But I have a problem when the street is
>>>> recognized as some kind of special token and then followed by a space.
>>>>
>>>> When I try
>>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein
>>>> strasse 11')
>>>> I get the result I wish for: name -- HEIN STRASSE; house_num -- 11
>>>> When I try select * from
>>>> standardize_address('lex_d','gaz_d','rules_d','Hein str 11') I get an
>>>> empty row as result.
>>>> When I try
>>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein
>>>> str11') I get the result I wish for: name -- HEIN STR; house_num -- 11
>>>>
>>>> My lex and gaz tables are empty.  My rule table only has two rules
>>>> 1 1 0 -1 5 5 1 -1 1 1
>>>> 1 0 -1 5 1 -1 1 1
>>>> It seems that str is recognized as something different even though my
>>>> gaz and lex tables are empty.
>>>>
>>>> When I add the rule 1 2 0 -1 5 5 1 -1 1 1 and define
>>>> ;1;"STR";"street";2; in my lex table I still have the same problem
>>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein
>>>> str11') gives me a result of name -- HEIN street and house_num -- 1
>>>> select * from standardize_address('lex_d','gaz_d','rules_d','Hein str
>>>> 11') returns an empty row.
>>>>
>>>> To double check I installed pagc on my computer.  Here I get the
>>>> desired result when I define the rule 1 2 0 -1 5 5 1 -1 1 1, but this
>>>> does not seem to work in Postgis.
>>>>
>>>> Do you know why it is not working.  Can I somehow get postgres to show
>>>> me the input and output tokens it tries to use like pagc shows me?
>>>>
>>>> Thanks,
>>>> Iris
>>>>
>>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users

-- 
TERRAPLAN
Iris Rittinger
GIS-Projektmanagement
Gewerbestraße 33
79227 Schallstadt
Fon 07664 - 5053 13
Fax 07664 - 5053 21
www.terraplan.com



More information about the postgis-users mailing list