[postgis-users] tiger geocoder address normalization issue (bug?)

Stephen Woodbridge woodbri at swoodbridge.com
Thu Jan 28 15:15:52 PST 2016


Hi James,

Thanks for reporting this. I'm hoping to have time this spring to work 
with Regina on a major rewrite of the geocoder. The existing code is 
very Tiger-centric, and Tiger data has a lot of oddities in it, some of 
which are built into the existing code.

The address standardizer is a new facility that was added last year, and 
I have used it to build a few geocoders that are more data-agnostic. I 
have just started working on a new address standardizer written in C++ 
that I hope will be easier to understand and maintain. It already 
supports UTF-8 data, so it will support European-language addressing.

For the curious, or those who might want to help, you can check out:
https://github.com/woodbri/address-standardizer
https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md

So what does this mean with regard to the report? Well, things are 
changing, and hopefully the bad stuff is going away, to be replaced by a 
better, more generic solution. That solution will have a new set of 
issues, but hopefully a small set of them, and ones that are easier to fix.

-Steve
  http://imaptools.com

On 1/28/2016 4:39 PM, James Marca wrote:
> Hi,
>
> My goal is to use the address standardizer to match streets from two
> different data sources.
>
> The test case is 8th Street in Alameda, CA.
>
> I was reading through the docs and found standardize_address
> (http://postgis.net/docs/standardize_address.html), along with two
> different ways to invoke it: using the us_[lex,gaz,rules] tables and
> the tiger.pagc_[lex,gaz,rules] tables.
>
> I think I found an issue of sorts.
>
> First, using the 'us_' variant, all is well.  If I query with Eighth
> Street or 8th Street, I expect the street name to be the same so I can
> match them, and they are:
>
> hpms_geocode=# SELECT name,suftype FROM standardize_address('us_lex', 'us_gaz', 'us_rules', '1 8th St, Alameda, CA');
>   name | suftype
> ------+---------
>   8    | STREET
> (1 row)
>
>
>
> However, if I do the same with the tiger.pagc variant, the results do
> not match:
>
> hpms_geocode=# SELECT name,suftype FROM standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules' , '1 8th St, Alameda, CA');
>   name | suftype
> ------+---------
>   8    | ST
> (1 row)
>
> hpms_geocode=# SELECT name,suftype FROM standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules' , '1 Eighth St, Alameda, CA');
>   name | suftype
> ------+---------
>   8TH  | ST
> (1 row)
>
>
> Obviously I will just use the 'us_' version, but if the 'tiger.pagc_'
> version is maintained, I think this is a bug and should be fixed.
>
> Regards,
> James
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>
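
A minimal sketch for running the comparison from the quoted report in a single query, so both rule sets and both spellings can be checked side by side. It assumes the address_standardizer and address_standardizer_data_us extensions (which provide standardize_address and the us_lex/us_gaz/us_rules tables) and the postgis_tiger_geocoder extension (which provides the tiger.pagc_* tables) are installed; the CTE name variants and the column aliases are hypothetical.

```sql
-- Sketch: standardize "8th St" and "Eighth St" with each rule set and
-- report whether the standardized street name and suffix type agree.
-- Assumes address_standardizer, address_standardizer_data_us and
-- postgis_tiger_geocoder are installed; "variants" is a hypothetical CTE.
WITH variants (variant, lex, gaz, rules) AS (
  VALUES ('us_',         'us_lex',         'us_gaz',         'us_rules'),
         ('tiger.pagc_', 'tiger.pagc_lex', 'tiger.pagc_gaz', 'tiger.pagc_rules')
)
SELECT v.variant,
       d.name AS digits_name,   -- standardized from '1 8th St, ...'
       w.name AS words_name,    -- standardized from '1 Eighth St, ...'
       (d.name IS NOT DISTINCT FROM w.name
        AND d.suftype IS NOT DISTINCT FROM w.suftype) AS same_street
FROM variants AS v
-- standardize_address returns one stdaddr row, so LATERAL pairs it with v
CROSS JOIN LATERAL standardize_address(v.lex, v.gaz, v.rules,
                                       '1 8th St, Alameda, CA') AS d
CROSS JOIN LATERAL standardize_address(v.lex, v.gaz, v.rules,
                                       '1 Eighth St, Alameda, CA') AS w
ORDER BY v.variant;
```

Going by the tiger.pagc_ results quoted above (8 versus 8TH), that row would be expected to report same_street as false. Comparing name and suftype together, rather than name alone, is a design choice here, since the two rule sets also spell the suffix differently (STREET versus ST).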

