[postgis-users] tiger geocoder address normalization issue (bug?)

James Marca jmarca at translab.its.uci.edu
Thu Jan 28 21:09:01 PST 2016


On Thu, Jan 28, 2016 at 06:15:52PM -0500, Stephen Woodbridge wrote:
> Hi James,
> 
> Thanks for reporting this. I'm hoping to have time this spring to work 
> with Regina to do a major rewrite of the geocoder. The existing code is 
> very Tiger-centric, and Tiger data has a lot of oddities in it, some of 
> which are built into the existing code.
> 
> The address standardizer is a new facility that was added last year, and 
> I have used it to build a few geocoders that are more data agnostic. I 
> have just started working on a new address standardizer written in C++ 
> that I hope will be easier to understand and maintain. It already has 
> support for UTF-8 data, so it will support European-language addressing.
> 
> For the curious or those that might want to help, you can check out:
> https://github.com/woodbri/address-standardizer
> https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md

Cool.  I will check that out.  I won't help with coding though as my
C++ skills are rusty.

(By the way, I just noticed that I copy/pasted the wrong thing in
the report below: the first pair should both be the us_ variants, and
their results are in fact identical.)
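
For anyone following along, the corrected first comparison can be
sketched as a single query. This is only a sketch: it assumes the
us_lex, us_gaz and us_rules tables are installed (they ship with the
address_standardizer_data_us extension), and the aliases a, s and addr
are just for illustration.

```sql
-- Standardize both spellings with the us_ tables so the two results
-- can be compared side by side.
SELECT a.addr, s.name, s.suftype
FROM (VALUES ('1 8th St, Alameda, CA'),
             ('1 Eighth St, Alameda, CA')) AS a(addr),
     LATERAL standardize_address('us_lex', 'us_gaz', 'us_rules', a.addr) AS s;
```

With the us_ tables, both rows should come back with the same name and
suftype, which is what makes the two spellings matchable.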

Regards,
James 
> 
> So what does this mean with regard to the report? Well, things are 
> changing, and hopefully the bad stuff is going away, to be replaced by a 
> better, more generic solution that will have a new set of issues, but 
> hopefully a small set of them and ones that are easier to fix.
> 
> -Steve
>  http://imaptools.com
> 
> On 1/28/2016 4:39 PM, James Marca wrote:
> >Hi,
> >
> >My goal is to use the address standardizer to match streets from two
> >different data sources.
> >
> >Test case is 8th Street in Alameda CA
> >
> >I was reading through the docs and found standardize_address
> >(http://postgis.net/docs/standardize_address.html), along with two
> >different ways to invoke it: using the us_[lex,gaz,rules] tables or the
> >tiger.pagc_[lex,gaz,rules] tables.
> >
> >I think I found an issue of sorts.
> >
> >First, using the 'us_' variant, all is well.  If I query with Eighth
> >Street or 8th Street, I expect the street name to be the same so I can
> >match them, and they are:
> >
> >hpms_geocode=# SELECT name,suftype FROM standardize_address('us_lex', 
> >'us_gaz', 'us_rules', '1 8th St, Alameda, CA');
> >  name | suftype
> >------+---------
> >  8    | STREET
> >(1 row)
> >
> >hpms_geocode=# SELECT name,suftype FROM 
> >standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules' 
> >, '1 8th St, Alameda, CA');
> >  name | suftype
> >------+---------
> >  8    | ST
> >(1 row)
> >
> >
> >However, if I do the same with the tiger.pagc variant, the results do
> >not match:
> >
> >hpms_geocode=# SELECT name,suftype FROM 
> >standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules' 
> >, '1 8th St, Alameda, CA');
> >  name | suftype
> >------+---------
> >  8    | ST
> >(1 row)
> >
> >hpms_geocode=# SELECT name,suftype FROM 
> >standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules' 
> >, '1 Eighth St, Alameda, CA');
> >  name | suftype
> >------+---------
> >  8TH  | ST
> >(1 row)
> >
> >
> >Obviously I will just use the 'us_' version, but if the 'tiger.pagc_'
> >version is maintained, I think this is a bug and should be fixed.
> >
> >Regards,
> >James
> >
> >
> >
> >
> >
> >_______________________________________________
> >postgis-users mailing list
> >postgis-users at lists.osgeo.org
> >http://lists.osgeo.org/mailman/listinfo/postgis-users
> >