[postgis-users] tiger geocoder address normalization issue (bug?)
James Marca
jmarca at translab.its.uci.edu
Thu Jan 28 21:09:01 PST 2016
On Thu, Jan 28, 2016 at 06:15:52PM -0500, Stephen Woodbridge wrote:
> Hi James,
>
> Thanks for reporting this. I'm hoping to have time this spring to work
> with Regina to do a major rewrite of the geocoder. The existing code is
> very Tiger-centric, and Tiger data has a lot of oddities in it, some of
> which are built into the existing code.
>
> The address standardizer is a new facility that was added last year, and
> I have used it to build a few geocoders that are more data agnostic. I
> have just started working on a new address standardizer written in C++
> that I hope will be easier to understand and maintain. It already has
> support for UTF-8 data, so it will support European-language addressing.
>
> For the curious or those that might want to help, you can check out:
> https://github.com/woodbri/address-standardizer
> https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md
Cool. I will check that out. I won't help with coding though as my
C++ skills are rusty.
(And by the way, I just noticed that I copy/pasted the wrong thing in
the report below: the first pair should both be the us_ variants, and they
are in fact identical.)
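For reference, the corrected first pair would look roughly like the query below. This is only a sketch: it assumes the us_lex, us_gaz and us_rules tables named in the report are installed, and it runs both spellings in one statement so the name and suftype columns can be compared side by side.

```sql
-- Sketch only: assumes the us_lex, us_gaz and us_rules tables from the
-- report exist in the current database.
SELECT addr, (s).name, (s).suftype
FROM (
  SELECT addr,
         standardize_address('us_lex', 'us_gaz', 'us_rules', addr) AS s
  FROM (VALUES ('1 8th St, Alameda, CA'),
               ('1 Eighth St, Alameda, CA')) AS t(addr)
) AS q;
```

Swapping in tiger.pagc_lex, tiger.pagc_gaz and tiger.pagc_rules gives the second pair from the report.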
Regards,
James
>
> So what does this mean with regard to the report? Well, things are
> changing, and hopefully the bad stuff is going away, to be replaced by a
> better, more generic solution that will have a new set of issues, but
> hopefully a small set of them and ones that are easier to fix.
>
> -Steve
> http://imaptools.com
>
> On 1/28/2016 4:39 PM, James Marca wrote:
> >Hi,
> >
> >My goal is to use the address standardizer to match streets from two
> >different data sources.
> >
> >Test case is 8th Street in Alameda CA
> >
> >I was reading through the docs and found standardize_address
> >(http://postgis.net/docs/standardize_address.html), along with two
> >different ways to invoke it...using the us_[lex,gaz,rules] and the
> >tiger.pagc_[lex,gaz,rules].
> >
> >I think I found an issue of sorts.
> >
> >First, using the 'us_' variant, all is well. If I query with Eighth
> >Street or 8th Street, I expect the street name to be the same so I can
> >match them, and they are:
> >
> >hpms_geocode=# SELECT name,suftype FROM standardize_address('us_lex',
> >'us_gaz', 'us_rules', '1 8th St, Alameda, CA');
> > name | suftype
> >------+---------
> > 8 | STREET
> >(1 row)
> >
> >hpms_geocode=# SELECT name,suftype FROM
> >standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules'
> >, '1 8th St, Alameda, CA');
> > name | suftype
> >------+---------
> > 8 | ST
> >(1 row)
> >
> >
> >However, if I do the same with the tiger.pagc variant, the results do
> >not match:
> >
> >hpms_geocode=# SELECT name,suftype FROM
> >standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules'
> >, '1 8th St, Alameda, CA');
> > name | suftype
> >------+---------
> > 8 | ST
> >(1 row)
> >
> >hpms_geocode=# SELECT name,suftype FROM
> >standardize_address('tiger.pagc_lex','tiger.pagc_gaz', 'tiger.pagc_rules'
> >, '1 Eighth St, Alameda, CA');
> > name | suftype
> >------+---------
> > 8TH | ST
> >(1 row)
> >
> >
> >Obviously I will just use the 'us_' version, but if the 'tiger.pagc_'
> >version is maintained, I think this is a bug and should be fixed.
> >
> >Regards,
> >James
> >
> >_______________________________________________
> >postgis-users mailing list
> >postgis-users at lists.osgeo.org
> >http://lists.osgeo.org/mailman/listinfo/postgis-users
> >