[postgis-users] Tiger geocoder, Spanish street types are pushed to the end

James Marca jmarca at translab.its.uci.edu
Wed Sep 26 23:18:59 PDT 2012


Hi, I think I am running into a problem with the street
normalization routines in the Tiger geocoder, but I'm not sure.

The version is from git; the last commit is dated Sat Aug 11 19:58:33 2012.

In California, we have a lot of Spanish street names.  For example,
there are names like Via Canon (the first n used to have a tilde, I
think), Via Verde, and Camino Las Ramblas.

The geocode function apparently wants the Spanish street type (Via,
Camino) moved to the end of the name.
For example:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM  geocode( 'Via Verde, Dana Point CA');
                   pprint_addy                   |                 st_astext                 | rating 
-------------------------------------------------+-------------------------------------------+--------
 Via Verde Ct, Calabasas, CA 91302               | POINT(-118.659995686466 34.1275841694006) |     41
 Via Verde St, Covina, CA 91724                  | POINT(-117.858043689933 34.0697638249141) |     41
...

But if you flip the name around, you get:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM  geocode( 'Verde Via, Dana Point CA');
            pprint_addy            |                 st_astext                 | rating 
-----------------------------------+-------------------------------------------+--------
 Verde Via, Dana Point, CA 92624   | POINT(-117.672816628784 33.4623777015046) |     38
 Verde Vw, National City, CA 91950 | POINT(-117.060049181038 32.6573081053596) |     40
...


Similarly, Camino Las Ramblas is a pretty major street, but you get:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM  geocode( 'Camino Las Ramblas, San Juan Capistrano CA');
            pprint_addy             |                 st_astext                 | rating 
------------------------------------+-------------------------------------------+--------
 Lago Cll, Dana Point, CA 92624     | POINT(-117.665451952078 33.4629500870658) |     60
 Lago Cll, San Clemente, CA 92672   | POINT(-117.629773684804 33.4331286117088) |     68
...

But if you flip Camino to the end, you get:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM  geocode( 'Las Ramblas Camino, San Juan Capistrano CA');
                  pprint_addy                   |                 st_astext                 | rating 
------------------------------------------------+-------------------------------------------+--------
 Cam Las Ramblas, San Juan Capistrano, CA 92675 | POINT(-117.662978341711 33.4686616216608) |     38
 Las Ramblas Dr, Concord, CA 94521              | POINT(-121.9494654272 37.9565518802366)   |     53

(I just figured out that I am clouding the issue somewhat by using
pprint_addy here, but still, the addy object has stripped the Cam part
away from the 'Las Ramblas' part.)
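
To take pprint_addy out of the picture, the fields of the returned addy
can be selected directly.  A minimal sketch, assuming addy is the tiger
geocoder's norm_addy composite with its usual streetname and
streettypeabbrev fields (no output is shown because it depends on the
loaded data):

```sql
-- Show the parsed pieces of each candidate instead of the pretty-printed string.
-- Assumes the norm_addy field names shipped with the tiger geocoder.
SELECT (addy).address,
       (addy).streetname,
       (addy).streettypeabbrev,
       (addy).location,
       rating
FROM geocode('Las Ramblas Camino, San Juan Capistrano CA');
```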

I scanned addrfeat, and see:

geocoder=# select distinct fullname from addrfeat where fullname ~* 'Las Ramblas' limit 10;
    fullname     
-----------------
 Cam Las Ramblas
 Via Las Ramblas
 Cll Las Ramblas
 Ave Las Ramblas
 Las Ramblas Dr
 Las Ramblas

But again, if you try to geocode '28005 Cam Las Ramblas, San Juan Capistrano
CA', the geocoder can't find it, while '28005 Las Ramblas Cam' works without
trouble.

Is this a bug, or a failing heuristic?  Is there a way to turn that
off for Spanish names?  Or perhaps better, is there a way to call
whatever function is monkeying with the addrfeat.fullname strings to
get the same effect on my input strings?  That would mean comparing
apples to apples, which would give the best shot at matching.
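
A minimal sketch of that comparison, assuming normalize_address() (the
tiger geocoder's address parser) is the step that rearranges the input;
whether the same routine was ever applied to addrfeat.fullname is not
established here:

```sql
-- Parse both spellings and compare which token ends up as the street type.
-- The cast to varchar matches normalize_address's declared argument type.
SELECT q.input,
       (na).address,
       (na).streetname,
       (na).streettypeabbrev
FROM (
  SELECT input, normalize_address(input::varchar) AS na
  FROM (VALUES
    ('28005 Cam Las Ramblas, San Juan Capistrano CA'),
    ('28005 Las Ramblas Cam, San Juan Capistrano CA')
  ) AS v(input)
) AS q;
```

If the two rows disagree on streetname and streettypeabbrev, the parser
rather than the lookup against addrfeat would be the place to look.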

Thanks for any pointers.

Regards,
James Marca