[postgis-users] Tiger geocoder, Spanish street types are pushed to the end

Paragon Corporation lr at pcorp.us
Thu Sep 27 12:56:53 PDT 2012


James,

Sadly this is a known issue, and we haven't settled on the best way to fix
it without resorting to major surgery.

Here is the ticket for it:

http://trac.osgeo.org/postgis/ticket/1118

If you can add your examples to the ticket and add yourself to its cc list,
that would be great.  We'll add these to our regression tests when we
finally come up with a palatable fix.
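
In the meantime, to see what the normalizer does with an input string before
any matching happens, something like the following may help.  It is only a
sketch: it assumes normalize_address() and the norm_addy field names shown
are available in your install, and that the tiger schema is on your
search_path.  No output is shown because it depends on the installed version.

```sql
-- Inspect how the normalizer splits the input before geocode() tries to match it.
-- Assumption: the tiger schema is on the search_path.
SELECT na.address,
       na.streetname,
       na.streettypeabbrev,
       na.location,
       na.stateabbrev
FROM normalize_address('Camino Las Ramblas, San Juan Capistrano CA') AS na;
```

Comparing streetname and streettypeabbrev for the two word orders should show
where the street type ends up.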


In MA we have similar issues, but mostly with French street names, and
locally they are a rare occurrence.  Leo feels your pain about Cali, though,
since he's a San Diego boy.

Sorry we couldn't be more helpful.

Regina and Leo
http://www.postgis.us 

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of James
Marca
Sent: Thursday, September 27, 2012 2:19 AM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Tiger geocoder,Spanish street types are pushed to
the end

Hi, I think I am bumping up against a problem with the street normalization
routines in the Tiger geocoder, but I'm not sure.

Version is from git, last commit Sat Aug 11 19:58:33 2012.

In California, we have a lot of Spanish street names.  For example there are
names like Via Canon (that second n used to have a tilde I think), Via
Verde, and Camino Las Ramblas.

The geocode function apparently wants the Spanish street type (Via, Camino)
at the end of the name rather than the front.
For example:

geocoder=# select pprint_addy(addy), st_astext(geomout), rating FROM geocode('Via Verde, Dana Point CA');
            pprint_addy            |                 st_astext                 | rating
-----------------------------------+-------------------------------------------+--------
 Via Verde Ct, Calabasas, CA 91302 | POINT(-118.659995686466 34.1275841694006) |     41
 Via Verde St, Covina, CA 91724    | POINT(-117.858043689933 34.0697638249141) |     41
...

But if you flip the name around, you get

geocoder=# select pprint_addy(addy), st_astext(geomout), rating FROM geocode('Verde Via, Dana Point CA');
            pprint_addy            |                 st_astext                 | rating
-----------------------------------+-------------------------------------------+--------
 Verde Via, Dana Point, CA 92624   | POINT(-117.672816628784 33.4623777015046) |     38
 Verde Vw, National City, CA 91950 | POINT(-117.060049181038 32.6573081053596) |     40
...


Similarly Camino Las Ramblas is a pretty major street, but you get:

geocoder=# select pprint_addy(addy), st_astext(geomout), rating FROM geocode('Camino Las Ramblas, San Juan Capistrano CA');
            pprint_addy             |                 st_astext                 | rating
------------------------------------+-------------------------------------------+--------
 Lago Cll, Dana Point, CA 92624     | POINT(-117.665451952078 33.4629500870658) |     60
 Lago Cll, San Clemente, CA 92672   | POINT(-117.629773684804 33.4331286117088) |     68
...

But if you flip Camino to the end, you get:

geocoder=# select pprint_addy(addy), st_astext(geomout), rating FROM geocode('Las Ramblas Camino, San Juan Capistrano CA');
                  pprint_addy                   |                 st_astext                 | rating
------------------------------------------------+-------------------------------------------+--------
 Cam Las Ramblas, San Juan Capistrano, CA 92675 | POINT(-117.662978341711 33.4686616216608) |     38
 Las Ramblas Dr, Concord, CA 94521              | POINT(-121.9494654272 37.9565518802366)   |     53

(I just figured out that I am clouding the issue somewhat by using
pprint_addy here, but still, the addy object has stripped the Cam part away
from the 'Las Ramblas' part.)

I took a scan of addrfeat, and see:

geocoder=# select distinct fullname from addrfeat where fullname ~* 'Las Ramblas' limit 10;
    fullname     
-----------------
 Cam Las Ramblas
 Via Las Ramblas
 Cll Las Ramblas
 Ave Las Ramblas
 Las Ramblas Dr
 Las Ramblas

Yet if you try to geocode '28005 Cam Las Ramblas, San Juan Capistrano CA',
the geocoder can't find it, while '28005 Las Ramblas Cam' has no trouble.
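
The workaround this suggests can be sketched roughly as below.  It is only an
illustration: flip_leading_type is a hypothetical helper, not part of the
Tiger geocoder, the list of street types is a guess, and it assumes a comma
ends the street part of the input.

```sql
-- Hypothetical helper (not part of the Tiger geocoder): move a leading
-- Spanish street type to the end of the street name, keeping any house number.
CREATE OR REPLACE FUNCTION flip_leading_type(text) RETURNS text AS $$
    SELECT regexp_replace(
        $1,
        -- optional house number, then a leading type, then the name up to the comma
        '^(\d+\s+)?(via|camino|cam|calle|cll)\s+([^,]+)',
        '\1\3 \2',
        'i');
$$ LANGUAGE sql IMMUTABLE;

SELECT flip_leading_type('28005 Cam Las Ramblas, San Juan Capistrano CA');
-- returns '28005 Las Ramblas Cam, San Juan Capistrano CA'

-- Feed the rewritten string to geocode() as before.
SELECT pprint_addy(addy), st_astext(geomout), rating
FROM geocode(flip_leading_type('Via Verde, Dana Point CA'));
```

Inputs that do not start with one of the listed types pass through unchanged,
so the helper could wrap every address.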

Is this a bug, or a failing heuristic?  Is there a way to turn that off for
Spanish names?  Or perhaps better, is there a way to call whatever function
is rewriting the addrfeat.fullname strings, so that I get the same effect on
my input strings?  That would compare apples to apples, which would give
the best shot at matching.

Thanks for any pointers.

Regards,
James Marca




