[Geodata] [Tiger] A few interesting observations on the Tiger2007fe data

John P. Linderman jpl at research.att.com
Thu Aug 7 09:54:46 EDT 2008


On Wed, 06 Aug 2008, Stephen Woodbridge <woodbri at swoodbridge.com> said
(with some editing on my part):

> It is also important to note here that the zipcode info in Census is by 
> and large circa 1990 when the Census last used the USPS Zip+4 database 
> and did a mass merge/update of Tiger. This has an impact in a couple of 
> ways:
> 
> 1) zipcodes come and go based on the needs of the USPS so numbers can 
> get dropped and new ones added and over time the dropped ones can get 
> reassigned to new cities. It has been 18 years since 1990!
>
> 2) as you have seen, USPS city names do not map directly to government 
> legal names for places that the Census is supposed to use based on the 
> old FIPS codes which are now obsolete and being replaced.

In the case I described in
  http://lists.osgeo.org/pipermail/geodata/2008-August/000751.html
though, the ZIPs were still current, and the placenames were
sensible.  The problem was that addresses were assigned to the
wrong city and ZIP.  It's hard to know how widespread that sort
of problem is.  Since I save *all* unique combinations of state,
placename and zip, one bad address range has the same effect as
hundreds of repeated good ones.

> Are you using the USPS TIGER-ZIP+4 dataset? or the USPS Zip+4 database 
> which is more current but not correlated with Tiger like the first?

I'm using the USPS Zip+4 product to find addresses.

> For geocoding, whenever I ran into a name like "Peapack and Gladstone", 
> I would link that TLID to multiple city names like:
> 
> <tlid> => {
>    "Peapack and Gladstone",
>    "Peapack",
>    "Gladstone",
>    "<usps name if different from above>",
> }
> 
> This way, I was pretty much assured that I would get to the TLID 
> regardless of the city that the user entered. Also on output, I would 
> standardize the city name to the USPS name if it was available, because 
> this is what would most likely work if you tried the address on Google.

That certainly makes sense.  The USPS city/state product already has
aliases for city names, so MANHATTAN is recognized as an alias for
NEW YORK in New York, for example.  I have already extended that
using data from an old US Geological Survey file to pick up
placenames like MEYERSVILLE New Jersey, which gets mail from the
GILLETTE post office, but isn't recognized by the USPS as an
alias for GILLETTE.  And I plan to stir in any new Tiger place
names, if I can get past the noisy data (because I don't want to
make JERSEY CITY an alias for BAYONNE, just because a Bayonne
ZIP code turned up on one Jersey City TLID).  I'm hopeful
(though I don't know why, since my hopes keep getting pummeled)
that if I put some lower limit on the number of place/zip
occurrences needed to establish a relationship, I can pick up
the valid aliases and ignore the dirt.
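A minimal sketch of that lower-limit idea, for illustration only. It assumes the input is one (state, placename, zip) tuple per address range, kept with repeats rather than reduced to unique combinations, since the counts are what separate a real alias from a single bad range. The function name build_aliases, the min_count parameter and its default of 5 are hypothetical, and the sample ZIPs are only illustrative.

```python
from collections import Counter, defaultdict


def build_aliases(records, min_count=5):
    """Map (state, zip) to the set of placenames seen at least min_count times.

    records: iterable of (state, placename, zip) tuples, one per address
    range, NOT deduplicated. min_count is a hypothetical tuning knob.
    """
    # Count how often each state/placename/zip combination occurs.
    counts = Counter(
        (state, place.upper(), zip5) for state, place, zip5 in records
    )

    aliases = defaultdict(set)
    for (state, place, zip5), n in counts.items():
        # A combination seen only a handful of times is treated as noise.
        if n >= min_count:
            aliases[(state, zip5)].add(place)
    return dict(aliases)


# Illustrative data: one stray Jersey City range carrying a Bayonne ZIP.
sample = (
    [("NJ", "Bayonne", "07002")] * 40
    + [("NJ", "Jersey City", "07302")] * 60
    + [("NJ", "Jersey City", "07002")]
)
print(build_aliases(sample))
```

With the threshold in place, the single stray range no longer makes JERSEY CITY an alias for the Bayonne ZIP. A fixed cutoff is the simplest choice; a cutoff relative to the total count for each ZIP might suit small places better, but that is a guess rather than something tested against the Tiger data.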

A comprehensive (and clean) file of aliases would be a great
asset, but, since the city/state product is not in the public
domain, I don't think we could post it.  -- jpl


