[Geodata] [Tiger] A few interesting observations on the Tiger2007fedata

John P. Linderman jpl at research.att.com
Wed Aug 6 13:59:05 EDT 2008


A bit of background:  In addition to using the Tiger data for
geocoding, I was hoping to combine the data I get (but cannot
redistribute) from the US Postal Service (USPS) with the Tiger
data and render both better.  For example, the USPS only does
general delivery in some tiny towns, like Diamondville, WY,
so they have little or no street-level information.  Tiger,
however, is targeted for census, not letter delivery, so I
could extend the USPS data with the Tiger data, and get
matches on addresses that would otherwise be unrecognizable.

Conversely, I can use the address range information from the
USPS to do sanity checks on Tiger address ranges that seemed
preposterously large, or maybe even fill in some of the
missing ZIP+4 addon fields.

Unfortunately, the USPS and Tiger don't always agree on names.
This is particularly a problem on place names, where the Tiger
names, as found in the PLACEFP field from the Topological
Faces Relationship file, pushed through the state-level Current Place
shapefile, are often quite different from the USPS place names.
For example, what Tiger refers to as "Peapack and Gladstone"
in New Jersey corresponds to two separate USPS cities,
PEAPACK and GLADSTONE, each with its own ZIP code.
The USPS does not acknowledge the existence of
PEAPACK AND GLADSTONE, and Tiger does not acknowledge
the existence of Peapack or of Gladstone as separate
entities.

If place names are not going to agree in many cases,
I hoped (and you know where this is going to lead)
that I could use ZIP code to reconcile the two sources.
In particular, in the common case where a ZIP code
uniquely identifies a single USPS city, I could stir
that city back into the Tiger data, so the place names
could be made to agree.  To this end, I took all the
unique combinations of state, place name and ZIP from
the Tiger data, and added USPS city key and city name
when the ZIP uniquely identified one.
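
A minimal sketch of that join, in Python.  The function names and the
tuple layouts are assumptions made for illustration, not the actual
processing, and the USPS city key and name are carried as one opaque
string.  The "00000" rows are placeholders for a ZIP shared by two
USPS cities.

```python
from collections import defaultdict

def unique_usps_city_by_zip(usps_rows):
    """Map ZIP -> USPS city for ZIPs that identify exactly one city.

    usps_rows: iterable of (zip, usps_city) pairs (layout assumed).
    """
    cities = defaultdict(set)
    for zip_code, usps_city in usps_rows:
        cities[zip_code].add(usps_city)
    # Keep only the ZIPs with a single distinct city.
    return {z: next(iter(c)) for z, c in cities.items() if len(c) == 1}

def annotate_tiger(tiger_rows, usps_by_zip):
    """Return the unique (zip, state, place) triples, each extended with
    the USPS city, or None when the ZIP is not unique to one city."""
    return [
        (zip_code, state, place, usps_by_zip.get(zip_code))
        for zip_code, state, place in sorted(set(tiger_rows))
    ]

usps = [
    ("07843", "V10658+NJHOPATCONG"),
    ("07002", "V10082+NJBAYONNE"),
    ("00000", "PLACEHOLDER-A"),   # hypothetical ZIP with two cities
    ("00000", "PLACEHOLDER-B"),
]
tiger = [
    ("07002", "NJ", "Bayonne"),
    ("07002", "NJ", "Jersey City"),
    ("07002", "NJ", "Bayonne"),   # duplicate, collapsed by set()
    ("00000", "ZZ", "Nowhere"),   # hypothetical
]
for row in annotate_tiger(tiger, unique_usps_city_by_zip(usps)):
    print("|".join(str(field) for field in row))
```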

I got quite a few cases where the ZIP also identified
a unique Tiger place name, agreeing exactly with the
USPS name, as in

07843|NJ|Hopatcong|V10658+NJHOPATCONG

and a few more where the Tiger place name was unique,
but not identical to the USPS name, as in

07977|NJ|Peapack and Gladstone|V11167+NJPEAPACK

But there were distressingly many cases where the ZIP,
known to uniquely identify a USPS city name, turned up
in many different Tiger place names, like

07002|NJ|Bayonne|V10082+NJBAYONNE
07002|NJ|Jersey City|V10082+NJBAYONNE
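
The three outcomes above can be told apart mechanically.  A rough
sketch in Python, assuming rows of (ZIP, state, Tiger place, USPS city
name) and a case-insensitive comparison of names; the function name
and the labels "exact", "differs" and "multiple" are mine, chosen only
for illustration.

```python
from collections import defaultdict

def classify_zips(rows):
    """rows: iterable of (zip, state, tiger_place, usps_city_name),
    where the ZIP is already known to identify one USPS city."""
    places = defaultdict(set)
    usps_name = {}
    for zip_code, _state, place, city in rows:
        places[zip_code].add(place)
        usps_name[zip_code] = city
    result = {}
    for zip_code, names in places.items():
        if len(names) > 1:
            result[zip_code] = "multiple"   # several Tiger places, one ZIP
        elif next(iter(names)).upper() == usps_name[zip_code].upper():
            result[zip_code] = "exact"      # names agree, ignoring case
        else:
            result[zip_code] = "differs"    # unique place, different name
    return result

rows = [
    ("07843", "NJ", "Hopatcong", "HOPATCONG"),
    ("07977", "NJ", "Peapack and Gladstone", "PEAPACK"),
    ("07002", "NJ", "Bayonne", "BAYONNE"),
    ("07002", "NJ", "Jersey City", "BAYONNE"),
]
print(classify_zips(rows))
```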

This is not like the Peapack and Gladstone case: Jersey City is not
an alias for Bayonne.  It is a distinct, although adjacent, city,
with ZIPs of its own, none of which is 07002, of course, because
that ZIP is unique to Bayonne.  So I went looking
for places where Jersey City appeared with ZIP 07002, and I
found the following.

NJ|Jersey City|Ave C|1173;1179;07002;L|59603093|40.688284,-74.100948 40.688560,-74.100718
NJ|Jersey City|Ave C|1181;1185;07305;L+1180;1186;07305;R|59603100|40.688560,-74.100718 40.689289,-74.100110
NJ|Jersey City|Ave C|1188;1194;07305;R+1187;1195;07305;L|59603102|40.689289,-74.100110 40.689638,-74.099817
NJ|Jersey City|Ave C|1196;1220;07305;R+1197;1221;07305;L|59603096|40.689638,-74.099817 40.689915,-74.099587
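
For anyone who wants to pick these records apart, here is a sketch of
a parser in Python.  It assumes each record sits on a single line, and
the field meanings are my reading of the layout: state, place name(s)
separated by commas, street name, address ranges joined by "+" (each
one low;high;ZIP;side), an edge identifier, and the two end points as
lat,long pairs.  The type and function names are hypothetical, and
house numbers are assumed to be plain integers.

```python
from typing import NamedTuple

class AddrRange(NamedTuple):
    low: int
    high: int
    zip_code: str
    side: str          # "L" or "R"

class Edge(NamedTuple):
    state: str
    places: tuple      # one or more place names
    street: str
    ranges: tuple      # zero or more AddrRange
    edge_id: str
    points: tuple      # ((lat, long), (lat, long))

def parse_edge(line):
    """Split one pipe-delimited edge record into its parts."""
    state, places, street, ranges, edge_id, coords = line.strip().split("|")
    parsed = []
    for chunk in filter(None, ranges.split("+")):   # skip an empty range field
        low, high, zip_code, side = chunk.split(";")
        parsed.append(AddrRange(int(low), int(high), zip_code, side))
    points = tuple(
        tuple(float(v) for v in pair.split(",")) for pair in coords.split()
    )
    return Edge(state, tuple(places.split(",")), street,
                tuple(parsed), edge_id, points)

edge = parse_edge(
    "NJ|Jersey City|Ave C|1181;1185;07305;L+1180;1186;07305;R|59603100|"
    "40.688560,-74.100718 40.689289,-74.100110"
)
print(edge.places, [r.zip_code for r in edge.ranges], edge.points[0])
```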

According to the USPS, there is no street whose name begins with AVE
in Jersey City.  The USPS shows an AVENUE C in Bayonne, with ZIP 07002
for all addresses.  The addresses are in the range 1-1199, so the
probability is high that they are the same addresses that Tiger has.

The other edges listed, which link together by lat/long, are
even more troublesome, because they have a valid Jersey City ZIP,
07305, as well as the Jersey City place name.  It would be hard
to determine, in isolation, a possible correlation between these
addresses and the USPS addresses.  It would be unfortunate if
many perfectly valid postal addresses couldn't be geocoded.
I don't know how common the problem is, but, given the number
of cases where the (unique) city that the USPS associates with
a ZIP is turning up in multiple Tiger places, I fear it may
involve a fairly large number of address ranges.

There are a couple rays of hope.  If you get the ZIP Code
Tabulation Area shapefiles from
  http://www.census.gov/geo/www/cob/z52000.html#shp
and look up the lat/long of these edges, some of them fall in
ZIP 07002, a Bayonne ZIP.  So we might be able to establish
a tie-in in that way.  And, as in other postings to this
thread, linking adjacent edges together can provide clues.
At the head of the list above, we can add an edge

NJ|Bayonne,Jersey City|Ave C||59603090|40.688075,-74.101122 40.688284,-74.100948

yielding a chain that starts

NJ|Bayonne,Jersey City|Ave C||59603090|40.688075,-74.101122 40.688284,-74.100948
NJ|Jersey City|Ave C|1173;1179;07002;L|59603093|40.688284,-74.100948 40.688560,-74.100718
NJ|Jersey City|Ave C|1181;1185;07305;L+1180;1186;07305;R|59603100|40.688560,-74.

The first link makes it clear that we are near the boundary of Jersey
City and Bayonne, and the second has a Bayonne ZIP with the Jersey City
place name, reinforcing the transitional nature.  If we get no USPS
match for Jersey City/07305 (and we won't), it's reasonable to try
Bayonne/07002, where a match will be found.  We could either correct
the place and ZIP in the Tiger data, or at least fake an alternate
place name and ZIP in addition to what is already there.  (We'd also
want to add an alternate feature name, since the USPS calls the
street AVENUE C, not AVE C.)
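
A rough Python sketch of that fallback.  It makes several simplifying
assumptions: edges are oriented head to tail, shared end points are
exactly equal, at most one edge starts at any point, and the dict
layout and function names are hypothetical.  The edge's own place/ZIP
pairs are tried first, then every combination of place and ZIP seen
anywhere along the chain.

```python
def build_chain(start_id, edges):
    """Follow edges whose start point equals the previous edge's end."""
    by_start = {e["start"]: eid for eid, e in edges.items()}
    chain, seen = [start_id], {start_id}
    while True:
        nxt = by_start.get(edges[chain[-1]]["end"])
        if nxt is None or nxt in seen:      # end of street, or a loop
            return chain
        chain.append(nxt)
        seen.add(nxt)

def candidates(edge_id, chain, edges):
    """(place, ZIP) pairs to try against the USPS data, in order."""
    own = edges[edge_id]
    out = [(p, z) for p in own["places"] for z in own["zips"]]
    places, zips = [], []
    for eid in chain:                       # keep first-seen order
        for p in edges[eid]["places"]:
            if p not in places:
                places.append(p)
        for z in edges[eid]["zips"]:
            if z not in zips:
                zips.append(z)
    for p in places:
        for z in zips:
            if (p, z) not in out:
                out.append((p, z))
    return out

edges = {
    "59603090": {"places": ["Bayonne", "Jersey City"], "zips": [],
                 "start": (40.688075, -74.101122), "end": (40.688284, -74.100948)},
    "59603093": {"places": ["Jersey City"], "zips": ["07002"],
                 "start": (40.688284, -74.100948), "end": (40.688560, -74.100718)},
    "59603100": {"places": ["Jersey City"], "zips": ["07305"],
                 "start": (40.688560, -74.100718), "end": (40.689289, -74.100110)},
}
chain = build_chain("59603090", edges)
print(candidates("59603100", chain, edges))
```

With these three edges, Jersey City/07305 is tried first and
Bayonne/07002 second.  Real data would need a tolerance on the end
points and a way to handle edges stored in the opposite direction.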

So, as always, I keep coming back to linking edges of the same
street together, but "the same street" need no longer agree on
ZIP or place name and, at least for making inferences about
dubious ZIPs and place names, maybe need agree only on part of
the feature name, so SW Ave C could link up with NE Ave C or
just Ave C.  -- jpl



