[Geodata] [Tiger] A few interesting observations on the Tiger2007fedata

Stephen Woodbridge woodbri at swoodbridge.com
Wed Aug 6 16:52:13 EDT 2008


John,

Thank you for another very interesting analysis.

It is also important to note here that the zipcode info in Census is by 
and large circa 1990 when the Census last used the USPS Zip+4 database 
and did a mass merge/update of Tiger. This has an impact in a couple of 
ways:

1) zipcodes come and go based on the needs of the USPS so numbers can 
get dropped and new ones added and over time the dropped ones can get 
reassigned to new cities. It has been 18 years since 1990!

2) as you have seen, USPS city names do not map directly to government 
legal names for places that the Census is supposed to use based on the 
old FIPS codes which are now obsolete and being replaced.

Are you using the USPS TIGER-ZIP+4 dataset, or the USPS Zip+4 database, 
which is more current but, unlike the first, not correlated with Tiger?

For geocoding, whenever I ran into a name like "Peapack and Gladstone", 
I would link that TLID to multiple city names like:

<tlid> => {
   "Peapack and Gladstone",
   "Peapack",
   "Gladstone",
   "<usps name if different from above>",
}

This way, I was pretty much assured that I would get to the TLID 
regardless of the city that the user entered. Also on output, I would 
standardize the city name to the USPS name if it was available, because 
this is what would most likely work if you tried the address on Google.
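
A minimal Python sketch of that alias idea, under stated assumptions: the 
function names, the TLID 12345 and the rule of splitting on " and " are 
all illustrative, not taken from any real geocoder. Splitting on " and " 
will over-split a name where "and" is integral, so treat it as a 
starting point only.

```python
import re
from collections import defaultdict

def city_aliases(tiger_place, usps_city=None):
    """Return the normalized city names that should all resolve to one TLID."""
    names = {tiger_place}
    # Split compound names such as "Peapack and Gladstone" into their parts.
    parts = re.split(r"\s+and\s+", tiger_place, flags=re.IGNORECASE)
    if len(parts) > 1:
        names.update(parts)
    # Add the USPS name when one is known (it may duplicate a part above).
    if usps_city:
        names.add(usps_city)
    return {n.strip().upper() for n in names if n.strip()}

def build_city_index(edges):
    """edges: iterable of (tlid, tiger_place, usps_city_or_None) tuples."""
    index = defaultdict(set)
    for tlid, place, usps in edges:
        for name in city_aliases(place, usps):
            index[name].add(tlid)
    return index

# 12345 is a made-up TLID used only for illustration.
index = build_city_index([(12345, "Peapack and Gladstone", "GLADSTONE")])
print(sorted(index))
```

Looking up any of the three names in the index then reaches the same 
TLID, and the USPS name can be kept alongside for standardizing output.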

-Steve W

John P. Linderman wrote:
> A bit of background:  In addition to using the Tiger data for
> geocoding, I was hoping to combine the data I get (but cannot
> redistribute) from the US Postal Service (USPS) with the Tiger
> data and render both better.  For example, the USPS only does
> general delivery in some tiny towns, like Diamondville, WY,
> so they have little or no street-level information.  Tiger,
> however, is targeted for census, not letter delivery, so I
> could extend the USPS data with the Tiger data, and get
> matches on addresses that would otherwise be unrecognizable.
> 
> Conversely, I can use the address range information from the
> USPS to do sanity checks on Tiger address ranges that seemed
> preposterously large, or maybe even fill in some of the
> missing ZIP+4 addon fields.
> 
> Unfortunately, the USPS and Tiger don't always agree on names.
> This is particularly a problem on place names, where the Tiger
> names, as found in the PLACEFP field from the Topological
> Faces Relationship file, pushed through the state-level Current Place
> shapefile, are often quite different from the USPS place names.
> For example, what Tiger refers to as "Peapack and Gladstone"
> in New Jersey corresponds to two separate USPS cities,
> PEAPACK and GLADSTONE, each with their own ZIP code.
> The USPS does not acknowledge the existence of
> PEAPACK AND GLADSTONE, and Tiger does not acknowledge
> the existence of Peapack or of Gladstone as separate
> entities.
> 
> If place name is not going to agree in many cases,
> I hoped (and you know where this is going to lead)
> that I could use ZIP code to reconcile the two sources.
> In particular, in the common case where a ZIP code
> uniquely identifies a single USPS city, I could stir
> that city back into the Tiger data, so the place names
> could be made to agree.  To this end, I took all the
> unique combinations of state, place name and ZIP from
> the Tiger data, and added USPS city key and city name
> when the ZIP uniquely identified one.
> 
> I got quite a few cases where the ZIP also identified
> a unique Tiger place name, agreeing exactly with the
> USPS name, as in
> 
> 07843|NJ|Hopatcong|V10658+NJHOPATCONG
> 
> and a few more where the Tiger place name was unique,
> but not identical to the USPS name, as in
> 
> 07977|NJ|Peapack and Gladstone|V11167+NJPEAPACK
> 
> But there were distressingly many cases where the ZIP,
> known to uniquely identify a USPS city name, turned up
> in many different Tiger place names, like
> 
> 07002|NJ|Bayonne|V10082+NJBAYONNE
> 07002|NJ|Jersey City|V10082+NJBAYONNE
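
One way to surface such cases is to group the records by ZIP and flag any 
ZIP that appears under more than one Tiger place name. The sketch below 
assumes the pipe-delimited layout shown in the examples 
(ZIP|state|place|USPS key); the function names are hypothetical.

```python
from collections import defaultdict

def places_by_zip(lines):
    """Group Tiger (state, place) pairs by (ZIP, USPS key)."""
    groups = defaultdict(set)
    for line in lines:
        zip_code, state, place, usps_key = line.strip().split("|")
        groups[(zip_code, usps_key)].add((state, place))
    return groups

def conflicting_zips(lines):
    """Keep only the ZIPs that show up under more than one Tiger place name."""
    return {key: sorted(places)
            for key, places in places_by_zip(lines).items()
            if len(places) > 1}

# The four records quoted in this message.
sample = [
    "07843|NJ|Hopatcong|V10658+NJHOPATCONG",
    "07977|NJ|Peapack and Gladstone|V11167+NJPEAPACK",
    "07002|NJ|Bayonne|V10082+NJBAYONNE",
    "07002|NJ|Jersey City|V10082+NJBAYONNE",
]
print(conflicting_zips(sample))
```

On these four records only 07002 is flagged, with Bayonne and Jersey City 
listed as its Tiger places.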
> 
> This is not like the Peapack and Gladstone case, with Jersey City
> being an alias for Bayonne.  Jersey City is a distinct, although
> adjacent, city, with ZIPs of its own, none of which is 07002,
> of course, because it is unique to Bayonne.  So I went looking
> for places where Jersey City appeared with ZIP 07002, and I
> found the following.
> 
> NJ|Jersey City|Ave C|1173;1179;07002;L|59603093|40.688284,-74.100948 40.688560,-74.100718
> NJ|Jersey City|Ave C|1181;1185;07305;L+1180;1186;07305;R|59603100|40.688560,-74.100718 40.689289,-74.100110
> NJ|Jersey City|Ave C|1188;1194;07305;R+1187;1195;07305;L|59603102|40.689289,-74.100110 40.689638,-74.099817
> NJ|Jersey City|Ave C|1196;1220;07305;R+1197;1221;07305;L|59603096|40.689638,-74.099817 40.689915,-74.099587
> 
> According to the USPS, there is no street whose name begins with AVE
> in Jersey City.  The USPS shows an AVENUE C in Bayonne, with zip 07002
> for all addresses.  The addresses are in the range 1-1199, so the
> probability is high that they are the same addresses that Tiger has.
> 
> The other edges listed, which link together by lat/long, are
> even more troublesome, because they have a valid Jersey City ZIP,
> 07305, as well as the Jersey City place name.  It would be hard
> to determine, in isolation, a possible correlation between these
> addresses and the USPS addresses.  It would be unfortunate if
> many perfectly valid postal addresses couldn't be geocoded.
> I don't know how common the problem is, but, given the number
> of cases where the (unique) city that the USPS associates with
> a ZIP is turning up in multiple Tiger places, I fear it may
> involve a fairly large number of address ranges.
> 
> There are a couple rays of hope.  If you get the ZIP Code
> Tabulation Area shapefiles from
>   http://www.census.gov/geo/www/cob/z52000.html#shp
> and look up the lat/long, it assigns some of them to
> ZIP 07002, a Bayonne ZIP.  So we might be able to establish
> a tie-in in that way.  And, as in other postings to this
> thread, linking adjacent edges together can provide clues.
> At the head of the list above, we can add an edge
> 
> NJ|Bayonne,Jersey City|Ave C||59603090|40.688075,-74.101122 40.688284,-74.100948
> 
> yielding a chain that starts
> 
> NJ|Bayonne,Jersey City|Ave C||59603090|40.688075,-74.101122 40.688284,-74.100948
> NJ|Jersey City|Ave C|1173;1179;07002;L|59603093|40.688284,-74.100948 40.688560,-74.100718
> NJ|Jersey City|Ave C|1181;1185;07305;L+1180;1186;07305;R|59603100|40.688560,-74.
> 
> The first link makes it clear that we are near the boundary of Jersey
> City and Bayonne, and the second has a Bayonne ZIP with the Jersey City
> place name, reinforcing the transitional nature.  If we get no USPS
> match for Jersey City/07305 (and we won't), it's reasonable to try
> Bayonne/07002, where a match will be found.  We could either correct
> the place and ZIP in the Tiger data, or at least fake an alternate
> place name and ZIP in addition to what is already there.  (We'd also
> want to add an alternate feature name, since the USPS calls the
> street AVENUE C, not AVE C.)
> 
> So, as always, I keep coming back to linking edges of the same
> street together, but "the same street" need no longer agree on
> ZIP or place name and, at least for making inferences about
> dubious ZIPs and place names, may need to agree on only part of the
> feature name, so SW Ave C could link up with NE Ave C or just Ave C.  -- jpl
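
The linking described here can be sketched in Python as follows. This 
assumes the pipe-delimited edge records shown earlier 
(state|places|street|ranges|TLID|start end), assumes that shared 
endpoints print as identical coordinate strings, and uses a loose street 
key that only drops leading directionals. All function names are 
hypothetical.

```python
from collections import defaultdict

DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def street_key(name):
    """Loose key: upper-case the name and drop leading directional words."""
    words = name.upper().split()
    while len(words) > 1 and words[0] in DIRECTIONALS:
        words.pop(0)
    return " ".join(words)

def parse_edge(line):
    """Split one record into its six pipe-delimited fields."""
    state, places, street, ranges, tlid, coords = line.split("|")
    start, end = coords.split()
    return {"state": state, "places": places.split(","), "street": street,
            "ranges": ranges, "tlid": tlid, "start": start, "end": end}

def link_edges(edges):
    """Map each TLID to the TLIDs sharing an endpoint and a loose street key.

    ZIP and place name are deliberately ignored when matching.
    """
    by_node = defaultdict(list)
    for e in edges:
        key = (e["state"], street_key(e["street"]))
        for node in (e["start"], e["end"]):
            by_node[(key, node)].append(e["tlid"])
    neighbors = defaultdict(set)
    for tlids in by_node.values():
        for t in tlids:
            neighbors[t].update(x for x in tlids if x != t)
    return neighbors

# The first three edges of the Ave C chain from this message.
records = [
    "NJ|Bayonne,Jersey City|Ave C||59603090|40.688075,-74.101122 40.688284,-74.100948",
    "NJ|Jersey City|Ave C|1173;1179;07002;L|59603093|40.688284,-74.100948 40.688560,-74.100718",
    "NJ|Jersey City|Ave C|1181;1185;07305;L+1180;1186;07305;R|59603100|40.688560,-74.100718 40.689289,-74.100110",
]
neighbors = link_edges([parse_edge(r) for r in records])
print(sorted(neighbors["59603093"]))
```

Once edges are linked, the place names and ZIPs of neighboring edges 
(here Bayonne and 07002) can be offered as alternates when the edge's own 
place and ZIP find no USPS match.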
> 
> 
> _______________________________________________
> Geodata mailing list
> Geodata at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/geodata


