[Geodata] [Tiger] A few interesting observations on the Tiger2007fedata

John P. Linderman jpl at research.att.com
Thu Aug 7 13:51:16 EDT 2008


Well, a little *good* news for a change.  Instead of just noting all
the different combinations of ZIP, city and state, I counted them.
Here's the start of the New Jersey summary, by ZIP (counts first):

 905 07001|Avenel|NJ
   1 07001|Iselin|NJ
1804 07002|Bayonne|NJ
   3 07002|Jersey City|NJ

and here are the Peapack and Gladstone occurrences, grouped together
(with the USPS city key and name tacked on):

   1 07930|Peapack and Gladstone|NJ	V10250NJCHESTER
  93 07931|Peapack and Gladstone|NJ	V10462NJFAR HILLS
 185 07934|Peapack and Gladstone|NJ	V10532NJGLADSTONE
   3 07940|Peapack and Gladstone|NJ	V10818NJMADISON
  59 07977|Peapack and Gladstone|NJ	V11167NJPEAPACK
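The count-first layout above is what `sort | uniq -c` would produce over ZIP|city|state triples. An equivalent count in Python might look like the sketch below. It assumes the edges have already been reduced to (ZIP, place name, state) tuples. The tuple layout and the function names are hypothetical, not TIGER field names.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record layout: (ZIP, place name, state) for one edge.
Combo = Tuple[str, str, str]

def count_combinations(edges: Iterable[Combo]) -> Counter:
    """Count how many edges carry each distinct ZIP|city|state combination."""
    return Counter(edges)

def summarize(counts: Counter) -> List[str]:
    """Format as 'count ZIP|city|state' lines, sorted by ZIP, then city."""
    lines = []
    for (zip_code, city, state), n in sorted(counts.items()):
        lines.append(f"{n:4d} {zip_code}|{city}|{state}")
    return lines

if __name__ == "__main__":
    # Toy input; real input would come from the TIGER edge records.
    sample = [("07002", "Bayonne", "NJ")] * 3 + [("07002", "Jersey City", "NJ")]
    for line in summarize(count_combinations(sample)):
        print(line)
```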

From

1804 07002|Bayonne|NJ
   3 07002|Jersey City|NJ

I am tempted to conclude that:
1) ZIP 07002 is a Bayonne ZIP, and any USPS city names associated with
   07002 should also be associated with Bayonne.
2) ZIP 07002 is not a valid Jersey City ZIP, and we should go back
   and attempt to clean up those edges where it appears with Jersey City.
3) If we look to the USPS data to help in that cleanup, any cleanup
   involving Jersey City should also check Bayonne if no Jersey City
   match is found.
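Conclusions 1 and 2 amount to a simple rule. Within each ZIP, treat the city carrying the overwhelming majority of edges as the ZIP's home city, and flag the stragglers for cleanup. A rough sketch of that rule follows. The 5% cutoff (`max_share`) and the function name are hypothetical choices, not something derived from the data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def flag_suspect_edges(counts: Dict[Tuple[str, str, str], int],
                       max_share: float = 0.05) -> List[Tuple[str, str, str, str]]:
    """For each (ZIP, state), take the most common city as the ZIP's home
    city and flag any other city whose share of that ZIP's edges is below
    max_share (a hypothetical threshold).
    Returns (ZIP, state, suspect city, home city) tuples."""
    by_zip: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for (zip_code, city, state), n in counts.items():
        by_zip[(zip_code, state)][city] += n
    flagged = []
    for (zip_code, state), cities in sorted(by_zip.items()):
        total = sum(cities.values())
        home, _ = cities.most_common(1)[0]
        for city, n in sorted(cities.items()):
            if city != home and n / total < max_share:
                flagged.append((zip_code, state, city, home))
    return flagged

if __name__ == "__main__":
    nj = {
        ("07001", "Avenel", "NJ"): 905,
        ("07001", "Iselin", "NJ"): 1,
        ("07002", "Bayonne", "NJ"): 1804,
        ("07002", "Jersey City", "NJ"): 3,
    }
    for row in flag_suspect_edges(nj):
        print(row)
```

On the four NJ counts quoted above, this flags Jersey City in 07002 with Bayonne as the home city, and by the same rule also Iselin in 07001. Returning the home city alongside each flagged name gives conclusion 3 its fallback: look up the flagged name in the USPS data first, then the home city.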

In my opinion, this nicely addresses several problems I raised in
  http://lists.osgeo.org/pipermail/geodata/2008-August/000751.html

The conclusions from

   1 07930|Peapack and Gladstone|NJ	V10250NJCHESTER
  93 07931|Peapack and Gladstone|NJ	V10462NJFAR HILLS
 185 07934|Peapack and Gladstone|NJ	V10532NJGLADSTONE
   3 07940|Peapack and Gladstone|NJ	V10818NJMADISON
  59 07977|Peapack and Gladstone|NJ	V11167NJPEAPACK

are not quite so clear.  ZIPs 07930 and 07940, with only 1 and
3 edges, are pretty obviously bogus, and in need of repair like
the Jersey City/07002 edges.  The USPS itself has no city name
aliases for PEAPACK and GLADSTONE, but the USGS files have
already provided PEAPACK AND GLADSTONE aliases for both.  But
there are more FAR HILLS address ranges in Peapack and Gladstone
than PEAPACK ranges (93 versus 59).  If we just go by the numbers,
and not by the names, FAR HILLS is a better alias than PEAPACK.
I had no qualms about adding the PEAPACK AND GLADSTONE
alias to the USPS names, but I'm not as comfortable adding
FAR HILLS.  If an address matches in both PEAPACK and
FAR HILLS, for example, I'd want to penalize whichever match
relied on the alias.
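Going strictly by the numbers, one approach is to rank the candidate USPS names for a place by edge count. Any match that depends on such a count-derived alias then gets scaled down. The sketch below shows one way to do that. The cutoff (`min_count`), the scaling factor, and the scoring scheme are all hypothetical.

```python
from typing import Dict, List, Tuple

def rank_aliases(usps_counts: Dict[str, int],
                 min_count: int = 10) -> List[Tuple[str, int]]:
    """Rank candidate USPS city names for one place by how many edges
    carry them, dropping names seen fewer than min_count times
    (a hypothetical cutoff meant to discard stray ZIPs like 07930)."""
    ranked = sorted(usps_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n) for name, n in ranked if n >= min_count]

def match_score(base_score: float, via_count_alias: bool,
                alias_factor: float = 0.9) -> float:
    """Scale down a match that only succeeded through a count-derived
    alias such as FAR HILLS, so an otherwise equal match that needed
    no such alias ranks higher."""
    return base_score * alias_factor if via_count_alias else base_score

if __name__ == "__main__":
    peapack_and_gladstone = {
        "CHESTER": 1, "FAR HILLS": 93, "GLADSTONE": 185,
        "MADISON": 3, "PEAPACK": 59,
    }
    print(rank_aliases(peapack_and_gladstone))
    # The same match without and with reliance on a count-derived alias.
    print(match_score(1.0, False), match_score(1.0, True))
```

On the Peapack and Gladstone counts, FAR HILLS still outranks PEAPACK, which is exactly the numbers-versus-names tension described above. The scaling factor is what keeps such an alias from winning outright.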

I'm just starting to look at the results.  Maybe some combination
of the by-ZIP and by-name results will produce better aliases than
either considered separately.  But I'm delighted that there's a
way to identify edges with probable errors in address range or
place name (or both).  -- jpl
