[Benchmarking] Using Tiger 2008 data

Andrea Aime aaime at opengeo.org
Sun Sep 27 06:08:16 EDT 2009


Hi,
I've been looking a bit at the Tiger 2008 data for
Texas that Jeff provided; here are some findings and impressions
on the merged set (the non-merged set is only of interest to
MapServer, I think).

EDGES_MERGE
----------------------------------------------------------------

The edges_merge.shp file contains both roads and water lines.
The classification we had for roads is still applicable with
the small changes Jeff already suggested on the wiki page here:
http://wiki.osgeo.org/wiki/Texas_roads_styled
One major difference between this set and the old one is that
it does not contain only roads, so the style will perform a
filtering on top of the data and display only roads (and only
certain road classes, not all of them), whilst the old
data set only had roads and displayed them all.
I guess this makes for an interesting comparison between
spatial databases and shapefiles (and will also point out
systems that are capable of indexing shapefile attributes
as well, provided there are any... maybe ArcGIS).
I guess we'll want to index the mtfcc attribute in PostGIS
to speed up searches.
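
A minimal sketch of that index, assuming the merged shapefile was
loaded into a table called edges_merge with a lower-case mtfcc column
(both names are assumptions, adjust them to whatever the import
created). S1100 and S1200 are the MTFCC codes for primary and
secondary roads, used here only as an example filter; the classes
actually drawn are the ones listed on the wiki page.

```sql
-- Assumed table name: edges_merge (result of importing the merged shapefile).
CREATE INDEX edges_merge_mtfcc_idx ON edges_merge (mtfcc);

-- Refresh planner statistics so the new index is taken into account.
ANALYZE edges_merge;

-- Check that a class filter like the one the style applies can use the index.
EXPLAIN
SELECT count(*)
FROM edges_merge
WHERE mtfcc IN ('S1100', 'S1200');
```

Whether the planner picks the index depends on how selective the
chosen road classes are; if they cover most of the table, a sequential
scan can still win.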

POINTLM_MERGE
----------------------------------------------------------------

The pointlm_merge file contains point landmarks, and could
be used to replace the gnis_names layer.
I've tried to port the styling made by ESRI over to the
pointlm file, but doing so resulted in losing half
of the categories.
I've also made a comparison of the data amounts and distributions,
see here:

select count(*), mtfcc from pointlm_merge group by mtfcc;
  count | mtfcc
-------+-------
      1 | C3071
     68 | K3544
      3 | K2181
      3 | K2110
     84 | K2165
      1 | C3066
      1 | K2182
    214 | K1231
    599 | K2543
      8 | K1236
     41 | K2561
  11854 | C3061
    152 | K2190
     32 | K2582
    100 | C3062
     20 | K1225
    257 | K2451
      3 | K1237


select count(*) from pointlm_merge;
  count
-------
  13441

As you can see there are only 13441 points, and the vast
majority of them are C3061, "Cul de Sac". Won't make for
a very interesting map imho.
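
To put a number on "vast majority", the same table can be asked for
the share of each class. This is just a sketch reusing the
pointlm_merge table from the queries above; the pct column name is
mine.

```sql
-- Share of each landmark class, largest first.
SELECT mtfcc,
       count(*) AS cnt,
       round(100.0 * count(*) / (SELECT count(*) FROM pointlm_merge), 1) AS pct
FROM pointlm_merge
GROUP BY mtfcc
ORDER BY cnt DESC;
```

With the counts listed above, C3061 accounts for 11854 of the 13441
points, roughly 88%.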

Compare with the gnis_names layer, restricted to the data that
is available in Texas:

select count(*) from gnis_names_pg where state = 'TX';
count
-------
  95132

select count(*) as cnt, class from gnis_names_pg where state = 'TX' 
group by class order by cnt;

   cnt  |         class
-------+-----------------------
      1 | Crater
      1 | Bench
      2 | Slope
      2 | Tunnel
      5 | Arch
      5 | Rapids
      6 | Plain
      9 | Forest
     11 | Reserve
     17 | Woods
     19 | Arroyo
     19 | Harbor
     21 | Pillar
     27 | Beach
     27 | Area
     29 | Falls
     55 | Bar
     56 | Mine
     57 | Post Office
     61 | Crossing
     63 | Range
     65 | Basin
     67 | Military (Historical)
     87 | Levee
    110 | Ridge
    130 | Bridge
    146 | Channel
    157 | Gap
    180 | Flat
    185 | Cliff
    231 | Swamp
    259 | Gut
    259 | Bend
    261 | Island
    262 | Civil
    279 | Cape
    283 | Bay
    299 | Canal
    504 | Trail
    579 | Hospital
    942 | Well
   1052 | Tower
   1243 | Spring
   1294 | Oilfield
   1757 | Airport
   1780 | Lake
   2117 | Summit
   2844 | Valley
   3795 | Building
   4008 | Park
   5947 | Dam
   6016 | Cemetery
   7980 | Locale
   8511 | Populated Place
   8542 | Reservoir
   8756 | School
  11640 | Stream
  12072 | Church

I would say this is much more interesting, and the
work to define a style for it has already been done.
I suggest we ignore pointlm_merge and keep on using gnis_names
instead.

If we really want to use contemporary data (the major reason why
Jeff gathered the TIGER 2008 set, no?) we can have someone download and
convert the current GNIS names for Texas, available here
as a csv file: http://geonames.usgs.gov/domestic/download_data.htm
It has a few more points (108k) but the classification appears to
be the same.

AREAWATER_MERGE
----------------

The file contains water polygons (lakes and such). It's quite sparse
and has 368303 polygons over the state of Texas.
It seems to make a nice replacement for the tiger_tracts dataset,
which is nationwide but contains only 4388 polygons in Texas.

I suggest we use the areawater data set for the polygon test,
using a uniform bluish fill color with no outline?

OTHER FILES
-----------

arealm_merge.shp is another polygon file, but has few polygons inside.
tl_2008_48_place.shp is a point file, not so big, and to my surprise
it cannot be imported into PostGIS using shp2pgsql (charset issues).

I guess we can safely ignore these two?

Cheers
Andrea


-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

