[Benchmarking] Using Tiger 2008 data
Andrea Aime
aaime at opengeo.org
Sun Sep 27 06:08:16 EDT 2009
Hi,
I've been looking a bit at the Tiger 2008 data for
Texas that Jeff provided; here are some findings and impressions
on the merged set (the non-merged set is of interest only to
MapServer, I think).
EDGES_MERGE
----------------------------------------------------------------
The edges_merged.shp file contains both roads and water lines.
The classification we had for roads is still applicable with
the small changes Jeff already suggested on the wiki page here:
http://wiki.osgeo.org/wiki/Texas_roads_styled
One major difference between this set and the old one is that
it does not contain only roads, so the style will perform a
filtering on top of the data and display only roads (and only
certain road classes, not all of them), whilst the old
data set only had roads and displayed them all.
I guess this makes for an interesting comparison between
spatial database and shapefile (and it will also point out
any systems that are capable of indexing the attributes of
a shapefile as well, provided there are any... maybe ArcGIS).
I guess we'll want to index the mtfcc attribute in PostGIS
to speed up searches.
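A minimal sketch of that index, assuming the shapefile was loaded
into a PostGIS table named edges_merge (the table name, the index name
and the two road codes in the test query are illustrative assumptions;
the classes we actually display are the ones on the wiki page):

```sql
-- Plain btree index on the classification attribute, so that the
-- road filter does not have to scan the water lines as well.
-- "edges_merge" and "edges_merge_mtfcc_idx" are assumed names.
CREATE INDEX edges_merge_mtfcc_idx ON edges_merge (mtfcc);

-- Refresh the planner statistics after the bulk load.
ANALYZE edges_merge;

-- Check whether the planner picks the index for a road-only filter.
-- S1100 (primary road) and S1200 (secondary road) are example codes.
EXPLAIN
SELECT count(*)
FROM edges_merge
WHERE mtfcc IN ('S1100', 'S1200');
```

Whether the planner actually uses the index will depend on how
selective the chosen road classes are compared to the whole table.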
POINTLM_MERGE
----------------------------------------------------------------
The pointlm_merge file contains point landmarks, and could
be used to replace the gnis_names layer.
I've tried to port the styling made by ESRI over to the
pointlm file, but doing so resulted in losing half
of the categories.
I've also made a comparison of the data amounts and distributions,
see here:
select count(*), mtfcc from pointlm_merge group by mtfcc;
count | mtfcc
-------+-------
1 | C3071
68 | K3544
3 | K2181
3 | K2110
84 | K2165
1 | C3066
1 | K2182
214 | K1231
599 | K2543
8 | K1236
41 | K2561
11854 | C3061
152 | K2190
32 | K2582
100 | C3062
20 | K1225
257 | K2451
3 | K1237
select count(*) from pointlm_merge;
count
-------
13441
As you can see there are only 13441 points, and the vast
majority of them are C3061, "Cul de Sac". Won't make for
a very interesting map imho.
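To put a number on that skew, something along these lines should
work. It is only a sketch: it assumes PostgreSQL 8.4 or later for the
window function, and the pct column name is mine. With the counts
above, C3061 comes out at about 88% of the points (11854 of 13441).

```sql
-- Share of each mtfcc class in the point landmark table.
-- sum(count(*)) over () is the grand total across all groups.
select mtfcc,
       count(*) as count,
       round(100.0 * count(*) / sum(count(*)) over (), 1) as pct
from pointlm_merge
group by mtfcc
order by count(*) desc;
```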
Compare with the gnis_names layer, filtered down to the data
that is available in Texas:
select count(*) from gnis_names_pg where state = 'TX';
count
-------
95132
select count(*) as cnt, class from gnis_names_pg where state = 'TX'
group by class order by cnt;
cnt | class
-------+-----------------------
1 | Crater
1 | Bench
2 | Slope
2 | Tunnel
5 | Arch
5 | Rapids
6 | Plain
9 | Forest
11 | Reserve
17 | Woods
19 | Arroyo
19 | Harbor
21 | Pillar
27 | Beach
27 | Area
29 | Falls
55 | Bar
56 | Mine
57 | Post Office
61 | Crossing
63 | Range
65 | Basin
67 | Military (Historical)
87 | Levee
110 | Ridge
130 | Bridge
146 | Channel
157 | Gap
180 | Flat
185 | Cliff
231 | Swamp
259 | Gut
259 | Bend
261 | Island
262 | Civil
279 | Cape
283 | Bay
299 | Canal
504 | Trail
579 | Hospital
942 | Well
1052 | Tower
1243 | Spring
1294 | Oilfield
1757 | Airport
1780 | Lake
2117 | Summit
2844 | Valley
3795 | Building
4008 | Park
5947 | Dam
6016 | Cemetery
7980 | Locale
8511 | Populated Place
8542 | Reservoir
8756 | School
11640 | Stream
12072 | Church
I would say this is much more interesting, and the
work to define a style for it has already been done.
I suggest we ignore pointlm_merge and keep on using gnis_names
instead.
If we really want to use contemporary data (the major reason why
Jeff gathered the TIGER 2008 set, no?) we can have someone download and
convert the current GNIS names for Texas, available here
as a csv file: http://geonames.usgs.gov/domestic/download_data.htm
It has a few more points (108k) but the classification appears to
be the same.
AREAWATER_MERGE
----------------
The file contains water polygons (lakes and such). It's quite sparse
and has 368303 polygons across the state of Texas.
It seems to make a nice replacement for the tiger_tracts dataset,
which is nationwide but contains only 4388 polygons in Texas.
I suggest we use the areawater data set for the polygon test,
using a uniform bluish fill color with no outline?
OTHER FILES
-----------
arealm_merge.shp is another polygon file, but has few polygons inside.
tl_2008_48_place.shp is a point file, not so big, and to my surprise
it cannot be imported into PostGIS using shp2pgsql (charset issues).
I guess we can safely ignore these two?
Cheers
Andrea
--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.