[Benchmarking] Using Tiger 2008 data
    Yingqi Tang 
    ytang at esri.com
       
    Mon Sep 28 11:15:39 EDT 2009
    
    
  
Andrea, Jeff,
At this point we're only looking at testing against the merged dataset, because serving multiple shapefiles (the non-merged dataset) as one WMS layer is not a supported workflow in ArcGIS Server.
Generally I agree with what has been proposed by Andrea, and here are our comments:
	1. EDGES_MERGE
		
		We fully agree on using edges_merged.shp. We also support indexing the attribute (MTFCC); for this, a small index must be created and added alongside the shapefile before the benchmark (I will upload those).
	2. POINTLM_MERGE
		We should use gnis_names.shp instead of pointlm_merge.shp because gnis_names.shp has a more meaningful "type" attribute, which will be easier to style.
		Jeff: I've uploaded a draft SLD for gnis_names.shp at 64.222.187.168:2221/opt/benchmarking/esri/sld, and all the icons referenced in that SLD are in /opt/benchmarking/esri/icons (bmp, png and gif for each icon).
	3. AREAWATER_MERGE
		Agree on using areawater_merge to replace tiger_tracts.shp.
	4. OTHER FILES
		Agree on not using them in the benchmark.
Thanks,
Yingqi
-----Original Message-----
From: benchmarking-bounces at lists.osgeo.org [mailto:benchmarking-bounces at lists.osgeo.org] On Behalf Of Jeff McKenna
Sent: Sunday, September 27, 2009 1:09 PM
To: Performance testing of OSGeo and other web service engines.
Subject: Re: [Benchmarking] Using Tiger 2008 data
Hi Andrea,
Thanks for your thorough review of the TIGER 2008 merged dataset.  My 
comments are inline below:
Andrea Aime wrote:
> Hi,
> I've been looking a bit at the Tiger 2008 data for
> Texas that Jeff provided, here are some findings and impressions
> on the merged set (the non-merged set is only of interest to
> MapServer I think).
True, the non-merged set is only of interest to MapServer right now 
(we'll test both for MapServer, in the hope that the numbers show 
MapServer users something).
> 
> EDGES_MERGE
> ----------------------------------------------------------------
> 
> The edges_merged.shp file contains both roads and water lines.
> The classification we had for roads is still applicable with
> the small changes Jeff already suggested on the wiki page here:
> http://wiki.osgeo.org/wiki/Texas_roads_styled
> One major difference between this set and the old one is that
> it does not contain only roads, so the style will perform a
> filtering on top of the data and display only roads (and only
> certain road classes, not all of them), whilst the old
> data set only had roads and displayed them all.
Yes this 'filtering' will be a great test of the response time from the 
mapping servers.  Usually I would pre-process the data so that each type 
of road is its own shapefile (e.g. interstates.shp, major-roads.shp), 
but I'm curious to see how we all do with having to filter these 
on-the-fly.
> I guess this makes for an interesting comparison between
> spatial database and shapefile (and will also point out
> systems that are capable of indexing the attributes as well
> in a shapefile, provided there are any... maybe ArcGIS).
> I guess we'll want to index the mtfcc attribute in PostGIS
> to speed up searches.
> 
For the MapServer case, OGR does support attribute indexing for shapefiles 
('ogrinfo edges_merge.shp -sql "CREATE INDEX ON edges_merge USING MTFCC"'), 
but I believe that attribute index would only help if we were using 
MapServer to query that 'MTFCC' field (Daniel, am I correct on this?)
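A minimal Python sketch of scripting that index step, assuming the GDAL command-line tools are installed and that the layer name matches the shapefile's base name. The helper names build_index_command and run_if_available are made up for illustration; the SQL is OGR's CREATE INDEX statement for shapefiles.

```python
import shutil
import subprocess
from pathlib import Path


def build_index_command(shapefile, field):
    """Return the ogrinfo argv that asks OGR to index one attribute.

    Assumes the layer name equals the shapefile's base name.
    """
    layer = Path(shapefile).stem
    sql = f"CREATE INDEX ON {layer} USING {field}"
    return ["ogrinfo", str(shapefile), "-sql", sql]


def run_if_available(cmd):
    """Run the command only when ogrinfo is on PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


cmd = build_index_command("edges_merge.shp", "MTFCC")
print(" ".join(cmd))
```

Calling run_if_available(cmd) in the directory holding the shapefile would then create the index, which is only worth doing for servers that actually filter on that field.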
> POINTLM_MERGE
> ----------------------------------------------------------------
> 
> The pointlm_merge file contains point landmarks, and could
> be used to replace the gnis_names layer.
> I've tried to port the styling made by ESRI over to the
> pointlm file, but doing so resulted in losing half
> of the categories.
Are the ESRI gnis_names stylings (class, icon) posted on the wiki somewhere?
> I've also made a comparison of the data amounts and distributions,
> see here:
> 
> select count(*), mtfcc from pointlm_merge group by mtfcc;
>  count | mtfcc
> -------+-------
>      1 | C3071
>     68 | K3544
>      3 | K2181
>      3 | K2110
>     84 | K2165
>      1 | C3066
>      1 | K2182
>    214 | K1231
>    599 | K2543
>      8 | K1236
>     41 | K2561
>  11854 | C3061
>    152 | K2190
>     32 | K2582
>    100 | C3062
>     20 | K1225
>    257 | K2451
>      3 | K1237
> 
> 
> select count(*) from pointlm_merge;
>  count
> -------
>  13441
> 
> As you can see there are only 13441 points, and the vast
> majority of them are C3061, "Cul de Sac". Won't make for
> a very interesting map imho.
> 
> Compare with gnis_names, filtered to the data that
> is available in Texas:
> 
> select count(*) from gnis_names_pg where state = 'TX';
> count
> -------
>  95132
> 
> select count(*) as cnt, class from gnis_names_pg where state = 'TX' 
> group by class order by cnt;
> 
>   cnt  |         class
> -------+-----------------------
>      1 | Crater
>      1 | Bench
>      2 | Slope
>      2 | Tunnel
>      5 | Arch
>      5 | Rapids
>      6 | Plain
>      9 | Forest
>     11 | Reserve
>     17 | Woods
>     19 | Arroyo
>     19 | Harbor
>     21 | Pillar
>     27 | Beach
>     27 | Area
>     29 | Falls
>     55 | Bar
>     56 | Mine
>     57 | Post Office
>     61 | Crossing
>     63 | Range
>     65 | Basin
>     67 | Military (Historical)
>     87 | Levee
>    110 | Ridge
>    130 | Bridge
>    146 | Channel
>    157 | Gap
>    180 | Flat
>    185 | Cliff
>    231 | Swamp
>    259 | Gut
>    259 | Bend
>    261 | Island
>    262 | Civil
>    279 | Cape
>    283 | Bay
>    299 | Canal
>    504 | Trail
>    579 | Hospital
>    942 | Well
>   1052 | Tower
>   1243 | Spring
>   1294 | Oilfield
>   1757 | Airport
>   1780 | Lake
>   2117 | Summit
>   2844 | Valley
>   3795 | Building
>   4008 | Park
>   5947 | Dam
>   6016 | Cemetery
>   7980 | Locale
>   8511 | Populated Place
>   8542 | Reservoir
>   8756 | School
>  11640 | Stream
>  12072 | Church
> 
> I would say this is much more interesting, and the
> work to define a style for it has already been done.
> I suggest we ignore pointlm_merge and keep on using gnis_names
> instead.
Great comparison, yes I agree that we should ignore 'pointlm_merge.shp'.
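To put numbers on that comparison, here is a minimal Python sketch using counts copied from the query output above. Only the three largest gnis_names classes are included, so that share is computed against the 95132 total reported above; the function name top_share is made up for illustration.

```python
def top_share(counts, total=None):
    """Return (key, share) for the largest class in a {class: count} map."""
    if total is None:
        total = sum(counts.values())
    key = max(counts, key=counts.get)
    return key, counts[key] / total


# Counts per MTFCC code, copied from the pointlm_merge query.
pointlm = {
    "C3071": 1, "K3544": 68, "K2181": 3, "K2110": 3, "K2165": 84,
    "C3066": 1, "K2182": 1, "K1231": 214, "K2543": 599, "K1236": 8,
    "K2561": 41, "C3061": 11854, "K2190": 152, "K2582": 32,
    "C3062": 100, "K1225": 20, "K2451": 257, "K1237": 3,
}

# Three largest classes from the gnis_names_pg query (Texas only).
gnis_top = {"Church": 12072, "Stream": 11640, "School": 8756}

print(top_share(pointlm))
print(top_share(gnis_top, total=95132))
```

The single largest class accounts for roughly 88% of pointlm_merge but only about 13% of the Texas gnis_names rows, which is why the latter gives a more varied map.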
> 
> If we really want to use contemporary data (the major reason why
> Jeff gathered the TIGER 2008 set, no?) we can have someone download and
> convert the current GNIS names for Texas, available here
> as a CSV file: http://geonames.usgs.gov/domestic/download_data.htm
> It has a few more points (108k) but the classification appears to
> be the same.
Agreed, I've uploaded a processed file for 2009 
(/opt/data/GNIS-2009/gnis_names09.shp), and I've updated the wiki.
I've also started a file in SVN to record data sources 
(/benchmarking/docs/data-sources.txt).  Can someone who knows the 
sources of the raster data please update this file in SVN?  Thanks.
> 
> AREAWATER_MERGE
> ----------------
> 
> The file contains water polygons (lakes and such). It's quite sparse
> and has 368303 polygons across the state of Texas.
> It seems to make a nice replacement for the tiger_tracts dataset,
> which is nationwide but contains only 4388 polygons in Texas.
> 
> I suggest we use the areawater data set for the polygon test,
> using a uniform bluish fill color with no outline?
Agreed.
> 
> OTHER FILES
> -----------
> 
> arealm_merge.shp is another polygon file, but has few polygons inside.
> tl_2008_48_place.shp is a point file, not so big, and to my surprise
> it cannot be imported into PostGIS using shp2pgsql (charset issues).
It probably requires the "-W latin1" switch, which the geonames file 
also required to import into PostGIS (lesson learned here!). 
tl_2008_48_place.shp is really just an "urban areas" polygon file.
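The charset failure is easy to reproduce in isolation. Below is a minimal Python sketch, assuming the loader expects UTF-8 unless told otherwise and the attribute table is stored as Latin-1. The helper name decode_attribute and the sample place name are invented for illustration and are not taken from the TIGER files.

```python
def decode_attribute(raw, encoding="utf-8"):
    """Decode one attribute value, returning None if the bytes do not fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical place name with a non-ASCII character, stored as Latin-1.
raw = "Ca\u00f1on".encode("latin-1")

print(decode_attribute(raw))             # rejected when read as UTF-8
print(decode_attribute(raw, "latin-1"))  # decodes once the encoding is named
```

Naming the source encoding, as the -W switch does for shp2pgsql, is what lets the second call succeed.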
> 
> I guess we can safely ignore these two?
Sure, sounds good.
-jeff
_______________________________________________
Benchmarking mailing list
Benchmarking at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/benchmarking