Best way to structure large datasets

Tue Jan 10 06:13:40 PST 2006

Walter -

I will agree with Steve that you should use shapefiles with tileindexes
for your vector data.  Why does a database seem faster?  The value of a
database is that you can select data from it using various queries and
criteria, and it offers enormous flexibility.  If you are updating your
data frequently, then a database also gives you benefits in tracking and
managing versions and updates.  But there's no reason to think a
database does a better job of simply reading data from a disk.

I have one specific additional suggestion for you: use all the data you
read from disk - if you're not going to draw it, don't read it.

That means don't read a shapefile with 100,000 objects so you can draw
three of them.  Instead, try to break up your shapefiles into
geographically distinct subsets.  US street data, for example, is often
organized by county, which chops the whole US into several thousand
pieces, none of which overlap.  The
tileindex mechanism will compute the bounding boxes for each of these
counties (and those boxes will, of course, overlap).  When you request a
map image, the tileindex will quickly tell MapServer which county source
files could possibly be involved in responding to that request.
MapServer will then open those files and read the data.  This means you
should not only use tileindexes but you also need to be sure your data
is organized in a way that lets the tileindex mechanism work well.
Putting all your data in one shapefile with a tileindex wrapped around
it is useless; putting all your data into 100,000 files is equally
useless if you find you need to open 400 files to draw one map.
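
To make the selection step above concrete, here is a minimal sketch in
Python of the bounding-box test a tileindex lookup amounts to.  It is not
MapServer's code: the file names, the coordinates, and the helper names
(BBox, intersects, files_for_request) are all made up for illustration,
and it scans the index linearly where a real tileindex would use its own
spatial index.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in map units."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes share at least one point."""
    return (a.minx <= b.maxx and b.minx <= a.maxx and
            a.miny <= b.maxy and b.miny <= a.maxy)


def files_for_request(tileindex: Dict[str, BBox], request: BBox) -> List[str]:
    """Return the source files whose bounding box touches the request extent."""
    return [path for path, box in tileindex.items() if intersects(box, request)]


# Hypothetical tileindex: three county files whose boxes overlap slightly,
# even though the counties themselves do not.
TILEINDEX = {
    "county_a.shp": BBox(0, 0, 10, 10),
    "county_b.shp": BBox(9, 0, 20, 10),
    "county_c.shp": BBox(0, 9, 10, 20),
}

if __name__ == "__main__":
    # A small request well inside county A opens one file.
    print(files_for_request(TILEINDEX, BBox(2, 2, 4, 4)))
    # A request in the overlap of A's and B's boxes opens two files.
    print(files_for_request(TILEINDEX, BBox(9.5, 1, 9.8, 2)))
```

The second request shows why overlapping boxes matter: a file can be
opened because its box touches the request even if none of its objects
end up on the map.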

In an ideal tileindex scenario, each map request would open exactly one
shapefile (identified by the tileindex) and would draw every object in
that file.  That never really happens, but that's the design goal you
should keep in mind.

And always use shptree to create spatial indexes for both your
shapefiles and for the tileindex (which is another shapefile).
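
As a rough sketch of how that could be scripted, the Python below builds
one "shptree <file>" command per shapefile and can optionally run them.
It assumes shptree is on your PATH and is called with just the shapefile
name (default depth); the function names and the file names in the usage
line are hypothetical.

```python
import subprocess
from typing import Iterable, List


def shptree_commands(paths: Iterable[str]) -> List[List[str]]:
    """Build one 'shptree <file>' command per .shp file, in sorted order.

    Non-shapefile names are skipped.  The tileindex is itself a shapefile,
    so include its path in the input as well.
    """
    return [["shptree", p] for p in sorted(paths) if p.lower().endswith(".shp")]


def build_indexes(paths: Iterable[str]) -> None:
    """Run shptree on every shapefile; raises if any invocation fails."""
    for cmd in shptree_commands(paths):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in shptree_commands(["county_b.shp", "county_a.shp", "tileindex.shp"]):
        print(" ".join(cmd))
```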

	- Ed

Ed McNierney
President and Chief Mapmaker
TopoZone.com / Maps a la carte, Inc.
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
Phone: +1 (978) 251-4242
Fax: +1 (978) 251-1396
ed at topozone.com

-----Original Message-----
From: UMN MapServer Users List [mailto:MAPSERVER-USERS at LISTS.UMN.EDU] On
Behalf Of Walter Anderson
Sent: Tuesday, January 10, 2006 8:29 AM
To: MAPSERVER-USERS at LISTS.UMN.EDU
Subject: Re: [UMN_MAPSERVER-USERS] Best way to structure large datasets

Thanks for the replies. It sounds like preprocessing the data and using
tiled shapefiles is the way to go.

Just seemed like a database with spatial indexes would be faster.  Oh
well, live and learn.

Thanks again for the assistance.

Walter Anderson