[mapserver-users] Displaying very large shapefiles

Ed McNierney ed at topozone.com
Wed Oct 17 16:53:46 EDT 2001


Armin -

There are lots of factors that contribute to performance.  Since I don't
know anything about your data, here are some general things to think
about:

1. Minimize the amount you have to draw

The entire point of indexing and tiling data is to make it easy to
quickly identify that subset of the data that needs to be drawn on the
map.  If you end up drawing all the data anyway, forget about tiling and
indexing it and just buy a faster computer.  Be sure your data is
organized into layers that are displayed at appropriate scales.  How
many polygons from your data set are being displayed at once?

ADVICE: Be sure to set MINSCALE and MAXSCALE values on your layers.  If
necessary, pre-classify your data into several different shapefiles so
you can turn on your data incrementally.
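
For illustration, a mapfile layer along these lines (the layer name,
shapefile, and scale cutoffs are hypothetical; pick values that match
your data):

    LAYER
      NAME "detail_polygons"     # hypothetical layer
      TYPE POLYGON
      STATUS ON
      DATA "detail_polygons"     # one of the pre-classified shapefiles
      MINSCALE 1000              # skip when zoomed in past 1:1,000
      MAXSCALE 50000             # skip when zoomed out beyond 1:50,000
      CLASS
        COLOR 200 200 200
      END
    END

With several such layers at staggered MINSCALE/MAXSCALE ranges, only
the layer appropriate to the current map scale gets drawn.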

2. Indexing works on bounding boxes

You may still have to process lots of data if the index indicates that
a large number of candidate objects might need to be drawn.  Imagine,
for example, a data set that consists of a
large number of concentric circles (polylines, not polygons).  Now
imagine a map being drawn that is a square lying INSIDE the smallest of
those concentric circles.  The map will actually be blank, since none of
the polylines intersect it, but the bounding boxes of ALL the circles
will intersect the map area, so each polyline shape ends up being a
candidate for drawing.  You can't tell whether or not the polyline
actually intersects the map area of interest until you walk every vertex
in the polyline.

ADVICE: Look at your shapes.  Can you break large polygons into smaller,
contiguous polygons?
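
For reference, the index in question is the quadtree that the shptree
utility writes to a .qix file.  A sketch of its use (the file name is
hypothetical; the optional second argument is the tree depth that
Armin varies from 4 to 10 below):

    shptree parcels.shp 8    # writes parcels.qix, a depth-8 quadtree

A deeper tree gives smaller quadtree cells and finer candidate
selection, but no depth setting helps when the bounding boxes
themselves are the problem, as in the circle example above.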

3. Tiling works on bounding boxes

Use TILEINDEX wherever possible.  Break your data set up into
rectangular tiles that overlap as little as possible.  That way you can
skip entire shapefiles all at once.  For example, I'm
developing an application using GDT's Dynamap/Display street database.
Originally I had the data tiled by state (i.e. all the local roads in
Colorado were in one shapefile); the performance wasn't too bad, but it
wasn't as good as it could be.  I used a TILEINDEX scheme so a request
for roads at, say, the Four Corners area would search through the
(indexed) shapefiles for local roads in Colorado, Utah, New Mexico, and
Arizona.  I then switched to using the data tiled by county instead of
by state.  Each shapefile then represented one county's worth of local
roads, so I only needed to look through the (indexed) shapefiles for San
Juan County, UT, Montezuma County, CO, San Juan County, NM, and Apache
County, AZ.  The result was MUCH faster, especially for the first query
where the data files weren't in the filesystem cache.

ADVICE: Try to tile your data into shapefiles intelligently.
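
A minimal sketch of such a layer (the tile index name is hypothetical;
TILEITEM names the attribute in the index shapefile that holds the
path to each tile, and "location" is the conventional default):

    LAYER
      NAME "local_roads"           # hypothetical layer
      TYPE LINE
      STATUS ON
      TILEINDEX "counties_index"   # shapefile of tile footprints
      TILEITEM "location"          # attribute with each tile's file path
      CLASS
        COLOR 128 128 128
      END
    END

The tile4ms utility that ships with MapServer can build such an index
shapefile from a list of tile shapefiles.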

4. Look at the hardware

This is relatively minor, but make sure your disks are defragmented,
you've got lots of RAM for your filesystem to cache things, etc.

ADVICE: Don't tie one arm behind your back.  Make sure the hardware and
OS are well-tuned.

There are really only two simple rules here.  Every map you make will
only display a very small fraction of your database.  Rule #1: Trivial
rejection - figure out how to completely ignore the vast majority of
your data that just doesn't matter to the current map.  Rule #2: Data
organization - make sure the small piece of the data you DO need is
easily, quickly, and efficiently accessible.

	- Ed

Ed McNierney
Chief Mapmaker
TopoZone.com
ed at topozone.com
(978) 251-4242


-----Original Message-----
From: Armin Burger [mailto:armin.burger at territoriumonline.com]
Sent: Wednesday, October 17, 2001 4:09 PM
To: mapserver-users at lists.gis.umn.edu
Subject: RE: [mapserver-users] Displaying very large shapefiles


Puneet,

I tried shptree with depth parameters from 4 to 10; the effect was
always the same, and the response was quite slow. The index file for a
180 MB, 600000-polygon shapefile I use is only 2 MB, so I think the
index size for the huge file is in a normal range.

I'm only wondering why a roughly fourfold increase in file size and
polygon count increases the processing time by a factor of 50 or more.
The difference between displaying a 600000 and a 50000 polygon
shapefile (at an appropriate zoom level), however, is negligible. I
don't know if the indexing reaches its limits when the number of shapes
is too high.

Thanks

Armin



------ Original Message ------
 From Puneet Kishor  <pkishor at GeoAnalytics.com>
 Sent: 16/10/01, 10:23:06
 Subject: RE: [mapserver-users] Displaying very large shapefiles

> Armin,

> Just a quick observation...

>> -----Original Message-----
>> From: Armin Burger [mailto:armin.burger at territoriumonline.com]
>> Sent: Tuesday, October 16, 2001 5:47 AM
>> To: mapserver-users at lists.gis.umn.edu
>> Subject: [mapserver-users] Displaying very large shapefiles
>>
>>
>> Hi everybody,
>>
>> I tried to display a really huge shapefile with 2.5 million
>> polygons (the .shp file has about 450 MB). I used 'shptree' to
>> calculate the spatial index. But even with the index file (.qix) it
>> takes several minutes to

> [snip]

>> handle shapefiles with too many features? The index file
>> itself has 14 MB.
>>

> a 14 Mb index file for a 450 Mb shapefile seems to be too small an
> index file. For example, I have a 156 Mb shapefile (state and county
> roads). It was slow as a dog. Then I indexed it and shptree created a
> 38 Mb index file. It is now blazingly fast.

> Try reindexing with different parameters... maybe the indexing is not
> optimal at all.

> You are right. Nothing should take "minutes" in today's web age. If
> it does, you have to either improve it or do it some other way.

> pk/




