shapefile optimization for dynamic data

Ben Eisenbraun bene at KLATSCH.ORG
Fri Apr 21 16:49:11 EDT 2006


> Ben Eisenbraun wrote:
>> I'm collecting data via a GPS and a sensor that reports a data
>> point once per second. I'm using MapServer CGI to generate an
>> overlay onto a map via a JavaScript frontend that auto-refreshes
>> every few seconds. The application has to run on a low-power
>> embedded hardware device (roughly a P2-266), and I'm running into
>> performance problems once I've collected a few thousand data
>> points. The MapServer CGI process tends to consume all the CPU
>> trying to render the overlays.
<snip my previous post>
> Stephen Woodbridge wrote:
> Using shptree will not help you that much in this scenario because of
> the frequency of updating the file. Your best bet would be to use
> multiple files and a tile index that you would have to add the new
> files to as they are created. Then you can shptree the non-active
> files, but not the active file. That will probably be the best
> approach. Also make sure you shptree the tileindex.

A little follow up:

I tried this route.  I wrote the shapefile generation scripts so that
you could set a maximum number of points per shapefile; the system
would create a shapefile of, e.g., 1000 points, shptree it, add it to
the tile index, shptree the tile index, and then start a new unindexed
shapefile for the next batch of 1000 points.
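
The rollover step looked roughly like this.  It's a sketch rather than
the real script: the filenames are made up, and GDAL's ogrtindex stands
in for however you maintain the tile index (MapServer's tile4ms would
also work).

    #!/bin/sh
    # Retire the active shapefile once it hits the point limit,
    # give it a qix spatial index, then rebuild and index the
    # tile index over all of the retired files.
    DATA=/mnt/ramdisk
    STAMP=`date +%s`

    for ext in shp shx dbf; do
        mv $DATA/active.$ext $DATA/points_$STAMP.$ext
    done
    shptree $DATA/points_$STAMP.shp

    rm -f $DATA/tindex.shp $DATA/tindex.shx $DATA/tindex.dbf
    ogrtindex $DATA/tindex.shp $DATA/points_*.shp
    shptree $DATA/tindex.shp

    # The collector then starts a fresh, unindexed active.shp
    # for the next batch of points.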

It had basically no effect.  I tested shapefile sizes from 5000 points
down to 50 points per file, against 20,000 total points, and saw almost
identical performance at every size.  Performance with shptree indexes
versus without was also nearly identical; at the larger file sizes, the
indexes bought roughly a 3-5% decrease in rendering time.

So... yuck.

Given how strongly shptree indexes are recommended on this list, I
thought my testing methodology might be flawed.

I have a list of URLs that represent tiles for the entire dataset, as
well as a list of URLs for common views of the data (zoom level and
number of points), that I used for all my tests.  I used shell scripts
with 'time' and 'curl' to grab the images across the network, and I
thought the results might be a network or HTTP effect, but I was able
to duplicate them locally with 'shp2img'.
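
The harness was nothing fancy; it looked more or less like this, where
urls.txt and app.map are placeholders for my saved URL list and
mapfile:

    #!/bin/sh
    # Time each saved mapserv CGI request over the network...
    while read url; do
        time curl -s -o /dev/null "$url"
    done < urls.txt

    # ...then render the same view locally with shp2img, taking
    # the network and HTTP out of the picture entirely.
    time shp2img -m app.map -o /tmp/out.png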

> If a shapefile does not have a qix spatial index, then mapserver creates
> one on the fly and throws it away. If you are adding a point a second,
> the file is probably getting updated faster than you can index it and
> then render it. Using the tileindex should really help in this case
> also, because only the files that intersect your current display window
> need to be opened and looked at.

I think my shapefiles must be significantly smaller than what most
people are working with.  20,000 points ends up as about 1.5 MB of
shapefiles, with slightly larger dbf files for the attributes.  I'm
reading and writing these files on a ramdisk under Linux, so access
should be pretty quick.

My suspicion at this point is that the CPU is simply under-specced for
this application.  It's not really a Pentium II; it's a low-power,
586-class Geode CPU with no level 2 cache.

The biggest performance increase came from breaking the points out into
separate shapefiles based on their attributes.  I had previously been
creating a single layer in the mapfile and using CLASS expressions to
colorize the features.  By pre-classifying the data into separate
shapefiles, I was able to cut rendering time by 10-12%.
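
In other words, instead of one layer whose CLASS expressions have to be
evaluated against every feature, each class gets its own shapefile and
its own single-class layer.  For an existing file, the split can be done
with GDAL's ogr2ogr; the attribute name and values here are made up for
the example:

    #!/bin/sh
    # Split points.shp into one shapefile per attribute value.
    # STATUS and its values stand in for whatever attribute the
    # CLASS expressions were testing.
    for val in ok warn alarm; do
        ogr2ogr -where "STATUS = '$val'" points_$val.shp points.shp
    done

Each output then becomes its own LAYER in the mapfile with a single
CLASS and no EXPRESSION, so mapserver never evaluates an expression
per feature.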

I never did get around to trying PostGIS or SQLite for this application,
but I'm not sure they would have helped.  The shapefile creation and
updates are relatively low-overhead compared to generating the overlays.

Thanks for everyone's suggestions.

-ben

--
this machine kills fascists.                           <woody guthrie>


