Speeding up load time for huge SHP files

Ed McNierney ed at TOPOZONE.COM
Fri Aug 17 21:37:24 EDT 2007


Mevima -

Although it may take some searching, there has been a lot of discussion
of this sort of topic on the list, so hunting in the archives could be
very worthwhile.

There are several issues and solutions to think about here; I'll try to
run from quick fixes to more complex ones.

While optimizing things, it's very, very helpful to imagine an ideal
world in which each map request required you to open exactly one
shapefile, you would read and display ALL the shapes in that shapefile,
and you would not draw any pixel in the output image more than once.
That never really happens, but it describes the optimal scenario and you
should strive to approach it as much as possible.

1. Are you using the spatial indexes created with the SHPTREE command on
your shapefiles?  If not, do so.  They will (almost) always be quite
helpful.  They are especially helpful when you have a large file from
which you are displaying a small portion of the data.  Without a spatial
index, MapServer does indeed need to read every record in the shapefile
in order to find out whether any of them intersect the area to be drawn.
A spatial index tracks the bounding box of each object and allows
MapServer to quickly identify only those records whose bounding boxes
intersect the area to be drawn.  This is a very fast test.  It produces
some false positives (some selected objects end up not being drawn
anyway), but it is extremely helpful unless you're essentially drawing
every object in the shapefile, in which case the clever selection of
which objects to draw is of little use.
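
As a rough illustration of the bounding-box test described in item 1,
here is a minimal Python sketch.  It is not MapServer's code and it does
not model the index structure itself; it only shows the rectangle
comparison and how a false positive can arise.  The names BBox,
bboxes_intersect and candidate_records, and all coordinates, are made up
for the example.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Return True when the two rectangles overlap or touch."""
    return (a.minx <= b.maxx and a.maxx >= b.minx and
            a.miny <= b.maxy and a.maxy >= b.miny)


def candidate_records(record_boxes: List[BBox], view: BBox) -> List[int]:
    """Return the positions of records whose box intersects the view."""
    return [i for i, box in enumerate(record_boxes)
            if bboxes_intersect(box, view)]


# Made-up data in arbitrary map units.
view = BBox(0, 0, 10, 10)
record_boxes = [
    BBox(2, 2, 4, 4),      # entirely inside the view
    BBox(50, 50, 60, 60),  # far away, rejected by the box test
    BBox(8, 8, 20, 20),    # box overlaps the view, but the shape inside
                           # it might not: a possible false positive
]
selected = candidate_records(record_boxes, view)
```

Only the selected records would then need their full geometry read and
tested, which is where the savings come from when the view covers a
small part of a large file.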

2. Divide the single shapefile into smaller shapefiles using SHP2TILE,
then use a TILEINDEX to logically group them together.  This will allow
you to approach the goal of opening only one file and using all the
objects in it.  If you're drawing a map of Clarke County, Georgia you
only need a shapefile with all the roads in Clarke County in it.  Even
if you tune things pretty well, selecting the Clarke County roads from a
shapefile with every road in the country will take longer than reading a
shapefile that only has Clarke County roads.  Random seeks from
one place on the disk to another (to read another file or a different
portion of the same file) are the very slowest things your computer
does.  Avoid them.
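
The idea behind item 2 can be sketched in a few lines of Python: keep a
small list of tile extents and open only the tile files that intersect
the requested map area.  This is an illustration of the concept under
made-up assumptions, not how MapServer implements TILEINDEX; the file
names, the extents and the function tiles_for_request are hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def tiles_for_request(tile_index: List[Tuple[str, Extent]],
                      request: Extent) -> List[str]:
    """Return the paths of tiles whose extent intersects the request."""
    rminx, rminy, rmaxx, rmaxy = request
    return [path for path, (minx, miny, maxx, maxy) in tile_index
            if minx <= rmaxx and maxx >= rminx
            and miny <= rmaxy and maxy >= rminy]


# Hypothetical 2 x 2 tiling of a roads layer, in arbitrary map units.
tile_index = [
    ("roads_tile_0_0.shp", (0, 0, 100, 100)),
    ("roads_tile_1_0.shp", (100, 0, 200, 100)),
    ("roads_tile_0_1.shp", (0, 100, 100, 200)),
    ("roads_tile_1_1.shp", (100, 100, 200, 200)),
]

# A request that falls entirely inside the first tile.
files_to_open = tiles_for_request(tile_index, (20, 30, 60, 70))
```

Here one file out of four is opened, and most of what is in it is
likely to be drawn, which is close to the ideal case described earlier.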

3. You cannot possibly need "all the data" for a single map image
request.  The first two steps above should help you more quickly select
which portion of the data you do need.  If you end up finding that you
are still processing and rendering a large subset of the data, then you
should consider generalizing your shapefiles to simplify the geometry of
the objects in them.  If you're drawing a map that's 1,200 by 800
pixels, that's about 1 megapixel.  You can't productively use 2,500
megabytes of data to produce 1 megapixel of output.  If you have, for
example, a very highly detailed outline of the US coastline, you may
find it excellent for large-scale, zoomed-in maps.  But if you use that
same shapefile to draw a small-scale view of the entire country, you
will find that most of your vectors are far smaller than a pixel; you
will end up drawing the same pixel over and over again, rendering teeny
little sub-pixel vectors.  If that's your problem, you need a different,
simplified version of that shapefile (using MINSCALE/MAXSCALE to select
the right one) for use at small scales.
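
To put rough numbers on item 3, the short Python sketch below works
through the arithmetic.  The image size and data size come from the
text above; the 50 m segment length, the 10 km and 4,500 km map widths,
and the helper segment_length_px are assumptions chosen only to
illustrate the point.

```python
# Image size and data volume from the discussion above.
width_px, height_px = 1200, 800
pixels = width_px * height_px          # 960,000 pixels, about 1 megapixel
data_bytes = 2500 * 1_000_000          # 2,500 MB, taking 1 MB = 10**6 bytes
bytes_per_pixel = data_bytes / pixels  # roughly 2,600 bytes per pixel


def segment_length_px(segment_length_m: float,
                      map_width_m: float,
                      image_width_px: int) -> float:
    """On-screen length, in pixels, of a line segment of the given size."""
    return segment_length_m * image_width_px / map_width_m


# Assumed example: a 50 m coastline segment on a 1,200-pixel-wide image.
zoomed_in = segment_length_px(50, 10_000, width_px)      # map 10 km wide
whole_us = segment_length_px(50, 4_500_000, width_px)    # map ~4,500 km wide
```

Under these assumptions the segment is several pixels long on the
zoomed-in map but only a small fraction of a pixel on the country-wide
map, which is the situation where a generalized copy of the shapefile
pays off.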

The capitalized keywords above are good search terms for the archives.
If you can think about these issues and then describe your shapefile
structure in a bit more detail we can probably be more helpful.

	- Ed

Ed McNierney
Chief Mapmaker
Demand Media / TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
Phone: 978-251-4242, Fax: 978-251-1396
ed at topozone.com



-----Original Message-----
From: UMN MapServer Users List [mailto:MAPSERVER-USERS at LISTS.UMN.EDU] On
Behalf Of Mevima Winn
Sent: Friday, August 17, 2007 6:30 PM
To: MAPSERVER-USERS at LISTS.UMN.EDU
Subject: [UMN_MAPSERVER-USERS] Speeding up load time for huge SHP files

I'm currently building a mapping application off C#, MapServer 4.10, and
approximately 45 SHP files.  Many of them are pretty fast to load,
especially utilizing the MINSCALE/MAXSCALE function, but once I start
zooming in to where US roads and tiny lakes start appearing, the app
slows WAY down.  Our theory here is that it's either trying to load the
whole shapefile and then zooming in on one spot, or reloading the
shapefile every time there is a new request.

So does anybody have an idea what might be happening?  The 350MB SHPfile
tends to take 20 seconds or so to load, whereas the 2.5GB SHPfile can
take several minutes.  Is there a way around this load time, or a way to
optimize it?  We kind of need all of the data, at least at these
particular zooms.

____________
Mevima Winn
Wireless Applications Corp.
111 108th Ave. NE.
Suite 160
Bellevue, WA 98004
*paulanne.winn at wacorp.net*