[Mapserver-users] Large Raster files

Ed McNierney ed at topozone.com
Mon May 17 12:21:55 EDT 2004

Charlie -

Well, I *am* Ed!  You've already gotten a lot of good comments, and
rather than try to respond to them specifically I'll just offer my own.

The first thing I'd consider is the data maintenance aspect.  Is this a
static library of images that will never change or be updated?  If so,
then you should feel free to do lots of data modification and
pre-processing.  If you need to update the data often, however, you
should consider how hard it will be to modify and pre-process the
updates.  It may be wise for you to store your data in a format that
closely resembles the originals in order to better manage updates.

Disk seeks are far more expensive than sequential reads/data transfer.
When you read a small portion of a large file, you typically get a good
bit of sequential reading with very few disk seeks.  Reading a small
portion of each of several small files may require less sequential
reading, but it requires more disk seeks (typically at least one per
file).  Increasing the number of disk seeks required to produce an image
will almost certainly erase any benefit from the shorter reads.
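To make the trade-off concrete, here is a rough cost model: each request costs (number of seeks x seek latency) + (megabytes read / sequential throughput). The 9 ms seek and 50 MB/s throughput figures are illustrative assumptions only, not measurements of any particular drive. The split of the request into 16 small files reading 1.5 MB in total is also a hypothetical scenario.

```python
# Rough request-cost model: each seek pays a fixed latency and each
# megabyte pays a transfer cost. Both constants are illustrative
# assumptions, not measurements of any particular drive.
SEEK_MS = 9.0             # assumed average seek + rotational latency
TRANSFER_MB_PER_S = 50.0  # assumed sustained sequential throughput


def read_time_ms(seeks: int, megabytes: float) -> float:
    """Estimated service time: total seek latency plus transfer time."""
    return seeks * SEEK_MS + (megabytes / TRANSFER_MB_PER_S) * 1000.0


if __name__ == "__main__":
    # One large file: one seek, then about 2 MB read sequentially.
    one_large = read_time_ms(seeks=1, megabytes=2.0)
    # Sixteen small files: one seek each, and (optimistically) only 1.5 MB
    # read in total because the small tiles fit the request more tightly.
    many_small = read_time_ms(seeks=16, megabytes=1.5)
    print(f"one large file: {one_large:.0f} ms")
    print(f"16 small files: {many_small:.0f} ms")
```

Under these assumed numbers, the single large file costs about 49 ms and the sixteen small files about 174 ms. Saving half a megabyte of reading is swamped by fifteen extra seeks.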

As a result, consider the largest image size you're going to display.
If your data is organized in image files that are substantially larger
than the output image (2X or more in each dimension) then a reasonable
fraction of map-serving requests will only need to read data from one
image file.  This is a Good Thing.  A request at the corner of four
images will be slower due to the need to seek to and read each of those
four images; if your image tiles are smaller than your output image
area, you're going to suffer from the need to read many small files for
every request.
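One way to put numbers on this uses a simplified model: an unbounded grid of equal-sized tiles, with output windows landing at uniformly random positions. Real request patterns are not uniform, so treat the results as rough. Along one axis, a window of length L over tiles of length T crosses L / T tile boundaries on average. It therefore touches 1 + L / T tiles on average, and it stays inside one tile with probability max(0, 1 - L / T). The function name below is hypothetical.

```python
from typing import Tuple


def files_per_request(out_w: float, out_h: float,
                      tile_w: float, tile_h: float) -> Tuple[float, float]:
    """Return (probability the output window falls inside a single tile,
    expected number of tiles it touches).

    Assumes an unbounded grid of equal-sized tiles and a window placed at a
    uniformly random offset. Along one axis, a window of length L over tiles
    of length T crosses L / T tile boundaries on average, so it touches
    1 + L / T tiles on average, and it crosses no boundary with probability
    max(0, 1 - L / T). The two axes are independent, so the results multiply.
    """
    p_single = max(0.0, 1 - out_w / tile_w) * max(0.0, 1 - out_h / tile_h)
    expected = (1 + out_w / tile_w) * (1 + out_h / tile_h)
    return p_single, expected


if __name__ == "__main__":
    # Tiles twice the output size in each dimension.
    print(files_per_request(1000, 1000, 2000, 2000))  # (0.25, 2.25)
    # Tiles half the output size in each dimension.
    print(files_per_request(1000, 1000, 500, 500))    # (0.0, 9.0)
```

Plugging in the DOQQ figures given further down (about 6,000 x 7,000 pixels per file, 1,200 x 800 output) gives roughly a 71% chance of a single-file read and about 1.34 files per request on average. A 600 x 400 output gives roughly 85%. These figures hold only under the uniform-placement assumption.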

Remember that a TILEINDEX benefits you to the extent that it permits you
to quickly ignore source images that cannot possibly be needed for the
output image.  If you end up creating a situation in which a substantial
fraction of the source images are needed for each output image anyway,
you will lose some of the benefits of a TILEINDEX.
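The pruning idea behind a tile index can be sketched as a bounding-box test. Keep each source file's footprint, and open only the files whose footprint overlaps the requested extent. The sketch below is a plain linear scan over an in-memory dictionary with hypothetical file names. It illustrates the filtering idea only, not how MapServer actually reads its TILEINDEX.

```python
from typing import Dict, List, NamedTuple


class Extent(NamedTuple):
    """Axis-aligned bounding box in map units."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(a: Extent, b: Extent) -> bool:
    """True if the boxes overlap (boxes that only share an edge do not count)."""
    return a.minx < b.maxx and b.minx < a.maxx and a.miny < b.maxy and b.miny < a.maxy


def candidate_files(index: Dict[str, Extent], request: Extent) -> List[str]:
    """Return only the source files whose footprint overlaps the request;
    every other file can be skipped without being opened."""
    return [path for path, footprint in index.items() if intersects(footprint, request)]


if __name__ == "__main__":
    # Hypothetical 2 x 2 mosaic of 1000 x 1000 unit tiles.
    index = {
        "tile_a.tif": Extent(0, 0, 1000, 1000),
        "tile_b.tif": Extent(1000, 0, 2000, 1000),
        "tile_c.tif": Extent(0, 1000, 1000, 2000),
        "tile_d.tif": Extent(1000, 1000, 2000, 2000),
    }
    print(candidate_files(index, Extent(200, 200, 600, 500)))    # ['tile_a.tif']
    print(candidate_files(index, Extent(900, 900, 1100, 1100)))  # all four tiles
```

The second request sits on the shared corner, so nothing can be skipped. This is the situation in which the index stops paying for itself.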

Beware of compression schemes, and be sure you understand how they work.
Compression is very useful when you're able to give up some performance
to reduce storage costs, but be sure you need to make that sacrifice.

For my largest data set (the USGS DOQQ library - about 18 terabytes) I
store the data in uncompressed GeoTIFFs that are absolutely unmodified
from their original format.  TIFF compression schemes such as PackBits
and LZW do not work well on photographic images, so there's no point in
using them.  Lossy compression techniques (JPEG, MrSID, ECW) are
computationally intensive to decode, which is a liability if you're
trying to support many simultaneous users; I am also unwilling to suffer
any degradation of my imagery - I need to assure my customers that
they're getting the original USGS pixels.  If you don't need to support
many users but rather want to optimize for single-user retrieval, then
some of my objections to compression go away.

These DOQQs are roughly 6,000 x 7,000 pixels each, and are either 50MB
or 150MB depending on whether they're black-and-white images or color
infrared.  For interactive viewing on TopoZone, map image sizes range
from 600 x 400 to 1,200 x 800 pixels, so the vast majority of image
requests only hit one source DOQQ file.  I could probably chop up each
DOQQ into four tiles and get slightly better performance, but the
additional maintenance required isn't worth it.  That's because I've got
over 256,000 of them to keep track of; if you have only a few files,
this might not be bad.
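For someone with only a few files (as in the original question about splitting a large GeoTIFF for gdaltindex), the split can be scripted. The sketch below builds one gdal_translate command per tile, using its -srcwin option (pixel x offset, y offset, width, height). It then adds a single gdaltindex command that indexes the resulting files. The source file name, output prefix, and the quarter-tile size are hypothetical. The commands are only printed here, not executed.

```python
import shlex
from typing import Iterator, List, Tuple


def tile_windows(width: int, height: int,
                 tile_w: int, tile_h: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (xoff, yoff, xsize, ysize) pixel windows covering the whole raster.
    Windows on the right and bottom edges are clipped to the raster size."""
    for yoff in range(0, height, tile_h):
        for xoff in range(0, width, tile_w):
            yield xoff, yoff, min(tile_w, width - xoff), min(tile_h, height - yoff)


def split_commands(src: str, width: int, height: int,
                   tile_w: int, tile_h: int, prefix: str) -> List[List[str]]:
    """Build a gdal_translate -srcwin command for each tile, followed by a
    gdaltindex command that indexes the resulting files."""
    commands, outputs = [], []
    for i, (x, y, w, h) in enumerate(tile_windows(width, height, tile_w, tile_h)):
        out = f"{prefix}_{i:02d}.tif"
        commands.append(["gdal_translate", "-srcwin", str(x), str(y), str(w), str(h), src, out])
        outputs.append(out)
    commands.append(["gdaltindex", f"{prefix}_index.shp", *outputs])
    return commands


if __name__ == "__main__":
    # Hypothetical 6000 x 7000 source split into four 3000 x 3500 quarters.
    for cmd in split_commands("doqq.tif", 6000, 7000, 3000, 3500, "doqq"):
        print(shlex.join(cmd))
        # To actually run it instead: subprocess.run(cmd, check=True)
```

Every file produced this way is one more file to regenerate whenever the source image is updated. That is the maintenance cost weighed above.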

As I mentioned, reading sequential data from a single file is about the
fastest thing you can do with a disk drive.  If your server has a lot of
RAM, then this data will be read and cached pretty well even without any
work on your part.

Keep things simple, and don't presume that complex schemes are
inherently faster than the straightforward solution.

	- Ed

Ed McNierney
President and Chief Mapmaker
TopoZone.com / Maps a la carte, Inc.
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
ed at topozone.com
(978) 251-4242  

-----Original Message-----
From: IMD Listuser [mailto:imd_listuser at comcast.net] 
Sent: Monday, May 17, 2004 7:42 AM
To: mapserver-users at lists.gis.umn.edu
Subject: [Mapserver-users] Large Raster files

Hello all

I have several large (250MB) images that I would like to serve using
mapserver. They are currently GeoTIFFs, so I would prefer to use them in
the same format. Nonetheless I would be happy to hear about the best
strategies for serving imagery.

In particular, what is the best method for splitting up such an image
into the smaller tiles that can be indexed using gdaltindex?


Charlie Van Dusen
IM Design

