[mapserver-users] Image Speed Questions---Generic

Mon Jul 30 13:09:43 PDT 2001

Lowell -

Since November, 1999, the TopoZone has been in the business of serving
"high-res imagery" "for a large spatial extent" by providing USGS DRG
(topographic) maps of the entire US online.  We're now migrating that
application to a MapServer platform so we can add some interesting stuff
to it.  I've been in and out of the PC graphics biz for about 20 years
now, and I've learned a bit about performance.  I wanted to take the
time to write a thorough commentary on raster tips, but that's going to
take me a while.  Here's a shorter version, along with a promise of a
more extensive treatment as soon as I can.

For computer graphics in general I like to live by two golden rules -
(1) precompute the hell out of everything, and (2) trivially reject as
much as you can as fast as you can.  Here are some notes on both rules.

1. Precompute everything - if you're trying to serve a lot of users on
the Web with relatively inexpensive hardware, you'll find that a fast
RAID-5 SCSI array is much cheaper than lots of RAM and CPU.  Wherever
it's possible, think hard about how you can precompute raster results
rather than doing them on the fly.  Look at a simple case - zooming out
from a raster image that's viewed at 100% (1:1) zoom.  When you zoom out
4x, you're reading 16 times more input pixels than you are writing
output pixels.  At 16x it's 256 times more; you're basically reading a
lot of data and throwing 99.6% of it away.  For the TopoZone site we
chose to offer four fixed zoom levels - period.  They are all
precomputed and there is no on-the-fly scaling.  This approach is more
reasonable for topo maps, because they're useful (i.e. readable) only
over a relatively narrow range of scales.  For aerial or
satellite imagery, or other continuous-tone imagery, you can zoom out
farther and still have things work.
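
The arithmetic behind the zoom-out example can be checked with a few
lines of Python.  This is only a sketch of the pixel counts; the function
name zoom_out_cost is made up for illustration and is not part of
MapServer or GDAL.

```python
def zoom_out_cost(factor):
    """Return (input pixels read per output pixel, fraction of input discarded)."""
    reads_per_output = factor * factor        # zoom applies to both width and height
    discarded = 1.0 - 1.0 / reads_per_output  # everything except the one pixel kept
    return reads_per_output, discarded


for factor in (1, 4, 16):
    reads, discarded = zoom_out_cost(factor)
    print(f"{factor:>2}x zoom out: {reads:>3} input pixels per output pixel, "
          f"{discarded:.1%} of the input discarded")
```

The cost grows with the square of the zoom factor, which is why on-the-fly
scaling gets expensive so quickly.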

Using GDAL support with overviews (built with GDALADDO) is a very easy
way to do this.  Precomputing zoom scales offers a tradeoff between disk
space and speed - the more space you use, the faster things will go.
Note, though, that the highest-level overviews require the least space
but save the most time.  If you created only one 4x overview, it would
increase your storage space by only about 6% (since the 4x view is
1/16th the size of the original), but for all views zoomed out beyond 4x
you would deal with 1/16th as many pixels.  Creating lots of overviews
(2, 4, 8, 16, 32, etc.) gives you
the best performance, but think very hard about whether you really need
that 2x overview.  Your mileage may vary based on usage, CPU speed, RAM,
disk performance, etc. so you really should test things yourself in your
real environment.
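
To make the space/speed tradeoff concrete, here is a rough Python sketch.
It assumes storage is proportional to pixel count (which ignores
compression and file-format overhead) and that the reader always picks
the coarsest overview that is still at least as detailed as the requested
zoom.  Both function names are hypothetical.

```python
def overview_overhead(factors):
    """Extra storage for the given overview factors, as a fraction of the original."""
    return sum(1.0 / (f * f) for f in factors)


def pixels_read_per_output(zoom, factors):
    """Input pixels read per output pixel when the best available level is used."""
    usable = [f for f in factors if f <= zoom]
    best = max(usable) if usable else 1  # fall back to the full-resolution image
    return (zoom / best) ** 2


for factors in ([4], [4, 8, 16, 32], [2, 4, 8, 16, 32]):
    print(factors,
          f"overhead {overview_overhead(factors):.1%},",
          f"16x zoom reads {pixels_read_per_output(16, factors):.0f} px per output px")
```

Under these assumptions the 2x overview alone accounts for 25% extra
storage, far more than all the coarser levels put together, which is the
reason to think hard about whether you need it.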

2. Trivially reject as much as you can - "trivial rejection" is the
process of very quickly deciding which input data is not needed in the
output map, and ignoring it as quickly as possible.  For raster data
sets, the way to do this is with a TILEINDEX shapefile.  By reading ONE
input fileset (the TILEINDEX shapefile), you can very quickly determine
exactly which TIFF input files are needed to create the requested map.
This is enormously faster than reading lots of TIFF/GeoTIFF files,
extracting the bounding box from each, and then discovering that you
never needed to open most of them in the first place.  Disk seek time is
expensive compared to sequential reading, so reading 100,000 bytes from
one file is much, much faster than reading 100 bytes from each of 1,000
files.
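
The idea of trivial rejection against a tile index can be sketched in
Python as below.  This is an in-memory stand-in, not MapServer's actual
TILEINDEX code: the TileRecord type, the file names, and the grid layout
are all invented for illustration.

```python
from typing import List, NamedTuple


class TileRecord(NamedTuple):
    """One row of a hypothetical tile index: a file path plus its bounding box."""
    path: str
    minx: float
    miny: float
    maxx: float
    maxy: float


def tiles_for_extent(index: List[TileRecord], minx, miny, maxx, maxy) -> List[str]:
    """Return only the paths whose bounding box overlaps the requested extent."""
    hits = []
    for t in index:
        # Four cheap comparisons reject a tile without ever opening its file.
        if t.minx < maxx and t.maxx > minx and t.miny < maxy and t.maxy > miny:
            hits.append(t.path)
    return hits


def build_grid_index(cols, rows, tile_size) -> List[TileRecord]:
    """Build a made-up index of square tiles laid out on a regular grid."""
    return [
        TileRecord(f"tile_{c}_{r}.tif", c * tile_size, r * tile_size,
                   (c + 1) * tile_size, (r + 1) * tile_size)
        for r in range(rows)
        for c in range(cols)
    ]


index = build_grid_index(100, 100, 1000.0)  # 10,000 hypothetical tiles
needed = tiles_for_extent(index, 2500.0, 2500.0, 3500.0, 3500.0)
print(len(index), "tiles indexed;", len(needed), "need to be opened:", needed)
```

The whole index is scanned from one data source, and only the handful of
files that actually overlap the map request are ever opened.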

In addition to these general rules, make sure your images don't make
life difficult for MapServer.  For TIFF images, this particularly means
making sure they are created with only one (or a few) rows per strip.
Some image libraries create single-strip TIFF files, with all the rows
in one strip.  If you are using any sort of data compression (PackBits,
for example) MapServer will need to read and interpret (i.e. decompress)
all the data in that strip up to the point you need.  For example, if
you've got a 7,000-row file in one strip, and you need to display the
bottom few rows of that file, MapServer will need to read from disk AND
decompress essentially the entire file just to get a few lines from it.
If you have one row per strip, you will get less effective compression
(because each row is compressed independently) but you will only need to
read the rows you need.  A compromise with a small number of rows per
strip might give you better compression with acceptable performance.
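
The strip-size effect can be estimated with a simple model in Python.
This assumes a reader can seek directly to the start of any strip but
must then decompress that strip row by row until it reaches the last row
it needs; it is a simplification, not a description of any particular
TIFF library.  The function name is hypothetical.

```python
def rows_decompressed(first_row, last_row, rows_per_strip):
    """Rows that must be decompressed to obtain rows first_row..last_row (0-based)."""
    # Decoding has to start at the top of the strip containing first_row.
    strip_start = (first_row // rows_per_strip) * rows_per_strip
    return last_row - strip_start + 1


# Fetch the bottom 10 rows of a 7,000-row image under different strip sizes.
for rows_per_strip in (7000, 16, 1):
    cost = rows_decompressed(6990, 6999, rows_per_strip)
    print(f"{rows_per_strip:>4} rows per strip: decompress {cost} rows to get 10")
```

In this model a small number of rows per strip costs only a little wasted
decoding compared with one row per strip, which is the compromise
described above.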

Finally, know your data and your users.  Some data types compress better
than others.  Some are virtually incompressible (at least within the
world of formats TIFF supports).  Some data types are used differently
than others - as I suggested above, a 32x zoom out on a 1-meter aerial
photo is a somewhat reasonable thing to ask for, while the same zoom on
a topographic map produces illegible mush.

I'm still shuffling around some R&D equipment, but sometime this week
I'll post a link to a MapServer site that offers a 30-meter shaded
relief map of the entire US and 2.5-meter topographic maps of the entire
US.  Together they're over 60,000 files with about 300 GB of data, and
they illustrate some of the things I've talked about here.

	- Ed

Ed McNierney
Chief Mapmaker
TopoZone.com
ed at topozone.com
(978) 251-4242

-----Original Message-----
From: Ballard,Lowell [mailto:LBallard at YesVirginia.org]
Sent: Thursday, July 26, 2001 1:15 PM
To: 'mapserver-users at lists.gis.umn.edu'
Subject: [mapserver-users] Image Speed Questions---Generic

Say, for example, you needed to display/serve high-res imagery (on the
Internet) for a large spatial extent (e.g., a state or region).  There
would be several ways to get this done.

For example:

1. You could use multiple images and reference each as an individual
layer (administrative nightmare).

2. You could use multiple images and get at them through an image
catalog.

3. You could mosaic them into one large image (could get REALLY big
fast--probably not feasible)

4. If pyramid layers/aux/MrSID were supported, you could use them to
display moderate-sized mosaics or original imagery (greatly reduces the
storage footprint, but uncompressing hammers CPU cycles).

5. You could resample imagery at different resolutions (1m pixel
resolution; 5m; 10m) and reference each collection (e.g., 5m) with a
different image catalog depending on viewing scale (i.e., viewing at
county-extent use 10m catalog; city-level use 5m; subdivision use 1m).

6. You could store them all in SDE (~2:1 lossless compression).

7. About any combination of the above (e.g., resample imagery to 1m, 5m,
etc. and create pyramids for those).

8. .......

I'm curious how others would accomplish/approach this task.  I can post
a summary.

Thanks,

Lowell Ballard