[Mapserver-users] Double indexing?

Mon Jul 21 12:02:29 PDT 2003

Steve -

To optimize performance within a TIFF, you need to look at the data organization within that file.

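TIFFs are made up of one or more strips, and each strip is made up of one or more rows.  The total number of "lines" in the image = rows per strip * strips (the last strip may hold fewer rows if the image height doesn't divide evenly).  The TIFF header information (specifically the StripOffsets field in the image directory) contains a table of pointers to the start of each strip, which is essentially the second "index" you're looking for.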

For uncompressed TIFFs, the files are large but very easy to access.  Since each scan line takes up a known, predefined amount of disk space, the TIFF library can easily calculate the location of any row in the file and move the file pointer right to it (and start reading).  Breaking these up into multiple strips does no good, because any pixel can be retrieved directly without using a second index.
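As a rough illustration of that calculation, here is a minimal Python sketch.  It assumes whole-byte pixels stored contiguously (no separate planes), and the function name, image dimensions and strip offsets are all made up for the example; a real reader would take them from the file's StripOffsets and RowsPerStrip fields.

```python
def pixel_offset(strip_offsets, rows_per_strip, width, bytes_per_pixel, row, col):
    """Byte offset of pixel (row, col) in an uncompressed, strip-organized image."""
    row_bytes = width * bytes_per_pixel          # every scan line has the same size
    strip = row // rows_per_strip                # which strip holds this row
    row_in_strip = row % rows_per_strip          # row position inside that strip
    return strip_offsets[strip] + row_in_strip * row_bytes + col * bytes_per_pixel


# Hypothetical 1000 x 10,000 8-bit image whose pixel data starts at byte 8.
one_strip = [8]
hundred_strips = [8 + i * 100 * 1000 for i in range(100)]  # strips laid end to end

a = pixel_offset(one_strip, 10000, 1000, 1, row=9999, col=999)
b = pixel_offset(hundred_strips, 100, 1000, 1, row=9999, col=999)
print(a, b)
```

Both layouts lead to the same byte with nothing more than arithmetic, which is why extra strips buy nothing for uncompressed data.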

For compressed TIFFs it's a little trickier.  The two common TIFF compression algorithms, PackBits and LZW, both require each block of compressed data to be decoded starting from the very beginning of that block in order to reach any given bit of the data (well, you don't really have to keep the uncompressed output, but you do need to read and parse everything that comes before it).  This is where your file organization comes into play.

TIFF compression always compresses each "strip" of data as a single chunk.  If you have a 10,000-line image, you might have one strip of 10,000 rows or 100 strips of 100 rows each (or any other two numbers that produce 10,000 when multiplied).  There is a very small amount of overhead associated with each chunk, to store a pointer to its location in the file (in the TIFF header "index").

PackBits is simply run-length compression.  All data will compress the same with PackBits regardless of context (that is, regardless of what data comes before it in the file).  One 10,000-line strip and 100 strips of 100 lines each will compress just as well and produce the same compressed data.  Therefore, these two files will end up being almost the same size.  The only difference is that the 100-strip file needs 100 pointers to strips in the header, whereas the one-strip file needs only one.  The 100-strip file will be a few hundred bytes bigger, but it's probably an insignificant amount.
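To make that concrete, here is a small sketch of a PackBits-style encoder in Python.  It assumes rows are packed one at a time, and it assumes 8 bytes of index overhead per strip (a 4-byte offset plus a 4-byte byte count); the test image and the function names are invented for the example.

```python
def packbits_encode(row: bytes) -> bytes:
    """Run-length encode one row using the PackBits scheme."""
    out = bytearray()
    i, n = 0, len(row)
    while i < n:
        run = 1
        while i + run < n and run < 128 and row[i + run] == row[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)        # header byte for "repeat next byte `run` times"
            out.append(row[i])
            i += run
        else:
            start = i
            i += 1
            # extend the literal until a repeat begins or 128 bytes are collected
            while i < n and i - start < 128 and not (i + 1 < n and row[i] == row[i + 1]):
                i += 1
            out.append(i - start - 1)    # header byte for "copy next k+1 bytes"
            out.extend(row[start:i])
    return bytes(out)


def packbits_strip_sizes(rows, rows_per_strip):
    """Compressed size of each strip when every row is packed on its own."""
    return [
        sum(len(packbits_encode(r)) for r in rows[s:s + rows_per_strip])
        for s in range(0, len(rows), rows_per_strip)
    ]


# Made-up 1000-row image: a flat run followed by a ramp on every row.
rows = [bytes([y % 7] * 60 + list(range(40))) for y in range(1000)]
one_strip = packbits_strip_sizes(rows, 1000)
many_strips = packbits_strip_sizes(rows, 10)
extra_index_bytes = (len(many_strips) - len(one_strip)) * 8  # assumed 8 bytes per strip
print(sum(one_strip), sum(many_strips), extra_index_bytes)
```

The compressed pixel data comes out the same size either way, because the encoder carries no state from one row to the next; only the assumed index overhead differs.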

LZW compression is different.  It's a dictionary-based compression that scans the data finding runs of bytes and encoding each run with a variable bit-length field.  The compression "learns" from the data because it builds the dictionary as it goes.  This has two implications for you.  First, the 100-strip file will not be as small as the one-strip file.  That's because the compression starts over with a blank slate on every strip, so the subsequent 99 strips don't get the advantage of using the dictionary built for the first strip.  The actual difference will depend on your data, but we have seen roughly a 10% - 20% size increase in going from 1 strip per file to 1 row per strip (where there are 5,000-10,000 pixels per row).
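The "blank slate" effect is easy to see with a toy LZW coder.  The Python sketch below is not the TIFF LZW variant (it ignores the 12-bit code limit and the clear/end codes), and the image data is made up; it only estimates output size in bits so that one long strip can be compared with many short ones.

```python
def lzw_bits(data: bytes) -> int:
    """Rough LZW output size in bits for one independently compressed chunk."""
    table = {bytes([i]) for i in range(256)}   # fresh dictionary: single bytes only
    bits = 0
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                              # keep extending the current phrase
        else:
            bits += max(9, len(table).bit_length())  # emit the code for w
            table.add(wc)                       # learn the new, longer phrase
            w = bytes([b])
    if w:
        bits += max(9, len(table).bit_length())      # flush the final phrase
    return bits


# Made-up image: 100 identical rows of 100 pixels, in runs of 8.
row = bytes(x // 8 for x in range(100))
rows = [row] * 100

one_strip = lzw_bits(b"".join(rows))            # dictionary carried across all rows
one_row_per_strip = sum(lzw_bits(r) for r in rows)  # dictionary reset on every row
print(one_strip, one_row_per_strip)
```

This test image is far more repetitive than real imagery, so the gap it shows is much larger than the 10% - 20% mentioned above; the point is only the direction of the effect.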

The second implication is that to access any pixel, the ENTIRE strip up to that pixel has to be decoded.  In the worst case, access to the lower-right pixel of a large, one-strip file requires the ENTIRE FILE to be read from disk and decoded (and then thrown away).  This is not a performance-enhancing technique.  If you are using LZW-compressed TIFF imagery, consider converting it to be stored with a small number of rows per strip.  This can greatly improve random access performance at a relatively modest disk space cost.
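The trade-off can be put in numbers with a little arithmetic.  This Python sketch uses the 10,000-line image from the example above; the function names are just for illustration.

```python
def strip_count(height, rows_per_strip):
    """Number of strips (and index entries) needed for the image."""
    return -(-height // rows_per_strip)   # ceiling division


def rows_to_decode(row, rows_per_strip):
    """Rows that must be decoded to reach `row` (0-based), starting at its strip's first row."""
    return row % rows_per_strip + 1


height = 10000
last_row = height - 1
for rps in (10000, 100, 1):
    print(rps, strip_count(height, rps), rows_to_decode(last_row, rps))
# prints:
# 10000 1 10000
# 100 100 100
# 1 10000 1
```

With one strip, reaching the last row means decoding all 10,000 rows; with 100 rows per strip it never takes more than 100.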

If you're using MrSID files, don't.  Wavelet-based compression is designed to save a lot of disk space, NOT produce speedy decoding.  It's somewhat analogous to the LZW case - you need to decode a much larger amount of data in order to produce the small bit you may want.

	- Ed

Ed McNierney
President and Chief Mapmaker
TopoZone.com / Maps a la carte, Inc.
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
ed at topozone.com
(978) 251-4242 

-----Original Message-----
From: Steve Lehr [mailto:lehrs at erau.edu]
Sent: Monday, July 21, 2003 2:17 PM
To: mapserver-users at lists.gis.umn.edu
Subject: [Mapserver-users] Double indexing?

List:

Is there such a thing as double indexing?  I have some very large Image
(TIFF/SID) files in a tileindex.  Can each of the large TIFFS have some sort
of index on them to allow faster rendering within the picked TIFF?

Any examples and/or utility examples to make the indexes would be greatly
appreciated.  For the winning tip there is beer involved!

Thanks for your help.

Steven Lehr
Visiting Professor
Embry-Riddle Aeronautical University (LB159)
600 S. Clyde Morris Blvd.
Daytona Beach, FL 32114-3900
386-226-7740

_______________________________________________
Mapserver-users mailing list
Mapserver-users at lists.gis.umn.edu
http://lists.gis.umn.edu/mailman/listinfo/mapserver-users