[gdal-dev] tiff format: compression vs. performance
Ed McNierney
ed at topozone.com
Tue Jan 15 10:03:57 EST 2008
Andreas -
This is a complex issue, with many different factors involved. I think it is more important to understand the principles involved than to look at specific performance numbers for a given data set.
Compression essentially trades off disk read time for compute time. For a compressed data set you spend less time reading from the disk (in theory) and more time decompressing the data. If you don't manage to spend less time reading from the disk, then you're certainly going to get poorer performance than you would with uncompressed data!
Generally speaking, of course, your disk is slow and your CPU is fast, but there's a lot of variability in both of those measurements. Some disks are pretty fast, and some CPUs, while fast, may be busy doing a lot of other things. Your disk will also have a certain read resolution or granularity; reading 100 bytes from a disk is no faster than reading 200 bytes, because your disk subsystem will read at least a whole sector when requested to do either.
You also need to look at the properties of your data, especially if you're limiting yourself to uncompressed, PackBits, and LZW-compressed TIFFs. Some data sets (aerial photography, for example) are usually quite incompressible with those algorithms. They're not going to get any smaller if you try to compress them, so there's no point in trying. Other data sets will compress much better. You should first test your data to see which compression methods provide useful compression in your situation.
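As a quick illustration (this is just a sketch, not something from my experiments; the input path is hypothetical), you could use the GDAL Python bindings to write the same source raster with each candidate compression and compare the resulting file sizes:

import os
from osgeo import gdal

gdal.UseExceptions()

src = gdal.Open("input.tif")                  # hypothetical source raster
driver = gdal.GetDriverByName("GTiff")

for method in ("NONE", "PACKBITS", "LZW"):
    dst_path = "test_%s.tif" % method.lower()
    # CreateCopy writes a new GeoTIFF with the requested compression.
    out = driver.CreateCopy(dst_path, src, options=["COMPRESS=" + method])
    out = None                                # close and flush the output
    size_mb = os.path.getsize(dst_path) / (1024.0 * 1024.0)
    print("%-9s %8.1f MB" % (method, size_mb))

If neither PackBits nor LZW makes a meaningful dent in the file size for your data, you can stop right there.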
LZW compression can be particularly tricky, since it builds a "dictionary" on the fly from the source data. Normally this means that if you want to read only the lower-right pixel in a TIFF file you will need to read and decode the ENTIRE file, in order to reconstruct the dictionary to read that last pixel. This does not have to be the case, however. You can specify a STRIP size in your TIFF file; this will treat the file as a series of strips, each N rows high. Reading a random location in the file will only require decoding the strip that contains that data. You will not get quite as good compression, but you can get much faster LZW performance.
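In GDAL terms, the GTiff driver exposes the strip height through the BLOCKYSIZE creation option when tiling is not enabled. A minimal sketch (the 16-row strip height is just an arbitrary value to experiment with, not a recommendation):

from osgeo import gdal

gdal.UseExceptions()
src = gdal.Open("input.tif")                  # hypothetical source raster
out = gdal.GetDriverByName("GTiff").CreateCopy(
    "lzw_strips.tif",
    src,
    options=["COMPRESS=LZW", "BLOCKYSIZE=16"],  # 16 rows per strip
)
out = None                                    # close and flush the output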
Think carefully about what's involved in reading your data from the disk. If you're reading a small, vertical strip from the file your disk will need to read a few bytes from one line, then (possibly) seek to the next line, read a few bytes, etc. If your file is particularly wide this can become a problem, since a disk SEEK is the slowest thing you can possibly do. This is where TILED TIFFs will help you, as they will organize your data in tiles rather than strips. Think about what tiling is doing for you - it is reducing the need to seek across the width of the file to move vertically in the image. Your tile size needs to be substantially smaller than the width of your TIFF in order to produce a noticeable improvement. Again, the penalty you pay for seeking through the file depends very heavily on your particular disk subsystem, and you may or may not see much improvement from tiling.
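A sketch of the tiled variant (the 256 x 256 tile size is a common starting point, not a recommendation specific to your data):

from osgeo import gdal

gdal.UseExceptions()
src = gdal.Open("input.tif")                  # hypothetical source raster
out = gdal.GetDriverByName("GTiff").CreateCopy(
    "lzw_tiled.tif",
    src,
    options=["COMPRESS=LZW", "TILED=YES",
             "BLOCKXSIZE=256", "BLOCKYSIZE=256"],
)
out = None                                    # close and flush the output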
I will give one example that showed quite different results from the one Christopher posted, simply to illustrate the variability. This was a very old experiment and I don't have specific measurements available. I was trying to compress scanned USGS topographic maps (DRGs), which are essentially line art scanned as TIFFs. These images actually DO compress very well with PackBits or LZW, because they aren't photos and contain long stretches of identical pixels or repeated patterns.
There was no performance problem with the uncompressed TIFFs, but I needed more disk space (this was a long time ago <g>). I found that PackBits compression reduced the TIFF files to one-third their original size, with no perceptible degradation in performance. PackBits is, of course, a very simple algorithm, so the reduced disk reading was a benefit and the decompression was trivial. I then tried LZW compression and found that those files were about one-third the size of the PackBits versions (roughly one-ninth the original size), but performance was distinctly slower. That's because to read a given pixel I needed to decompress the file from the beginning up to the point containing it.
But I then used LZW with a strip size of 1 row. That meant that my files (roughly 6,000 x 7,000 pixels) were compressed as 7,000 independent 6,000-pixel data sets. I could read any row by using the TIFF directory to seek right to it, then decompress the data. I was surprised to find that this compression was nearly as fast as PackBits, but the files were only about 10% larger than the original LZW compression. If LZW appears to be a suitable compression format for you, it is very worthwhile to experiment with different strip sizes to compare the tradeoff between performance and compression.
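If you want to reproduce that kind of comparison on your own data, a rough timing sketch follows (assuming the test files written in the earlier sketches exist and have identical dimensions; OS and GDAL block caching will blur the numbers, so use plenty of rows and a cold cache where you can):

import random
import time
from osgeo import gdal

gdal.UseExceptions()

def time_random_rows(path, n_rows=200, seed=42):
    ds = gdal.Open(path)
    band = ds.GetRasterBand(1)
    rng = random.Random(seed)                 # same rows for every file
    rows = [rng.randrange(ds.RasterYSize) for _ in range(n_rows)]
    start = time.time()
    for y in rows:
        band.ReadAsArray(0, y, ds.RasterXSize, 1)   # read one full row
    return time.time() - start

for path in ("test_none.tif", "test_packbits.tif", "lzw_strips.tif"):
    print("%-20s %6.2f s" % (path, time_random_rows(path)))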
But.... now that I have a lot more disk space, I store all my TIFFs uncompressed. That might change in the future (there have been other business reasons for storing the bulk of my data uncompressed). Given your constraint on lossless compression, the very first thing you should do is see if PackBits or LZW actually compress your data, because if they don't your research is complete!
- Ed
Ed McNierney
Chief Mapmaker
Demand Media / TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA 01863
ed at topozone.com
Phone: +1 (978) 251-4242
Fax: +1 (978) 251-1396
-----Original Message-----
From: gdal-dev-bounces at lists.osgeo.org [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Andreas Neumann
Sent: Tuesday, January 15, 2008 8:27 AM
To: gdal-dev at lists.osgeo.org
Subject: [gdal-dev] tiff format: compression vs. performance
Hello,
I am wondering whether there is a webpage with some tips/hints on the best compression method within the TIFF file format when performance is also an issue. My data is primarily read-only, but reading should be fast. I have heard that storing the TIFF files uncompressed is often still the fastest approach. Is this still the case? Are there recommended compression methods that are both fast (and allow random access to a big TIFF file) and also produce smaller file sizes? I am looking at lossless compression methods, so JPEG is not an option.
Thanks for any hints (or links to a webpage discussing this issue).
Andreas
--
Andreas Neumann
Böschacherstrasse 6, CH-8624 Grüt/Gossau, Switzerland
Email: a.neumann at carto.net, Web:
* http://www.carto.net/ (Carto and SVG resources)
* http://www.carto.net/neumann/ (personal page)
* http://www.svgopen.org/ (SVG Open Conference)
* http://www.geofoto.ch/ (Georeferenced Photos of Switzerland)