[gdal-dev] tiff format: compression vs. performance
Ed McNierney
ed at topozone.com
Tue Jan 15 12:06:55 EST 2008
Andreas -
I would, of course, advise against generalizations, but for aerial photos, tiled TIFFs with JPEG compression (in each tile) are worth investigating. They can provide a good balance between compression and performance: you get the benefits of JPEG compression, while the tiling reduces the performance cost of extracting a small portion of the image.
- Ed
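One way to request a tiled, JPEG-compressed GeoTIFF is through the GTiff creation options of gdal_translate. The sketch below only builds the argument list and does not run GDAL; the helper name `tiled_jpeg_args`, the file names, and the 256-pixel tile size are illustrative assumptions, not something stated in this thread.

```python
def tiled_jpeg_args(src, dst, tile=256):
    """Build a gdal_translate command line for a tiled, JPEG-compressed GeoTIFF.

    src, dst and tile are placeholders chosen for illustration.
    """
    return [
        "gdal_translate", "-of", "GTiff",
        "-co", "COMPRESS=JPEG",       # JPEG compression inside each tile
        "-co", "TILED=YES",           # tiles instead of strips
        "-co", f"BLOCKXSIZE={tile}",  # tile width in pixels
        "-co", f"BLOCKYSIZE={tile}",  # tile height in pixels
        src, dst,
    ]


if __name__ == "__main__":
    print(" ".join(tiled_jpeg_args("aerial.tif", "aerial_tiled_jpeg.tif")))
```

The resulting list could be passed to `subprocess.run` on a machine where GDAL is installed.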
-----Original Message-----
From: Andreas Neumann [mailto:a.neumann at carto.net]
Sent: Tuesday, January 15, 2008 11:58 AM
To: Ed McNierney
Subject: RE: [gdal-dev] tiff format: compression vs. performance
Thank you Ed and others for sharing your experiences and the quite
comprehensive discussion of the subject.
In this particular case I am dealing with b/w maps with large,
homogeneous white areas. I converted them to grayscale to get better quality
when using GDAL's average resampling method.
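Why grayscale helps with averaging can be shown with a small sketch: the mean of a 2x2 block of pure black and white pixels is usually an intermediate gray, which a 1-bit image cannot store. The function below is a simplified illustration and not GDAL's implementation; its name and the assumption of even image dimensions are mine.

```python
def average_downsample_2x(rows):
    """Halve an image by averaging each 2x2 block (even dimensions assumed)."""
    out = []
    for y in range(0, len(rows), 2):
        out_row = []
        for x in range(0, len(rows[y]), 2):
            total = (rows[y][x] + rows[y][x + 1]
                     + rows[y + 1][x] + rows[y + 1][x + 1])
            out_row.append((total + 2) // 4)  # rounded mean of four pixels
        out.append(out_row)
    return out


# One black pixel among three white ones becomes a light gray value.
print(average_downsample_2x([[0, 255], [255, 255]]))
```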
I compared the file sizes of uncompressed vs. PackBits. PackBits is a factor
of 6 smaller. Obviously, LZW and Deflate are much smaller again, but if they
take longer or need considerably more CPU to decompress, they are probably
not a good option.
I will probably go with PackBits.
What do you recommend for aerial images? JPEG, or something other than TIFF?
ECW/MrSID are probably good, but risky because of their licenses. ECW is now
owned by Leica, and Leica is not exactly known as a pro-open-source company.
Andreas
> Andreas -
>
> This is a complex issue, with many different factors involved. I think it
> is more important to understand the principles involved rather than look
> at specific performance numbers for a given data set.
>
> Compression essentially trades off disk read time for compute time. For a
> compressed data set you spend less time reading from the disk (in theory)
> and more time decompressing the data. If you don't manage to spend less
> time reading from the disk, then you're certainly going to have poorer
> performance than uncompressed data!
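The trade-off described here can be written as a very rough cost model: total time is the time to read the compressed bytes plus the time to decode them. This is only a sketch; the function name and all throughput figures are made-up assumptions, and real systems overlap I/O and CPU work.

```python
def read_seconds(raw_bytes, ratio, disk_bytes_per_s, decode_bytes_per_s=None):
    """Estimated time to read and decode raw_bytes of image data.

    ratio is compressed size / raw size; decode_bytes_per_s=None means the
    data is stored uncompressed and needs no decoding.
    """
    disk = raw_bytes * ratio / disk_bytes_per_s
    cpu = 0.0 if decode_bytes_per_s is None else raw_bytes / decode_bytes_per_s
    return disk + cpu


MB = 1_000_000
raw = 100 * MB  # hypothetical image size

uncompressed = read_seconds(raw, 1.0, 50 * MB)
compressible = read_seconds(raw, 1 / 3, 50 * MB, 500 * MB)
incompressible = read_seconds(raw, 1.0, 50 * MB, 500 * MB)

print(uncompressed, compressible, incompressible)
```

With these assumed numbers, data that compresses to one third reads faster than uncompressed data, while data that does not shrink at all pays the decoding cost for nothing.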
>
> Generally speaking, of course, your disk is slow and your CPU is fast, but
> there's a lot of variability in both of those measurements. Some disks
> are pretty fast, and some CPUs, while fast, may be busy doing a lot of
> other things. Your disk will also have a certain read resolution or
> granularity; reading 100 bytes from a disk is no faster than reading 200
> bytes, because your disk subsystem will read at least a whole sector when
> requested to do either.
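The granularity point can be made concrete by counting the sectors a read touches. The 512-byte sector size below is an assumption for illustration; actual block sizes depend on the disk and file system.

```python
def sectors_read(offset, length, sector_size=512):
    """Number of whole sectors touched by reading `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1


# 100 bytes and 200 bytes from the start of a sector cost the same single sector.
print(sectors_read(0, 100), sectors_read(0, 200))
```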
>
> You also need to look at the properties of your data, especially if you're
> limiting yourself to uncompressed, PackBits, and LZW-compressed TIFFs.
> Some data sets (aerial photography) are usually quite incompressible using
> those algorithms. They're not going to get any smaller if you try to
> compress them, so there's no point in trying. Other data sets will
> perform much better. You should test your data to first see which
> compression methods provide useful compression in your situation.
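A quick way to see how strongly the outcome depends on the data is to compress two synthetic buffers. The sketch below uses Python's `zlib` (Deflate) because it ships with the standard library; it is a stand-in, not PackBits or LZW, and the "line art" and "noise" buffers are invented test data rather than real maps or photos.

```python
import random
import zlib


def deflate_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Line-art-like data: white rows with one short black stroke each.
row = b"\xff" * 100 + b"\x00" * 10 + b"\xff" * 890
line_art = row * 200

# Photo-like stand-in: pseudo-random bytes, which Deflate cannot shrink.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(line_art)))

print(deflate_ratio(line_art), deflate_ratio(noise))
```

Real aerial photos are not pure noise, so this only marks the two extremes; measuring your own files is still the step that matters.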
>
> LZW compression can be particularly tricky, since it builds a "dictionary"
> on the fly from the source data. Normally this means that if you want to
> read only the lower-right pixel in a TIFF file you will need to read and
> decode the ENTIRE file, in order to reconstruct the dictionary to read
> that last pixel. This does not have to be the case, however. You can
> specify a STRIP size in your TIFF file; this will treat the file as a
> series of strips, each N rows high. Reading a random location in the file
> will only require decoding the strip that contains that data. You will
> not get quite as good compression, but you can get much faster LZW
> performance.
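The effect of strips on random access can be sketched as follows: each strip is compressed on its own, so fetching one row means decoding only the strip that holds it. `zlib` is used here as a stand-in for LZW (which is not in the Python standard library), and the function names and toy data are assumptions for illustration, not a TIFF reader.

```python
import zlib


def compress_strips(rows, rows_per_strip):
    """Compress each group of rows_per_strip rows independently."""
    return [
        zlib.compress(b"".join(rows[i:i + rows_per_strip]))
        for i in range(0, len(rows), rows_per_strip)
    ]


def read_row(strips, rows_per_strip, row_len, row_index):
    """Decode only the strip holding row_index and slice out that row."""
    strip = zlib.decompress(strips[row_index // rows_per_strip])
    start = (row_index % rows_per_strip) * row_len
    return strip[start:start + row_len]


# Toy image: 100 rows of 64 bytes, each row filled with its own value.
rows = [bytes([y % 256]) * 64 for y in range(100)]
strips = compress_strips(rows, rows_per_strip=8)

# Reading the last row touches one strip, not the whole image.
assert read_row(strips, 8, 64, 99) == rows[99]
```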
>
> Think carefully about what's involved in reading your data from the disk.
> If you're reading a small, vertical strip from the file your disk will
> need to read a few bytes from one line, then (possibly) seek to the next
> line, read a few bytes, etc. If your file is particularly wide this can
> become a problem, since a disk SEEK is the slowest thing you can possibly
> do. This is where TILED TIFFs will help you, as they will organize your
> data in tiles rather than strips. Think about what tiling is doing for
> you - it is reducing the need to seek across the width of the file to move
> vertically in the image. Your tile size needs to be substantially smaller
> than the width of your TIFF in order to produce a noticeable improvement.
> Again, the penalty you pay for seeking through the file depends very
> heavily on your particular disk subsystem, and you may or may not see much
> improvement from tiling.
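To make the seek argument concrete, the sketch below counts how many separate contiguous reads a window needs in an untiled, uncompressed row-by-row layout versus a tiled layout. It is a simplified model under my own assumptions (each tile is stored contiguously, the window is narrower than the image, caching is ignored), and the function name is hypothetical.

```python
def contiguous_reads(x, y, win_w, win_h, tile=None):
    """Count separate contiguous reads needed to fetch a window.

    tile=None models an untiled, uncompressed image: one read per window row.
    Otherwise each overlapping tile of size tile x tile is one read.
    """
    if tile is None:
        return win_h
    cols = (x + win_w - 1) // tile - x // tile + 1
    rows = (y + win_h - 1) // tile - y // tile + 1
    return cols * rows


# A 256x256 window at (1000, 1000): row-by-row layout vs. 256-pixel tiles.
print(contiguous_reads(1000, 1000, 256, 256))
print(contiguous_reads(1000, 1000, 256, 256, tile=256))
```

The tiled layout reads more pixels than the window strictly needs, but in far fewer separate pieces.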
>
> I will give one example that showed quite different results from the one
> Christopher posted, simply to illustrate the variability. This was a very
> old experiment and I don't have specific measurements available. I was
> trying to compress scanned USGS topographic maps (DRGs), which are
> essentially line art scanned as TIFFs. These images actually DO compress
> very well with PackBits or LZW, because they aren't photos and contain
> long stretches of identical pixels or repeated patterns.
>
> There was no performance problem with the uncompressed TIFFs, but I needed
> more disk space (this was a long time ago <g>). I found that PackBits
> compression reduced the TIFF files to one-third their original size, with
> no perceptible degradation in performance. PackBits is, of course, a very
> simple algorithm so the reduced disk reading was a benefit and the
> decompression was trivial. I then tried compression with LZW and found
> that these files were one-third the size of the PackBits files (one-ninth
> the original size) but performance was distinctly slower. That's because I
> needed to decompress the entire file up to a certain point in order to
> read a pixel at that point.
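How simple PackBits is can be seen from a small encoder and decoder. This is a minimal sketch of the byte-oriented run-length scheme used by TIFF PackBits (a header byte followed by either literal bytes or one repeated byte), written for illustration rather than speed; the function names and the sample row are my own.

```python
def packbits_encode(data):
    """Run-length encode bytes using PackBits-style header bytes."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (max 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # header -(run - 1) as an unsigned byte
            out.append(data[i])
            i += run
        else:
            # Collect literals until a run begins or 128 bytes are gathered.
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # header: literal count - 1
            out += data[start:i]
    return bytes(out)


def packbits_decode(data):
    """Inverse of packbits_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:      # copy header + 1 literal bytes
            out += data[i:i + header + 1]
            i += header + 1
        elif header > 128:    # repeat the next byte 257 - header times
            out += bytes([data[i]]) * (257 - header)
            i += 1
        # header == 128 is a no-op
    return bytes(out)


# Mostly white row with a few differing pixels, as in scanned line art.
row = b"\xff" * 900 + b"\x00\x40\x80" + b"\xff" * 97
encoded = packbits_encode(row)
assert packbits_decode(encoded) == row
print(len(row), len(encoded))
```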
>
> But I then used LZW with a strip size of 1 row. That meant that my files
> (roughly 6,000 x 7,000 pixels) were compressed as 7,000 independent
> 6,000-pixel data sets. I could read any row by using the TIFF directory
> to seek right to it, then decompress the data. I was surprised to find
> that reading these files was nearly as fast as reading PackBits files, and
> they were only about 10% larger than the original LZW files. If LZW appears
> to be a suitable compression format for you, it is very worthwhile to
> experiment with different strip sizes to compare the tradeoff between
> performance and compression.
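An experiment of this kind can be sketched by sweeping the strip height and recording both the total compressed size and how much data must be decoded to fetch a single row. As before, `zlib` stands in for LZW and the rows are synthetic, so the numbers will not reproduce the figures quoted above; the function name is hypothetical.

```python
import zlib


def strip_tradeoff(rows, rows_per_strip):
    """Return (total compressed bytes, raw bytes decoded to fetch one row)."""
    total = 0
    for i in range(0, len(rows), rows_per_strip):
        total += len(zlib.compress(b"".join(rows[i:i + rows_per_strip])))
    decoded_per_row_read = min(rows_per_strip, len(rows)) * len(rows[0])
    return total, decoded_per_row_read


# Synthetic line-art rows: white background with a stroke that drifts slowly.
rows = [
    b"\xff" * (y % 50) + b"\x00" * 5 + b"\xff" * (595 - y % 50)
    for y in range(700)
]

for n in (1, 16, len(rows)):
    size, decoded = strip_tradeoff(rows, n)
    print(f"rows per strip: {n:4d}  compressed: {size:7d}  decoded per row read: {decoded}")
```

Small strips cost some compression but keep the work per read low; a single strip for the whole image gives the best compression and the most decoding per read.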
>
> But.... now that I have a lot more disk space, I store all my TIFFs
> uncompressed. That might change in the future (there have been other
> business reasons for storing the bulk of my data uncompressed). Given
> your constraint on lossless compression, the very first thing you should
> do is see if PackBits or LZW actually compress your data, because if they
> don't your research is complete!
>
> - Ed
>
> Ed McNierney
> Chief Mapmaker
> Demand Media / TopoZone.com
> 73 Princeton Street, Suite 305
> North Chelmsford, MA 01863
> ed at topozone.com
> Phone: +1 (978) 251-4242
> Fax: +1 (978) 251-1396
>
> -----Original Message-----
> From: gdal-dev-bounces at lists.osgeo.org
> [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Andreas Neumann
> Sent: Tuesday, January 15, 2008 8:27 AM
> To: gdal-dev at lists.osgeo.org
> Subject: [gdal-dev] tiff format: compression vs. performance
>
> Hello,
>
> I am wondering whether there is a webpage with some tips/hints on what is
> the best compression method within the tiff file format when performance
> is also an issue. My data is primarily read only, but reading should be
> fast. I heard that storing the tiff files as uncompressed is often still
> the fastest. Is this still the case? Are there recommended compression
> methods that are both fast (and allow random access to a big TIFF file)
> and also produce smaller files? I am looking at lossless compression
> methods, so JPEG is not an option.
>
> Thanks for hints (or links to a webpage discussing this issue).
>
> Andreas
>
>
> --
> Andreas Neumann
> Böschacherstrasse 6, CH-8624 Grüt/Gossau, Switzerland
> Email: a.neumann at carto.net, Web:
> * http://www.carto.net/ (Carto and SVG resources)
> * http://www.carto.net/neumann/ (personal page)
> * http://www.svgopen.org/ (SVG Open Conference)
> * http://www.geofoto.ch/ (Georeferenced Photos of Switzerland)
>