[gdal-dev] tiff format: compression vs. performance

Ed McNierney ed at topozone.com
Tue Jan 15 12:06:55 EST 2008


Andreas -

I would, of course, advise against generalizations, but for aerial photos, tiled TIFFs with JPEG compression (in each tile) are worth investigating.  They can provide a good balance between compression and performance: you get the benefits of JPEG compression, and the tiling reduces the performance impact of extracting a small portion of the image.
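
A minimal sketch of how such a file might be requested through GDAL's Python bindings. COMPRESS, TILED, BLOCKXSIZE, BLOCKYSIZE and JPEG_QUALITY are GTiff creation options; the two function names and the 256-pixel tile size are assumptions made for illustration, not something from this thread.

```python
def gtiff_creation_options(compress="JPEG", tiled=True, block_size=256,
                           jpeg_quality=None):
    """Build GTiff creation options as 'KEY=VALUE' strings (hypothetical helper)."""
    options = [f"COMPRESS={compress}"]
    if tiled:
        options += ["TILED=YES",
                    f"BLOCKXSIZE={block_size}",
                    f"BLOCKYSIZE={block_size}"]
    if jpeg_quality is not None:
        options.append(f"JPEG_QUALITY={jpeg_quality}")
    return options


def translate_to_tiled_jpeg(src_path, dst_path):
    """Rewrite src_path as a tiled, JPEG-compressed GeoTIFF (needs GDAL installed)."""
    # Imported lazily so the option helper above works without GDAL.
    from osgeo import gdal
    gdal.Translate(dst_path, src_path, format="GTiff",
                   creationOptions=gtiff_creation_options())
```

Whether JPEG is acceptable depends on the data, since it is lossy.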

     - Ed

-----Original Message-----
From: Andreas Neumann [mailto:a.neumann at carto.net] 
Sent: Tuesday, January 15, 2008 11:58 AM
To: Ed McNierney
Subject: RE: [gdal-dev] tiff format: compression vs. performance

Thank you Ed and others for sharing your experiences and the quite
comprehensive discussion of the subject.

In this particular case I am dealing with b/w maps that have large,
homogeneous white areas. I converted them to grayscale to get better quality
with GDAL's average resampling method.

I tested the file sizes, uncompressed vs. PackBits. PackBits is a factor of 6
smaller. LZW and Deflate are, obviously, much smaller again, but if they
take longer or need considerably more CPU to decompress, they are probably
not a good option.
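
For reference, a minimal sketch of PackBits run-length coding, showing why long white runs shrink so much. It follows the usual PackBits header-byte convention; the function names are illustrative, and the test row is synthetic rather than taken from the maps discussed here.

```python
def packbits_encode(data: bytes) -> bytes:
    """Encode bytes with PackBits run-length coding."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)      # header: repeat the next byte `run` times
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a repeated pair starts (at most 128).
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # header: copy the next (header + 1) bytes
            out += data[start:i]
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Inverse of packbits_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:               # literal run of header + 1 bytes
            out += data[i:i + header + 1]
            i += header + 1
        elif header > 128:             # repeated byte, 257 - header copies
            out += bytes([data[i]]) * (257 - header)
            i += 1
        # header == 128 is a no-op
    return bytes(out)


white_row = b"\xff" * 1000             # synthetic all-white scanline
encoded_row = packbits_encode(white_row)
```

Each run of up to 128 identical bytes costs two bytes, so a uniform row collapses to a handful of bytes, while noisy data gains nothing.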

I will probably go with packbits.

What do you recommend for aerial images? JPEG, or something other than TIFF?
ECW/MrSID is probably good, but risky because of the licenses. ECW is now
owned by Leica, and Leica is not exactly known as a pro-open-source company.

Andreas


> Andreas -
>
> This is a complex issue, with many different factors involved.  I think it
> is more important to understand the principles involved rather than look
> at specific performance numbers for a given data set.
>
> Compression essentially trades off disk read time for compute time.  For a
> compressed data set you spend less time reading from the disk (in theory)
> and more time decompressing the data.  If you don't manage to spend less
> time reading from the disk, then you're certainly going to have poorer
> performance than uncompressed data!
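
The trade-off can be put into a rough back-of-the-envelope model. This is only a sketch: it assumes disk reads and decompression happen one after the other (no overlap), and every throughput figure in the example is hypothetical.

```python
def read_time_seconds(uncompressed_bytes, compression_ratio,
                      disk_bytes_per_s, decode_bytes_per_s=None):
    """Estimate the time to read and decode a block of image data.

    compression_ratio  = uncompressed size / compressed size (1.0 = no gain)
    decode_bytes_per_s = decoder output rate; None means stored uncompressed
    """
    disk_time = (uncompressed_bytes / compression_ratio) / disk_bytes_per_s
    cpu_time = (0.0 if decode_bytes_per_s is None
                else uncompressed_bytes / decode_bytes_per_s)
    return disk_time + cpu_time


size = 100e6    # 100 MB of pixels (hypothetical)
disk = 50e6     # 50 MB/s disk (hypothetical)
cpu = 500e6     # 500 MB/s decoder (hypothetical)

uncompressed = read_time_seconds(size, 1.0, disk)
compressible = read_time_seconds(size, 3.0, disk, cpu)
incompressible = read_time_seconds(size, 1.0, disk, cpu)
```

With these numbers the 3:1 case beats the uncompressed read, while the 1:1 case is slower than not compressing at all, because it pays for decoding without saving any disk time.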
>
> Generally speaking, of course, your disk is slow and your CPU is fast, but
> there's a lot of variability in both of those measurements.  Some disks
> are pretty fast, and some CPUs, while fast, may be busy doing a lot of
> other things.  Your disk will also have a certain read resolution or
> granularity; reading 100 bytes from a disk is no faster than reading 200
> bytes, because your disk subsystem will read at least a whole sector when
> requested to do either.
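
The granularity point as a small calculation. The 512-byte sector size is an assumption for illustration; real devices and file systems often work in larger units.

```python
def sectors_read(offset, length, sector_size=512):
    """Whole sectors the disk must transfer to cover [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1
```

For example, `sectors_read(0, 100)` and `sectors_read(0, 200)` both come to one sector, whereas a 100-byte read starting at offset 500 straddles two.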
>
> You also need to look at the properties of your data, especially if you're
> limiting yourself to uncompressed, PackBits, and LZW-compressed TIFFs.
> Some data sets (aerial photography) are usually quite incompressible using
> those algorithms.  They're not going to get any smaller if you try to
> compress them, so there's no point in trying.  Other data sets will
> perform much better.  You should test your data to first see which
> compression methods provide useful compression in your situation.
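
A quick way to run that kind of check, sketched with synthetic data. Python's standard library has no LZW or PackBits codec, so zlib (Deflate) is swapped in here; random bytes are only a crude stand-in for photographic data and the white-with-thin-lines pattern is a crude stand-in for line art. For a real decision, compress samples of the actual rasters with the actual TIFF codecs.

```python
import random
import zlib


def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by Deflate-compressed size."""
    return len(raw) / len(zlib.compress(raw))


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.randrange(256) for _ in range(100_000))
line_art = (b"\xff" * 990 + b"\x00" * 10) * 100

noise_ratio = compression_ratio(noise)
line_art_ratio = compression_ratio(line_art)
```

A ratio close to 1 means the codec buys nothing for that data; a large ratio means it is worth measuring read performance next.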
>
> LZW compression can be particularly tricky, since it builds a "dictionary"
> on the fly from the source data.  Normally this means that if you want to
> read only the lower-right pixel in a TIFF file you will need to read and
> decode the ENTIRE file, in order to reconstruct the dictionary to read
> that last pixel.  This does not have to be the case, however.  You can
> specify a STRIP size in your TIFF file; this will treat the file as a
> series of strips, each N rows high.  Reading a random location in the file
> will only require decoding the strip that contains that data.  You will
> not get quite as good compression, but you can get much faster LZW
> performance.
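
The strip idea in miniature. This sketch again uses zlib as a stand-in for LZW (an assumption; the principle of compressing each strip independently is the same), and the helper names are illustrative.

```python
import zlib


def compress_in_strips(rows, rows_per_strip):
    """Compress equal-length byte rows as independent strips."""
    return [zlib.compress(b"".join(rows[i:i + rows_per_strip]))
            for i in range(0, len(rows), rows_per_strip)]


def read_pixel(strips, rows_per_strip, row_width, row, col):
    """Return one pixel value, decoding only the strip that contains it."""
    strip = zlib.decompress(strips[row // rows_per_strip])
    return strip[(row % rows_per_strip) * row_width + col]


# 100 rows of 64 one-byte pixels; each row is filled with its own row number.
rows = [bytes([r]) * 64 for r in range(100)]
strips = compress_in_strips(rows, rows_per_strip=8)
last_pixel = read_pixel(strips, 8, 64, row=99, col=63)
```

Reading the lower-right pixel touches only the last strip, not the 12 strips before it.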
>
> Think carefully about what's involved in reading your data from the disk.
> If you're reading a small, vertical strip from the file your disk will
> need to read a few bytes from one line, then (possibly) seek to the next
> line, read a few bytes, etc.  If your file is particularly wide this can
> become a problem, since a disk SEEK is the slowest thing you can possibly
> do.  This is where TILED TIFFs will help you, as they will organize your
> data in tiles rather than strips.  Think about what tiling is doing for
> you - it is reducing the need to seek across the width of the file to move
> vertically in the image.  Your tile size needs to be substantially smaller
> than the width of your TIFF in order to produce a noticeable improvement.
> Again, the penalty you pay for seeking through the file depends very
> heavily on your particular disk subsystem, and you may or may not see much
> improvement from tiling.
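
A rough count of what a window read touches in each layout. The sketch assumes uncompressed data, that each tile is stored contiguously and read whole, and that every separate byte range costs roughly one seek; the window and tile sizes in the example are hypothetical.

```python
def ranges_row_major(win_w, win_h, image_w):
    """Contiguous byte ranges needed from an uncompressed row-by-row image."""
    # A full-width window is one contiguous range; otherwise one per row.
    return 1 if win_w == image_w else win_h


def tiles_touched(win_x, win_y, win_w, win_h, tile):
    """Number of square tiles intersected by the window."""
    cols = (win_x + win_w - 1) // tile - win_x // tile + 1
    rows = (win_y + win_h - 1) // tile - win_y // tile + 1
    return cols * rows


# A 256 x 256 window at (1000, 1000) in a 6,000-pixel-wide image.
striped_ranges = ranges_row_major(256, 256, image_w=6000)
tiled_ranges = tiles_touched(1000, 1000, 256, 256, tile=256)
```

In this example the row-by-row layout needs 256 separate ranges and the tiled layout needs 4 tiles, at the cost of reading some pixels outside the window.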
>
> I will give one example that showed quite different results from the one
> Christopher posted, simply to illustrate the variability.  This was a very
> old experiment and I don't have specific measurements available.  I was
> trying to compress scanned USGS topographic maps (DRGs), which are
> essentially line art scanned as TIFFs.  These images actually DO compress
> very well with Packbits or LZW, because they aren't photos and contain
> long stretches of identical pixels or repeated patterns.
>
> There was no performance problem with the uncompressed TIFFs, but I needed
> more disk space (this was a long time ago <g>).  I found that PackBits
> compression reduced the TIFF files to one-third their original size, with
> no perceptible degradation in performance.  PackBits is, of course, a very
> simple algorithm so the reduced disk reading was a benefit and the
> decompression was trivial.  I then tried compression with LZW and found
> that these files were one-third the size of the PackBits files (one-ninth
> the original size) but performance was distinctly slower.  That's because I
> needed to decompress the entire file up to a certain point in order to
> read a given pixel.
>
> But I then used LZW with a strip size of 1 row.  That meant that my files
> (roughly 6,000 x 7,000 pixels) were compressed as 7,000 independent
> 6,000-pixel data sets.  I could read any row by using the TIFF directory
> to seek right to it, then decompress the data.  I was surprised to find
> that reading these files was nearly as fast as PackBits, and the files were
> only about 10% larger than with the original LZW compression.  If LZW appears
> to be a suitable compression format for you, it is very worthwhile to
> experiment with different strip sizes to compare the tradeoff between
> performance and compression.
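
One way to set up such an experiment, sketched on synthetic line art with zlib standing in for LZW (both are assumptions; the 10% figure above comes from real DRG files and is not reproduced here). For each strip height it reports the total compressed size and an upper bound on the raw bytes that must be decoded to reach a single pixel.

```python
import zlib


def strip_tradeoff(rows, rows_per_strip):
    """Return (total compressed bytes, raw bytes in one full strip)."""
    total = sum(len(zlib.compress(b"".join(rows[i:i + rows_per_strip])))
                for i in range(0, len(rows), rows_per_strip))
    decoded_per_read = rows_per_strip * len(rows[0])
    return total, decoded_per_read


def synthetic_line_art(width=600, height=70):
    """White rows, each with a short black segment that shifts from row to row."""
    rows = []
    for r in range(height):
        row = bytearray(b"\xff" * width)
        start = (r * 7) % (width - 3)
        row[start:start + 3] = b"\x00\x00\x00"
        rows.append(bytes(row))
    return rows


rows = synthetic_line_art()
results = {n: strip_tradeoff(rows, n) for n in (1, 10, 70)}
```

Smaller strips cost some compression because each strip starts with no history, but they cut the amount of data decoded per random read.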
>
> But.... now that I have a lot more disk space, I store all my TIFFs
> uncompressed.  That might change in the future (there have been other
> business reasons for storing the bulk of my data uncompressed).  Given
> your constraint on lossless compression, the very first thing you should
> do is see if PackBits or LZW actually compress your data, because if they
> don't your research is complete!
>
>      - Ed
>
> Ed McNierney
> Chief Mapmaker
> Demand Media / TopoZone.com
> 73 Princeton Street, Suite 305
> North Chelmsford, MA  01863
> ed at topozone.com
> Phone: +1 (978) 251-4242
> Fax: +1 (978) 251-1396
>
> -----Original Message-----
> From: gdal-dev-bounces at lists.osgeo.org
> [mailto:gdal-dev-bounces at lists.osgeo.org] On Behalf Of Andreas Neumann
> Sent: Tuesday, January 15, 2008 8:27 AM
> To: gdal-dev at lists.osgeo.org
> Subject: [gdal-dev] tiff format: compression vs. performance
>
> Hello,
>
> I am wondering whether there is a webpage with some tips/hints on what is
> the best compression method within the tiff file format when performance
> is also an issue. My data is primarily read only, but reading should be
> fast. I heard that storing the tiff files as uncompressed is often still
> the fastest. Is this still the case? Are there recommended compression
> methods that are both fast (and allow random access to a big tiff file)
> and also create smaller files? I am looking at lossless compression
> methods, so jpeg is not an option.
>
> Thanks for hints (or links to a webpage discussing this issue).
>
> Andreas
>
>
> --
> Andreas Neumann
> Böschacherstrasse 6, CH-8624 Grüt/Gossau, Switzerland
> Email: a.neumann at carto.net, Web:
> * http://www.carto.net/ (Carto and SVG resources)
> * http://www.carto.net/neumann/ (personal page)
> * http://www.svgopen.org/ (SVG Open Conference)
> * http://www.geofoto.ch/ (Georeferenced Photos of Switzerland)




