[Gdal-dev] Advice on Raster Formats

Ed McNierney ed at topozone.com
Tue Aug 30 09:14:14 EDT 2005


Bill -

Here are a few thoughts on your situation, in order of when they popped
into my head while reading your note!

How many DOQQs are you dealing with?  That helps us understand the scope
of the problem.

You seem to say that your MrSID images are "lossless" - are you certain
that's the case?  Your example file seems rather small (roughly 18:1
compression against the GeoTIFF) for lossless MrSID.  If you really need
to get the original data back, triple-check this point.

You do not need to "read the entire GeoTIFF" to extract a portion of it.
That is one of the significant advantages of the GeoTIFF format.  Your
own numbers below are examples of reading the entire file, where GeoTIFF
seems to do well.
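
As a rough illustration of why a partial read is cheap, the sketch below
counts how many blocks of a tiled TIFF a reader has to touch for one map
request.  Only the 6337 x 7082 image size comes from the gdal_translate
output quoted below; the 256 x 256 tile size and the 800 x 600 request
window are assumptions picked for illustration, and the function names
are hypothetical.

```python
import math


def blocks_touched(xoff, yoff, width, height, block_w, block_h):
    """Count the blocks that intersect a pixel window."""
    first_col = xoff // block_w
    last_col = (xoff + width - 1) // block_w
    first_row = yoff // block_h
    last_row = (yoff + height - 1) // block_h
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def total_blocks(img_w, img_h, block_w, block_h):
    """Count every block in the image, including partial edge blocks."""
    return math.ceil(img_w / block_w) * math.ceil(img_h / block_h)


IMG_W, IMG_H = 6337, 7082   # from the gdal_translate output
TILE = 256                  # assumed tile size, not taken from the files

touched = blocks_touched(3000, 3000, 800, 600, TILE, TILE)
total = total_blocks(IMG_W, IMG_H, TILE, TILE)
print(f"{touched} of {total} blocks read ({100.0 * touched / total:.1f}%)")
```

The same arithmetic applies to a stripped TIFF by setting the block
width to the image width and the block height to the rows per strip.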

Hard Drives are Cheap.  I just bought Western Digital 250GB EIDE drives
for $125 each.  That's about 1,800 of your 24-bit GeoTIFFs for $125, or
about 7 cents each.  Consider how much time you're spending on this
problem.  I'm surprised you find your data is growing at "outrageous
rates" - the USGS DOQQ data library is not growing all that rapidly
(about 5% per year for the last few years).  If you're starting with a
subset of DOQQs and expanding the area, you might just look at what you
need to cover the entire set.
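
The drive arithmetic above can be checked with a few lines of Python.
This is only a sketch: it assumes the 250GB figure means decimal
gigabytes and ignores filesystem overhead, and it uses the GeoTIFF size
from the directory listing in the quoted message.

```python
DRIVE_BYTES = 250 * 10**9   # 250GB drive, decimal gigabytes (assumption)
DRIVE_COST = 125.00         # dollars per drive
TIFF_BYTES = 134983605      # Q2918se.tiff from the directory listing


def files_per_drive(drive_bytes, file_bytes):
    """Whole files that fit on one drive, ignoring filesystem overhead."""
    return drive_bytes // file_bytes


def cost_per_file(drive_cost, drive_bytes, file_bytes):
    """Dollars of disk spent on each stored file."""
    return drive_cost / files_per_drive(drive_bytes, file_bytes)


count = files_per_drive(DRIVE_BYTES, TIFF_BYTES)
cents = 100 * cost_per_file(DRIVE_COST, DRIVE_BYTES, TIFF_BYTES)
print(f"{count} files per drive, about {cents:.1f} cents each")
```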

You might also consider using a commercial WMS service that provides
DOQQ imagery.  It won't be quite as fast as your own online system can
be, but it's likely to be a whole lot faster than your compressed
imagery and might be able to provide something that's almost as fast as
GeoTIFFs on your LAN at a lower price.

	- Ed

Ed McNierney
President and Chief Mapmaker
TopoZone.com / Maps a la carte, Inc.
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
ed at topozone.com
(978) 251-4242 

-----Original Message-----
From: gdal-dev-bounces at lists.maptools.org
[mailto:gdal-dev-bounces at lists.maptools.org] On Behalf Of Bill Binko
Sent: Tuesday, August 30, 2005 2:30 AM
To: GDAL Developer List
Subject: [Gdal-dev] Advice on Raster Formats

Hello everyone,

I need some advice on how to store and process raster images I'm working
with.  The images have generally come in as either ECW (or JPEG2000 -- my
choice) or MrSID format.  They are aerial images (DOQQs, generally at
1-meter accuracy).

I need to be able to serve these up efficiently using mapserver (through
GDAL, of course).  The output will be a 24 bit JPEG (with overlay data
unfortunately compressed as well) that will be sent over the wire to the
client.

My problem is that both the ECW and the MrSIDs take a very long time to
decompress, but GeoTIFF takes an ungodly amount of disk space.  Here are
some numbers I just grabbed at random:

I took the same data in GeoTIFF, ECW, and MrSID formats and decompressed
them (to MEM which I thought would be a good comparison to reading into
mapserver).

$ time gdal_translate -of MEM Q2918se.tiff /dev/null
Input file size is 6337, 7082
0...10...20...30...40...50...60...70...80...90...100 - done.
0.72user 0.52system 0:02.79elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+37327minor)pagefaults 0swaps

$ time gdal_translate -of MEM Q2918se.sid /dev/null
Input file size is 6337, 7082
0...10...20...30...40...50...60...70...80...90...100 - done.
55.80user 7.78system 1:12.57elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (3major+631879minor)pagefaults 0swaps

$ time gdal_translate -of MEM Q2918se.ecw /dev/null
Input file size is 6337, 7082
0...10...20...30...40...50...60...70...80...90...100 - done.
25.92user 1.43system 0:29.80elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+38181minor)pagefaults 0swaps

As you can see, MrSID is more than twice as slow as ECW, and GeoTIFF
just flies.  The flip side (disk space) can be seen here in a directory
listing:

 19226192 Aug 30 02:05 Q2918se.ecw
  7601858 Aug 30 02:21 Q2918se.sid
134983605 Aug 30 02:00 Q2918se.tiff
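
For reference, the ratios implied by the timings and the listing above
can be worked out with a short Python sketch.  The file sizes, elapsed
times and image dimensions are copied from this message; treating the
image as 3 bands of 8 bits each is an assumption based on the "24 bit"
description, and the function names are hypothetical.

```python
SIZES = {"ecw": 19226192, "sid": 7601858, "tiff": 134983605}   # bytes
ELAPSED = {"tiff": 2.79, "sid": 72.57, "ecw": 29.80}           # seconds
WIDTH, HEIGHT, BANDS = 6337, 7082, 3   # 3 x 8-bit bands is an assumption


def compression_vs_tiff(sizes):
    """How many times smaller each file is than the GeoTIFF."""
    return {name: sizes["tiff"] / size for name, size in sizes.items()}


def slowdown_vs_tiff(elapsed):
    """How many times longer each full read took than the GeoTIFF."""
    return {name: t / elapsed["tiff"] for name, t in elapsed.items()}


compression = compression_vs_tiff(SIZES)
slowdown = slowdown_vs_tiff(ELAPSED)
raw_bytes = WIDTH * HEIGHT * BANDS   # size of the uncompressed pixel data

for name in ("sid", "ecw"):
    print(f"{name}: {compression[name]:.1f}:1 smaller, "
          f"{slowdown[name]:.1f}x slower than GeoTIFF")
print(f"raw pixels: {raw_bytes} bytes, GeoTIFF: {SIZES['tiff']} bytes")
```

The raw pixel count comes out within about 0.3% of the GeoTIFF size,
which suggests (under the 3-band assumption) that the TIFF is stored
uncompressed.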

What I'm considering is keeping the full-scale files in MrSID format,
and keeping overviews (pyramids) of lower resolution images in ECW.
Assuming I need four layers (dividing each side by 2 -- and the area by
4) or so for adequate performance, the image above would take:

7.6MB (base) + 19/4MB (layer1) + 19/16MB (layer2) + 19/64MB (layer3) +
19/256MB (layer4).

That adds up to 13.9MB, a far cry from the 130MB TIFF.
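
The sum above can be reproduced with a small Python sketch, which also
makes it easy to try a different number of layers.  It assumes, as the
estimate does, that each overview compresses at the same rate as the
full-resolution ECW, so that halving each side quarters the file size;
the function name is hypothetical.

```python
def pyramid_size_mb(base_mb, full_overview_mb, levels):
    """Base file plus overviews, each level a quarter the size of the last."""
    overviews = sum(full_overview_mb / 4**k for k in range(1, levels + 1))
    return base_mb + overviews


SID_MB = 7.6    # full-resolution MrSID
ECW_MB = 19.0   # full-resolution ECW, used to scale the overviews

for levels in (1, 2, 3, 4, 8):
    print(f"{levels} layers: {pyramid_size_mb(SID_MB, ECW_MB, levels):.2f}MB")
```

Because the overview sizes form a geometric series with ratio 1/4, all
the overviews together can never exceed one third of the full ECW size
(19/3, about 6.3MB), so adding more layers costs very little disk.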

This also gives me the ability to keep a lossless (MrSID) file in case I
ever want to convert to GeoTIFFs in the future.

Does this make sense?  Are there better ways to get performance out of
MrSIDs?  Should some of the higher (and smaller) layers stay GeoTIFF?

I'm sure that someone will say "Hard Drives are Cheap", and I agree, but
it seems that this data is growing at outrageous rates (uncompressed),
and that the time spent reading the entire GeoTIFF becomes a bottleneck
anyway.

I'd really appreciate any advice,

Thanks
Bill

_______________________________________________
Gdal-dev mailing list
Gdal-dev at lists.maptools.org
http://lists.maptools.org/mailman/listinfo/gdal-dev



