[Gdal-dev] large file support, gdal_merge with 8bit images (was: GDAL data model...)
ed at topozone.com
Mon Mar 3 10:38:33 EST 2003
I'm not sure I'm quite following what you need, but I wanted to offer a suggestion.
You're working with a large, tiled, sparse image data set via GDAL. A while back I wrote a GDAL driver that implemented an interface to Microsoft TerraServer's large, tiled, sparse image data set. Each "dataset" is an entire UTM zone at one-meter-per-pixel resolution, with overviews implemented for lower-res data.
I was planning to contribute this to the GDAL distribution once I got multi-threaded tile fetching complete, but then Microsoft added WMS server support to TerraServer and the GDAL driver became unnecessary. If you'd like a copy of the code, I'd be happy to send it along.
One of the problems I had was with caching. GDAL assumes a rectangular array of source data tiles and creates a matrix of pointers to keep track of what's in cache and what is not. In my case, the cache pointer table was several hundred megabytes(!) when I only loaded a few hundred kilobytes of data. I reimplemented the GDAL cache with a linked list mechanism. It is a better choice if you're loading only a small portion of a very large dataset.
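The idea can be sketched roughly like this; it's an illustrative toy, not the actual GDAL or TerraServer driver code, and all names are made up. Cached tiles are keyed by (row, col) in an ordered mapping with least-recently-used eviction, so memory is proportional to the tiles actually touched rather than to the whole tile grid:

```python
# Toy sketch of a sparse tile cache: instead of a dense rows x cols
# pointer matrix (cost proportional to the whole grid even when almost
# nothing is loaded), key cached tiles by (row, col) and evict the
# least recently used entry when the cache is full.
from collections import OrderedDict

class SparseTileCache:
    def __init__(self, max_tiles, fetch_tile):
        self.max_tiles = max_tiles      # capacity in tiles, not bytes
        self.fetch_tile = fetch_tile    # callback that loads one tile
        self.tiles = OrderedDict()      # (row, col) -> tile data

    def get(self, row, col):
        key = (row, col)
        if key in self.tiles:
            self.tiles.move_to_end(key)     # mark as recently used
            return self.tiles[key]
        tile = self.fetch_tile(row, col)
        self.tiles[key] = tile
        if len(self.tiles) > self.max_tiles:
            self.tiles.popitem(last=False)  # evict least recently used
        return tile
```

A hash-keyed (or linked-list) structure like this only pays for entries that exist, which is the point when you load a few hundred kilobytes out of a multi-gigabyte grid.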
President and Chief Mapmaker
TopoZone.com / Maps a la carte, Inc.
73 Princeton Street, Suite 305
North Chelmsford, MA 01863
ed at topozone.com
From: Hannu Koivisto [mailto:azure at iki.fi]
Sent: Monday, March 03, 2003 10:13 AM
To: gdal-dev at remotesensing.org
Subject: Re: [Gdal-dev] large file support, gdal_merge with 8bit images
(was: GDAL data model...)
Frank Warmerdam <warmerdam at pobox.com> writes:
> Is there such a thing as an uncompressed PNG file? I believe all
> image segments in PNG files are internally compressed using zlib.
Um, you are probably right; what I really meant was the result of
pnmtopng -compression 0. I thought I'd try that since it seemed
that gdal_translate would take weeks to convert the image if it was
compressed. However, once again engaging the brain helped. I
discovered that pnmtotiff with packbits compression and an explicit
rowsperstrip option (without it gdal_translate couldn't process the
image) could be used: the resulting file was only a bit over
400MB and gdal_translate was able to process it. And it took only
about 8 hours or so on a 700MHz Athlon machine :) So no LFS was needed.
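For reference, the pipeline described above might look something like the following; the file names are placeholders, and the exact options should be checked against your netpbm and GDAL versions:

```shell
# Hypothetical reconstruction of the conversion pipeline.
# pnmtotiff's -packbits and -rowsperstrip options are netpbm's;
# gdal_translate's -co creation options belong to the GeoTIFF driver.
pnmtotiff -packbits -rowsperstrip 512 map.pnm > map-packbits.tif
gdal_translate -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 \
    map-packbits.tif map-tiled.tif
```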
So now I finally have a tiled GeoTIFF image. No georeference
information in it yet, though; I still have to figure out how to
inject that into it. The size, ~159MB, obtained with a 512x512 tile
size, is a slight disappointment. With a 1200x1200 tile size I got
~156MB, probably because the natural tile size of the map data is a
multiple of that, but the difference is very small and 512x512 is
probably better otherwise. Still, PNGs take 135MB. While somewhat
disappointing in absolute terms, it is still not bad given that the
compression must also try to handle the introduced black areas.
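On injecting the georeference information: GDAL models it as a six-element affine geotransform. A minimal sketch of how those six numbers map pixel/line coordinates to map coordinates, with made-up placeholder values (with the real numbers you'd attach them to the dataset):

```python
# GDAL's geotransform for a north-up image is
# (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
# The values below are placeholders, not real coordinates.
def pixel_to_map(gt, col, row):
    """Map pixel/line coordinates to georeferenced x/y."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

# Example: origin at (500000, 4650000), one-meter pixels, north-up.
gt = (500000.0, 1.0, 0.0, 4650000.0, 0.0, -1.0)
```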
I do wonder what happens if the map is even more sparse. I thought
about this for a moment and came to the conclusion that it is a
very likely situation; one might want/need to save space and keep
only the important areas. Furthermore, the highest resolution data
may be quite sparse but the lower resolution overviews may be less
sparse. This results in a situation that GDAL cannot handle, as
far as I can see. Let's take a simple, silly example: a map of a
rectangular area. I have an overview that covers the entire area
and high resolution data that covers everything except an area in
the middle of the map, like this (O = overview data, H = high
resolution data, . = no data, i.e. black area when a GeoTIFF image
is made out of the high resolution data):
overview:    high resolution:

OOOOO        HHHHH
OOOOO        HH.HH
OOOOO        HHHHH
Now, if I'm wandering around in the high resolution data and end up
in the middle of the map, all I see is blackness. What I would
like to see is the not-so-high-resolution overview data, but GDAL does
not have any mechanism to tell me that I'm looking at "nothing" so
that I could switch to the overview. I guess I could identify
black areas myself, but that gives me a SIGTHTBABW signal.
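The fallback I'd like could be sketched like this; read_tile and read_overview are hypothetical stand-ins for whatever actually fetches pixels, and "all pixels equal the no-data value" stands in for proper no-data tracking:

```python
# Hypothetical sketch of the desired fallback: if the high-resolution
# tile is missing or entirely "no data" (all-black here), serve the
# corresponding overview tile instead.
NO_DATA = 0

def tile_or_overview(read_tile, read_overview, row, col):
    tile = read_tile(row, col)
    if tile is None or all(px == NO_DATA for px in tile):
        return read_overview(row, col)   # fall back to lower-res data
    return tile
```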
Also, even if this problem didn't exist and compression could
handle even more sparse data, I realized that I can't require other
users in a similar situation to build GeoTIFFs of their files.
Especially if they want to remove data from one end of the map and
add more to another end while they are on a journey or something.
With individual files this is easier. No one has to have huge
amounts of temporary disk space, netpbm with LFS, etc.
So, I think I'm going to ditch this rely-on-GeoTIFF idea, sorry. I
obviously will support GeoTIFF as well but I cannot simplify things
with the "a map is a single GDALDataset" assumption. I think I'll
define maps to consist of one or more possibly "null" tiles, each
backed by a GDALDataset. If a map consists of only one tile, I keep
the GDALDataset open and rely on GDAL's caching. Otherwise I read
tiles into memory one at a time as needed (and correspondingly free
them when running out of memory), handling the caching of
multi-tile images myself. Overviews are handled as
separate maps that just happen to cover the same area as another
map. I haven't yet figured out how overviews contained within a
single GDALDataset can be supported in this model, but I'll try to
find a way or modify the model so that it can be done.
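A minimal sketch of that model, with illustrative names that are not GDAL API: a map is a grid of tile slots, each either absent (a "null" tile) or a handle to a dataset:

```python
# Toy sketch of the "map = one or more possibly-null tiles" model.
# Dataset handles are plain strings here; in practice each slot would
# refer to a GDALDataset opened (or openable) on demand.
class TiledMap:
    def __init__(self):
        self.tiles = {}      # (row, col) -> dataset handle; absent = null tile

    def add_tile(self, row, col, dataset):
        self.tiles[(row, col)] = dataset

    def is_null(self, row, col):
        return (row, col) not in self.tiles

    def dataset_for(self, row, col):
        return self.tiles.get((row, col))   # None for null tiles
```

Null tiles cost nothing, which is what makes the sparse case cheap; a separate TiledMap per resolution level would play the role of the overviews.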
Btw, it might be nice to mention in the documentation of the
BLOCK[XY]SIZE options that the sizes must be multiples of 16. I
had to dig up the TIFF spec to find out why GDAL complained about
invalid "TileWidth" and "TileLength" values.
> GDAL's raster caching is global. That is, one memory pool is shared
> among all datasets. If you have two files open, read all of one and
> I presume this is the behaviour you would want?