[gdal-dev] IO Overhead when reading small subsets from Global Files

Even Rouault even.rouault at spatialys.com
Mon Dec 8 02:26:12 PST 2014


On Monday 08 December 2014 11:14:04, Julian Zeidler wrote:
> Hi Even,
> 
> Thanks for the quick reply.
> 
> I should have mentioned that I also tried converting it to a compressed
> tiled TIFF.
> There I can see the same kind of overhead. I extract a 1 MB subset from
> the file and, depending on the tiles, I read between 9 and 12 MB via the
> network.
> The process only reads one 500x500 block from every single file.

For GeoTIFF, I would expect the overhead to be the size of the "tags" that
store the offset and size of each block (two arrays of 4-byte entries), so
for a 40000x20000 dataset with 100x100 blocks:
(40000 / 100) * (20000 / 100) * 2 * 4 = 640 000 bytes
I can't explain how it could reach 9-12 MB.
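
As a quick check of the estimate above, here is a minimal Python sketch of the
same arithmetic. The function name and its parameters are only illustrative
(they are not part of GDAL or libtiff), and it assumes a classic TIFF with one
offset array and one byte-count array of 4-byte entries, and a single set of
blocks (one band, or pixel-interleaved bands):

```python
def tile_index_bytes(width, height, block_w, block_h, arrays=2, entry_bytes=4):
    """Estimate the size of the per-block offset and byte-count arrays.

    Assumption: `arrays` arrays (offsets + byte counts), each holding one
    `entry_bytes`-sized entry per block.
    """
    blocks_x = -(-width // block_w)   # ceiling division: partial blocks count
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * arrays * entry_bytes


# The case discussed above: 40000x20000 raster, 100x100 blocks.
print(tile_index_bytes(40000, 20000, 100, 100))  # 640000
```

Even if both arrays are read in full, that is well under 1 MB per file, so
the block index alone would not account for 9-12 MB.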

And if you use GDAL >= 1.11 compiled with its internal libtiff, there's a trick
that avoids reading the full tags and should only read them by "pages" of 4 KB,
resulting in a negligible overhead.

> 
> Cheers Julian
> 
> On 08.12.2014 11:10, Even Rouault wrote:
> > On Monday 08 December 2014 10:44:41, Julian Zeidler wrote:
> >> Dear GDAL mailing list,
> >> 
> >> I am currently trying to optimize a global model.
> >> The model reads small chunks (500x500) from many (one for each day)
> >> global datasets (40000x20000).
> >> These datasets are compressed NetCDF files with tiling enabled (100x100).
> >> (See the attached gdalinfo output.)
> >> However, when I measure the file I/O via NFS, I get a factor of ~10
> >> compared to the uncompressed output image when testing with GDAL. Inside
> >> the model, using the netCDF library directly, I measure an even worse
> >> factor of ~60 (compared to compressed outputs). This is better than using
> >> untiled inputs, where the overhead was ~80x, but still a larger overhead
> >> than I expected.
> >> I tested it using gdal_translate in.nc out.tif -srcwin 6000 6000 500 500
> > 
> > Julian,
> > 
> > I'm not sure how chunk indexing works internally in netCDF, but there
> > might be an overhead when reading the "index" the first time. So perhaps
> > if you do your reads from the same GDAL dataset object, without closing
> > it between different requests, the overhead will decrease. If you were
> > already doing that, then I'm not sure what you can do, except converting
> > into another format, like GTiff.
> > 
> > Even
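
The suggestion quoted above, doing all reads from one GDAL dataset object
instead of reopening the file for each request, could look roughly like the
following sketch with the GDAL Python bindings. The function name
read_windows and the path in the usage line are hypothetical. Whether this
actually reduces the netCDF overhead is not confirmed here, and for netCDF
files with several variables the path may need to be a subdataset name:

```python
def read_windows(path, windows, band_index=1):
    """Read several (xoff, yoff, xsize, ysize) windows from one open dataset."""
    # Imported here so the sketch can be loaded without GDAL installed.
    from osgeo import gdal

    ds = gdal.Open(path)  # open once for all requests
    if ds is None:
        raise IOError("could not open " + path)
    band = ds.GetRasterBand(band_index)
    arrays = [
        band.ReadAsArray(xoff, yoff, xsize, ysize)
        for (xoff, yoff, xsize, ysize) in windows
    ]
    # Release the band and dataset only after every window has been read.
    band = None
    ds = None
    return arrays


# Hypothetical usage, with the window from the gdal_translate test:
# blocks = read_windows("in.nc", [(6000, 6000, 500, 500)])
```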

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com