[gdal-dev] IO Overhead when reading small subsets from Global Files

Julian Zeidler gdal at zeidlers.de
Mon Dec 8 03:21:18 PST 2014


On 08.12.2014 11:26, Even Rouault wrote:
> On Monday, 08 December 2014 11:14:04, Julian Zeidler wrote:
>> Hi Even,
>>
>> Thanks for the quick reply.
>>
>> I should have mentioned that I also tried converting it to a compressed,
>> tiled TIFF.
>> There I can see the same kind of overhead: I extract a 1 MB subset from
>> the file and, depending on the tiles, I read between 9-12 MB over the network.
>> The process only reads one 500x500 block from every single file.
> For GeoTIFF, I would expect the overhead to be the size of the "tags" that
> store the offset and size of each block, so for a 40000x20000 dataset with
> 100x100 blocks:
> (40000 / 100) * (20000 / 100) * 2 * 4 = 640,000 bytes
> I can't explain how it could reach 9-12 MB.
>
> And if you use GDAL >= 1.11 compiled with its internal libtiff, there's a trick
> that avoids reading the full tags, and should only read them by "pages" of 4K,
> resulting in a negligible overhead.
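For reference, that estimate works out as in the rough sketch below (assuming a
classic TIFF where both the TileOffsets and TileByteCounts tags use 4-byte
entries, which is where the "2 * 4" factor comes from):

    # Expected size of the GeoTIFF tile index for a 40000x20000 raster
    # with 100x100 blocks (classic TIFF, 4-byte entries per tag).
    xsize, ysize = 40000, 20000
    block = 100
    tiles = (xsize // block) * (ysize // block)      # 80 000 tiles
    index_bytes = tiles * 2 * 4                      # offsets + byte counts
    print(tiles, "tiles ->", index_bytes, "bytes")   # 640 000 bytes
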
I just rechecked the versions. They are:

GDAL: 1.11.0 (with internal libtiff)
netCDF: 4.2

I was quite surprised by the overhead myself. I ran some quick tests
with different block sizes and output windows. The overhead decreased
with size (1000x1000, 2000x2000) but was still significant at about 5x the
output size.
I guess we will have to live with the overhead.
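
In case it is useful for reproducing the numbers, the windowed reads can be
done through the GDAL Python bindings roughly as in the sketch below, keeping
the dataset open between requests as Even suggested earlier in the thread
(file name and window offsets are placeholders only):

    from osgeo import gdal

    # Open the chunked NetCDF once and reuse it for all requests, so the
    # chunk/tile index only has to be fetched a single time.
    ds = gdal.Open("global_daily.nc")          # placeholder file name
    band = ds.GetRasterBand(1)

    # Read several 500x500 windows from the same open dataset.
    for xoff, yoff in [(6000, 6000), (12000, 8000), (20000, 4000)]:
        data = band.ReadAsArray(xoff, yoff, 500, 500)
        print(xoff, yoff, data.shape)

    ds = None                                  # close only when done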

Thanks
Julian


>> Cheers, Julian
>>
>> On 08.12.2014 11:10, Even Rouault wrote:
>>> On Monday, 08 December 2014 10:44:41, Julian Zeidler wrote:
>>>> Dear gdal-dev mailing list,
>>>>
>>>> I am currently trying to optimize a global model.
>>>> The model reads small chunks (500x500) from many global datasets
>>>> (40000x20000), one for each day.
>>>> These datasets are compressed NetCDFs with chunking enabled (100x100)
>>>> (see the attached gdalinfo output).
>>>> However, when I measure the file I/O over NFS, I get a factor of ~10
>>>> compared to the uncompressed output image when testing with GDAL. Inside
>>>> the model, using the netCDF library directly, I measure an even worse
>>>> factor of ~60 (compared to compressed outputs). This is better than using
>>>> untiled inputs, where the overhead was ~80x, but still a larger overhead
>>>> than I expected.
>>>> I tested it using: gdal_translate in.nc out.tif -srcwin 6000 6000 500 500
>>> Julian,
>>>
>>> I'm not sure how chunk indexing works internally in netCDF, but there
>>> might be an overhead when reading the "index" the first time. So perhaps
>>> if you do your reads from the same GDAL dataset object, without closing
>>> it between different requests, the overhead will decrease. If you were
>>> already doing that, then I'm not sure what you can do, except converting
>>> into another format, like GTiff.
>>>
>>> Even
