[gdal-dev] Raster size and ReadAsArray()

Even Rouault even.rouault at mines-paris.org
Wed Aug 3 14:19:13 EDT 2011


On Wednesday 03 August 2011 17:32:53, Antonio Valentino wrote:
> Hi Jay,
> 
> On 03/08/2011 16:53, Jay L. wrote:
> > I have been working on this problem as well.  Initially, the attempt was
> > to ReadAsArray small chunks.  Unfortunately this is quite inefficient. 
> > Someone more knowledgeable will know why, but I suspect it has to do
> > with either thrashing or the fact that full blocks are not being read in
> > (as is the case when a 5000x5000 pixel block is read in on a 12567,
> > 12764 GTiff).
> 
> Yes, using chunks that are too small can cause inefficiency, and yes,
> using chunks that are aligned to the I/O blocks (the exact block size or
> a multiple of it) is a good idea whenever it is possible.

Yes, I strongly concur with that. Reading 5000x5000 windows in a 12567x12764 
raster is likely to be inefficient if the raster is scanline oriented, that is 
to say if the block dimension reported by gdalinfo or GetBlockSize() is 
12567 x Y rows. Each 5000-pixel-wide window then covers only part of every 
block it crosses, so the same blocks may have to be fetched again for the 
neighbouring windows. In such a situation you should try to read chunks of Y 
(or a multiple of Y) whole lines.
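
A minimal Python sketch of reading whole lines in multiples of the block 
height. The helper names row_windows() and read_by_strips() and the default of 
64 blocks per chunk are made up for illustration; only GetBlockSize() and 
ReadAsArray() come from the GDAL bindings.

```python
def row_windows(ysize, block_ysize, blocks_per_chunk=1):
    """Yield (yoff, nrows) windows covering ysize rows in whole-block steps."""
    step = block_ysize * blocks_per_chunk
    for yoff in range(0, ysize, step):
        # The last window is clipped to the bottom of the raster.
        yield yoff, min(step, ysize - yoff)


def read_by_strips(path, blocks_per_chunk=64):
    """Yield (yoff, array) for full-width strips of band 1 (hypothetical helper)."""
    from osgeo import gdal  # imported here so row_windows() works without GDAL

    ds = gdal.Open(path)
    band = ds.GetRasterBand(1)
    # For a scanline-oriented raster this is [RasterXSize, Y].
    block_xsize, block_ysize = band.GetBlockSize()
    for yoff, nrows in row_windows(ds.RasterYSize, block_ysize, blocks_per_chunk):
        # Full-width window, so every block that is touched is used entirely.
        yield yoff, band.ReadAsArray(0, yoff, ds.RasterXSize, nrows)
```

Something like "for yoff, strip in read_by_strips('big.tif'): ..." would then 
walk the raster from top to bottom; tune blocks_per_chunk to the memory you 
can afford.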

Another point to take into consideration is when you read a multiband dataset. 
If the data in the dataset is pixel interleaved, then you should try to read 
all the bands at once with DatasetRasterIO() so that GDAL avoids re-reading 
the same blocks from disk for each band. Conversely, if the data is band 
interleaved, reading band by band is OK (using DatasetRasterIO() is fine too, 
because it will detect the data organization and adapt itself to select the 
best algorithm).
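
A rough Python sketch of the "all bands at once" approach, assuming ds is an 
open gdal.Dataset. The function name read_window_all_bands() is made up. I am 
also assuming that Dataset.ReadAsArray() in your version of the bindings goes 
through the dataset-level RasterIO; older bindings may loop over the bands 
internally, so check your version.

```python
def read_window_all_bands(ds, xoff, yoff, xsize, ysize):
    """Read one window for every band with a single dataset-level call."""
    # Drivers that report the layout expose it in the IMAGE_STRUCTURE domain,
    # typically as "PIXEL" or "BAND"; it can be None when it is not reported.
    interleave = ds.GetMetadataItem("INTERLEAVE", "IMAGE_STRUCTURE")
    # For a multiband dataset the result has shape (bands, rows, columns).
    data = ds.ReadAsArray(xoff, yoff, xsize, ysize)
    return interleave, data
```

The interleave value is only returned for information here; the same call is 
used for both layouts.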

There are other possible caveats depending on the file format itself. For 
example, if you read a JPEG, PNG or GIF image, you must know that you cannot 
go back to lines above the current one without causing decompression to be 
restarted from the top line. But such formats are rarely used for images that 
big. I seem to remember that it is also the case for some variants of HDF4 
( http://trac.osgeo.org/gdal/ticket/3386 ).

You can check whether your way of reading is efficient by setting CPL_DEBUG=ON 
and looking at the debug messages. If you see something about "Potential 
thrashing on band XXX of YYY", it is a hint that you didn't use the most 
efficient reading scheme.
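
For instance, from Python you could do something like the following. This is 
only a sketch: thrashing_lines() is a made-up helper that filters text you 
have captured yourself (for example stderr redirected to a file), and I am 
assuming the variable is set before the dataset is opened.

```python
import os

# Set this before GDAL starts reading; debug messages go to stderr by default.
# With the bindings loaded, gdal.SetConfigOption("CPL_DEBUG", "ON") is an
# alternative way to set the same option.
os.environ["CPL_DEBUG"] = "ON"


def thrashing_lines(log_text):
    """Return the lines of a captured debug log that mention thrashing."""
    return [line for line in log_text.splitlines()
            if "Potential thrashing" in line]
```

An empty result for a run over the whole raster suggests the reading scheme 
matches the block layout.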

> 
> I don't know the internals of the Python binding implementation very well.
> Looking at the release notes, it seems that some important changes in this
> area were made in release 1.8.0:
> 
> http://trac.osgeo.org/gdal/wiki/Release/1.8.0-News#SWIGLanguageBindings

Yes, there have been a few optimizations to avoid some useless temporary buffer 
copies, and a few fixes as well. One of them allows reading more than 2 GB 
with 64-bit builds of GDAL.

Regards,

Even

