[gdal-dev] Raster size and ReadAsArray()
Even Rouault
even.rouault at mines-paris.org
Wed Aug 3 14:19:13 EDT 2011
On Wednesday 03 August 2011 17:32:53, Antonio Valentino wrote:
> Hi Jay,
>
> On 03/08/2011 16:53, Jay L. wrote:
> > I have been working on this problem as well. Initially, the attempt was
> > to ReadAsArray small chunks. Unfortunately this is quite inefficient.
> > Someone more knowledgeable will know why, but I suspect it has to do
> > with either thrashing or the fact that full blocks are not being read in
> > (as is the case when a 5000x5000 pixel block is read in on a 12567,
> > 12764 GTiff).
>
> Yes, using chunks that are too small can cause inefficiency, and yes,
> using chunks that are aligned to the I/O blocks (the exact block size or
> a multiple of it) is a good idea whenever it is possible.
Yes, I strongly concur with that. Reading 5000x5000 in a 12567x12764 raster is
likely to be inefficient if the raster is scanline oriented, that is to say if
the dimension of a block reported by gdalinfo or GetBlockSize() is 12567 x Y
rows. In such a situation you should try to read chunks of Y (or a multiple
of Y) whole lines.
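As a rough illustration of reading whole lines in multiples of Y, here is a
minimal Python sketch. The helper names iter_row_strips() and read_by_strips()
are made up for this example, and it assumes the osgeo Python bindings and a
scanline-oriented file; it is not the only way to do this.

```python
def iter_row_strips(raster_ysize, block_ysize, blocks_per_chunk=1):
    """Yield (yoff, nrows) windows that cover the raster in whole-block strips.

    Hypothetical helper: each strip spans `blocks_per_chunk` block rows,
    and the last strip is shortened to fit the raster.
    """
    if raster_ysize < 0 or block_ysize <= 0 or blocks_per_chunk <= 0:
        raise ValueError("sizes must be positive")
    step = block_ysize * blocks_per_chunk
    for yoff in range(0, raster_ysize, step):
        yield yoff, min(step, raster_ysize - yoff)


def read_by_strips(path, blocks_per_chunk=1):
    """Yield (yoff, array) for band 1 of `path`, one full-width strip at a time."""
    # Imported here so that iter_row_strips() can be used without GDAL.
    from osgeo import gdal

    ds = gdal.Open(path)
    if ds is None:
        raise IOError("could not open %s" % path)
    band = ds.GetRasterBand(1)
    # For a scanline-oriented raster the block width equals the raster width.
    block_xsize, block_ysize = band.GetBlockSize()
    for yoff, nrows in iter_row_strips(ds.RasterYSize, block_ysize,
                                       blocks_per_chunk):
        # Full-width window, so only whole blocks are touched.
        yield yoff, band.ReadAsArray(0, yoff, ds.RasterXSize, nrows)
```

Raising blocks_per_chunk trades memory for fewer calls; the strips stay
aligned to block boundaries either way.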
Another point to take into consideration is when you read a multiband dataset.
If the data in the dataset is pixel interleaved, then you should try to read
all the bands at once with DatasetRasterIO() so that GDAL avoids re-reading
the same blocks from disk for each band. Conversely, if the data is band
interleaved, reading band by band is OK (using DatasetRasterIO() works too,
because it will detect the data organization and adapt itself to select the
best algorithm).
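From Python, a dataset-level read of one window could look like the sketch
below. The function name read_window_all_bands() is made up for this example.
The INTERLEAVE item in the IMAGE_STRUCTURE metadata domain is only reported by
some drivers, so treat a missing value as "unknown". Whether
Dataset.ReadAsArray() ends up as a single dataset-level read depends on the
version of the bindings, which is an assumption to check on your install.

```python
def read_window_all_bands(ds, xoff, yoff, xsize, ysize):
    """Read one window for every band with a single dataset-level call.

    `ds` is expected to be an open osgeo.gdal.Dataset.
    Returns (interleave, data), where interleave is "PIXEL", "BAND" or None.
    """
    # Reported by drivers such as GTiff; None when the driver does not say.
    interleave = ds.GetMetadataItem("INTERLEAVE", "IMAGE_STRUCTURE")
    # One call on the dataset instead of one call per band. Assumption:
    # recent bindings turn this into one dataset-level read, while older
    # ones may still loop over the bands internally.
    data = ds.ReadAsArray(xoff, yoff, xsize, ysize)
    return interleave, data
```

For a multiband dataset the returned array is indexed as (band, row, column).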
There are other possible caveats depending on the file format itself. For
example, if you read a JPEG, PNG or GIF image, be aware that you cannot go
back to lines above the last one read without causing decompression to be
restarted from the top line. But such formats are rarely used for images that
big. I seem to remember that this is also the case for some formulations of
HDF4 ( http://trac.osgeo.org/gdal/ticket/3386 ).
You can check whether your way of reading is efficient by defining
CPL_DEBUG=ON and looking at the warnings. If you see something about
"Potential thrashing on band XXX of YYY", it is a hint that you didn't employ
the most efficient reading scheme.
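CPL_DEBUG can be set as an environment variable in the shell, or from Python
as in this minimal sketch. The function name enable_gdal_debug() is made up
for the example; gdal.SetConfigOption() is the bindings' call for setting a
configuration option. The debug output normally goes to stderr.

```python
import os


def enable_gdal_debug():
    """Turn on GDAL debug messages; return True if the bindings were found."""
    # The environment variable covers GDAL code that has not been loaded yet.
    os.environ["CPL_DEBUG"] = "ON"
    try:
        from osgeo import gdal
    except ImportError:
        return False
    # Same setting, applied to the library that is already loaded.
    gdal.SetConfigOption("CPL_DEBUG", "ON")
    return True
```

Call it before opening the dataset, then watch the output for the
"Potential thrashing" message while your read loop runs.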
>
> I don't know the internals of the Python binding implementation very well.
> Looking at the release notes, it seems that some important changes in this
> area have been made in release 1.8.0
>
> http://trac.osgeo.org/gdal/wiki/Release/1.8.0-News#SWIGLanguageBindings
Yes, there have been a few optimizations to avoid some useless temporary buffer
copies, and a few fixes as well. One of them allows reading more than 2 GB in
64-bit builds of GDAL.
Regards,
Even