[gdal-dev] Raster size and ReadAsArray()

Jay L. jzl5325 at psu.edu
Wed Aug 3 14:28:35 EDT 2011


To make sure I understand, here is an example:

If I have a GTiff where the block size is one row by all of the columns (a
single scanline), I should try to read either one scanline at a time or
several entire scanlines.  It is inefficient to take, say, 10 rows and only
half of the columns.
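
A minimal sketch of that reading pattern, assuming the GDAL Python bindings
(osgeo) and NumPy are installed. The function names and the blocks_per_chunk
parameter are hypothetical, chosen only for illustration. The helper computes
row windows whose height is a multiple of the block height, and the reader
requests every column for each window.

```python
def block_aligned_row_windows(ysize, block_ysize, blocks_per_chunk=1):
    """Yield (yoff, nrows) pairs covering all rows of a raster.

    Each window starts on a block boundary and spans a whole number of
    blocks, except possibly the last one, which is clipped to the raster.
    """
    step = block_ysize * blocks_per_chunk
    for yoff in range(0, ysize, step):
        yield yoff, min(step, ysize - yoff)


def read_whole_scanlines(path, blocks_per_chunk=64):
    """Yield (yoff, array) strips of band 1, full width, block aligned.

    Hypothetical helper: 'path' is any raster GDAL can open.
    """
    # Imported here so the window helper above can be used without GDAL.
    from osgeo import gdal

    ds = gdal.Open(path)
    band = ds.GetRasterBand(1)
    _block_xsize, block_ysize = band.GetBlockSize()
    for yoff, nrows in block_aligned_row_windows(
            ds.RasterYSize, block_ysize, blocks_per_chunk):
        # All columns, nrows whole scanlines: no partially used blocks.
        yield yoff, band.ReadAsArray(0, yoff, ds.RasterXSize, nrows)
```

For the 12567x12764 raster discussed below with one-row blocks,
block_aligned_row_windows(12764, 1, 5000) gives three strips of 5000, 5000
and 2764 rows.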

What if my application requires that I read one entire column by an
arbitrary number of scanlines?  Essentially reading at a 90 degree angle to
the block size.  Other than increasing the cache size and flushing the cache,
are there other techniques to reduce thrashing (and therefore processing


On Wed, Aug 3, 2011 at 11:19 AM, Even Rouault
<even.rouault at mines-paris.org> wrote:

> On Wednesday 03 August 2011 17:32:53, Antonio Valentino wrote:
> > Hi Jay,
> >
> > On 03/08/2011 16:53, Jay L. wrote:
> > > I have been working on this problem as well.  Initially, the attempt was
> > > to ReadAsArray small chunks.  Unfortunately this is quite inefficient.
> > > Someone more knowledgeable will know why, but I suspect it has to do
> > > with either thrashing or the fact that full blocks are not being read in
> > > (as is the case when a 5000x5000 pixel block is read in on a
> > > 12567x12764 GTiff).
> >
> > Yes, using chunks that are too small can cause inefficiency, and yes,
> > using chunks that are aligned to the I/O blocks (exactly the block size
> > or a multiple of it) is a good idea whenever possible.
> Yes, I strongly concur with that. Reading 5000x5000 in a 12567x12764 raster
> is likely to be inefficient if the raster is scanline oriented, that is to
> say if the dimension of a block reported by gdalinfo or GetBlockSize() is
> 12567 x Y rows. In such a situation you should try to read chunks of Y (or
> a multiple of Y) whole lines.
> Another point to take into consideration is when you read a multiband
> dataset. If the data in the dataset is pixel interleaved, then you should
> try to read all the bands at a time with DatasetRasterIO() so that GDAL
> avoids re-reading the same blocks from disk for each band. By contrast, if
> the data is band interleaved, reading band by band is OK (using
> DatasetRasterIO() works too, because it will detect the data organization
> and adapt itself to select the best algorithm).
> There are other possible caveats depending on the file format itself. For
> example, if you read a JPEG, PNG or GIF image, you must know that you cannot
> go back to lines above the current one without causing decompression to be
> restarted from the top line. But such formats are rarely used for images
> that big. I seem to remember that it is also the case for some formulations
> of HDF4 ( http://trac.osgeo.org/gdal/ticket/3386 ).
> You can check if your way of reading is efficient or not by defining
> and look at the warnings. If you see something about "Potential thrashing
> on band XXX of YYY", it is a hint that you didn't employ the most efficient
> reading scheme.
> >
> > I don't know the internals of the Python bindings implementation very
> > well. Looking at the release notes, it seems that some important changes
> > in this area have been made in release 1.8.0
> >
> > http://trac.osgeo.org/gdal/wiki/Release/1.8.0-News#SWIGLanguageBindings
> Yes, there have been a few optimizations to avoid some useless temporary
> buffer copies, and a few fixes as well. One of them allows reading more
> than 2 GB with 64-bit builds of GDAL.
> Regards,
> Even
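
Even's point about pixel-interleaved versus band-interleaved data can be
sketched roughly as follows. This assumes the GDAL Python bindings and NumPy
are installed, that the driver reports its layout through the INTERLEAVE item
of the IMAGE_STRUCTURE metadata domain (not every driver does), and a GDAL
version whose dataset-level ReadAsArray() performs a single dataset-level
read. The function names are hypothetical.

```python
def choose_read_plan(interleave, band_count):
    """Return 'all-bands' or 'per-band' for a given data organization.

    'interleave' is the INTERLEAVE value reported by the driver
    ('PIXEL', 'BAND', ...) or None when the driver reports nothing.
    """
    if band_count > 1 and (interleave or "").upper() == "PIXEL":
        # Each block holds every band, so read all bands in one request.
        return "all-bands"
    return "per-band"


def read_window(path, xoff, yoff, xsize, ysize):
    """Return a list of 2-D arrays, one per band, for the given window."""
    # Imported here so choose_read_plan can be used without GDAL.
    from osgeo import gdal

    ds = gdal.Open(path)
    interleave = ds.GetMetadataItem("INTERLEAVE", "IMAGE_STRUCTURE")
    if choose_read_plan(interleave, ds.RasterCount) == "all-bands":
        # One dataset-level read; the result has shape (bands, rows, cols).
        return list(ds.ReadAsArray(xoff, yoff, xsize, ysize))
    # Band-interleaved (or unknown) layout: band-by-band reads are fine.
    return [ds.GetRasterBand(i).ReadAsArray(xoff, yoff, xsize, ysize)
            for i in range(1, ds.RasterCount + 1)]
```

Since the dataset-level call adapts to the data organization, always using it
would also be reasonable; the branch is kept only to make the distinction
visible.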