[gdal-dev] GDALRasterBand::RasterIO c++ vs BandReadAsArray python performance (Gareth Jones)
Gareth James Jones [gjj12]
gjj12 at aber.ac.uk
Thu May 12 02:22:45 PDT 2016
I initialise dataTmp using CPLMalloc in the C++:
T* dataTmp = (T*)CPLMalloc(sizeof(T) *(this->dspRastXSize * this->dspRastYSize));
This is then freed using CPLFree once I have finished with it. I do re-malloc it for each band we want to read in a raster, so that adds some slowdown that could probably be avoided.
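The per-band re-allocation described above could be avoided along the following lines. This is only a minimal sketch, not the viewer's actual code: `ReusableBuffer` and `simulateBandReads` are hypothetical names, a `std::vector` stands in for the CPLMalloc'd block, and the write to `dataTmp[0]` is a placeholder for the real `band->RasterIO(...)` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GDAL): a scratch buffer that is allocated
// once and reused for every band, growing only if a larger read is requested.
template <typename T>
class ReusableBuffer {
public:
    // Returns a pointer to at least `count` elements. Storage is resized
    // only when the current storage is too small.
    T* ensure(std::size_t count) {
        assert(count > 0);
        if (count > storage_.size()) {
            storage_.resize(count);
            ++allocations_;
        }
        return storage_.data();
    }

    // Number of times the storage had to grow.
    std::size_t allocations() const { return allocations_; }

private:
    std::vector<T> storage_;
    std::size_t allocations_ = 0;
};

// Simulates reading `bandCount` bands of xSize * ySize pixels and returns how
// many times the buffer was (re)allocated. In real code the pointer returned
// by ensure() would be passed to band->RasterIO(...) as the data argument.
inline std::size_t simulateBandReads(int bandCount, std::size_t xSize,
                                     std::size_t ySize) {
    ReusableBuffer<float> buffer;
    for (int b = 0; b < bandCount; ++b) {
        float* dataTmp = buffer.ensure(xSize * ySize);
        dataTmp[0] = static_cast<float>(b);  // stand-in for the actual read
    }
    return buffer.allocations();
}
```

When every band is read at the same size, the buffer is allocated on the first band only. Note that `std::vector::resize` zero-fills new elements, which CPLMalloc does not do, so the single allocation is slightly more expensive than a raw one.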
In the Python, it is initialised by ReadAsArray.
I have taken a look at rasterio and was minded to use it, but we want to keep additional dependencies to an absolute minimum. The C++ extension is therefore being implemented with a check to see whether it is available; if it is not, the program will use the current method.
>How are you initializing your dataTmp array? In my Rasterio project, I've
>found that numpy.empty() is the fastest array allocator and use it whenever
>possible. I also use the GDAL C API and Cython (in case you're interested:
>and find the performance to be as good as ReadAsArray.
On Wed, May 11, 2016 at 10:53 AM, Gareth James Jones [gjj12] <
gjj12 at aber.ac.uk> wrote:
>> I'm currently writing optimisations for a raster viewer program which uses
>> gdal as its base. It's currently written purely in python, and has some
>> major speed issues which cause problems when we are reading many files at a
>> time. After making some optimisations in the python, and getting quite a
>> minimal speed increase, I proceeded to profile the program quite heavily
>> and found that our getImage method was our slowest call. I had already
>> performed some optimisations on this function so decided to write a
>> C-Extension so that we could get some speed increases through a lower level
>> This has worked for the most part, however there is still one issue: we
>> have found a speed increase of ~2s for some of our larger files in the bulk
>> of the code. But this is negated by the GDALRasterIO call, which is
>> actually about 3s slower than the python ReadAsArray.
>> This doesn't make any sense to me as ReadAsArray is a wrapper around a C++
>> call to GDALRasterIO, and thus should be slower than having a call straight
>> to GDALRasterIO.
>> I was hoping someone here might know of a way to read the rasters more
>> efficiently. I have tried to implement a method using ReadBlock rather than
>> RasterIO, but due to the replication that RasterIO does it didn't work at
>> all. (I'm currently trying to figure out a way to do that replication
>> without losing too much speed).
>> The RasterIO call I'm using is:
>> band->RasterIO(GF_Read, this->ovleft, this->ovtop, this->ovxsize,
>> this->ovysize, dataTmp, this->ovxsize, this->ovysize,
>> band->GetRasterDataType(), 0, 0);
>> The old python call was:
>> dataTmp = band.ReadAsArray(ovleft, ovtop,
>> ovxsize, ovysize,
>> dspRastXSize, dspRastYSize)
>> Thanks in advance
>> Gareth Jones
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
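One visible difference between the two calls quoted above is the buffer size they request: the C++ call passes `ovxsize` and `ovysize` as the buffer dimensions (the seventh and eighth arguments of `RasterIO`), while the Python call passes `dspRastXSize` and `dspRastYSize`. Whether this accounts for the timing gap is not established in the thread. The sketch below only compares how many elements each request fills; the struct and function names are hypothetical and the sizes used in the usage note are made up.

```cpp
#include <cassert>
#include <cstddef>

// Window and buffer sizes of one read request, named after the corresponding
// arguments of GDALRasterBand::RasterIO.
struct ReadRequest {
    int nXSize, nYSize;        // source window, in raster pixels
    int nBufXSize, nBufYSize;  // destination buffer, in pixels
};

// Number of elements the destination buffer must hold for a request.
inline std::size_t bufferElements(const ReadRequest& r) {
    assert(r.nBufXSize > 0 && r.nBufYSize > 0);
    return static_cast<std::size_t>(r.nBufXSize) *
           static_cast<std::size_t>(r.nBufYSize);
}

// The C++ call as quoted: the buffer size equals the window size.
inline ReadRequest cppRequestAsQuoted(int ovxsize, int ovysize) {
    return {ovxsize, ovysize, ovxsize, ovysize};
}

// The Python call as quoted: the buffer size equals the display size.
inline ReadRequest pythonRequestAsQuoted(int ovxsize, int ovysize,
                                         int dspRastXSize, int dspRastYSize) {
    return {ovxsize, ovysize, dspRastXSize, dspRastYSize};
}
```

For a hypothetical 4000 x 4000 window shown at 500 x 500, the first request fills 16,000,000 elements and the second 250,000. Since `dataTmp` is allocated for `dspRastXSize * dspRastYSize` elements, a C++ call that mirrors the Python one would pass `this->dspRastXSize` and `this->dspRastYSize` as the buffer dimensions; that this matches the viewer's intent is an assumption.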