[gdal-dev] GDALRasterBand::RasterIO c++ vs BandReadAsArray python performance (Gareth Jones)

Gareth James Jones [gjj12] gjj12 at aber.ac.uk
Thu May 12 02:22:45 PDT 2016


Hi Sean,

I initialise dataTmp using CPLMalloc in the C++:

   T* dataTmp = (T*)CPLMalloc(sizeof(T) *(this->dspRastXSize * this->dspRastYSize));

This is then freed with CPLFree once I have finished with it. I do re-malloc it for each band we want to read from a raster, which adds some slowdown that could probably be avoided.
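
A minimal sketch of how the per-band re-malloc could be avoided, assuming every band of a raster is read into a buffer of the same size. GDAL is left out so that the sketch stands alone: readBand is a hypothetical stand-in for the band->RasterIO call, and readAllBands and consume are illustrative names, not part of the viewer's code.

```cpp
#include <cstddef>
#include <vector>

// Reads `bandCount` bands through one buffer that is allocated once,
// instead of a CPLMalloc/CPLFree pair per band.
//
// readBand(bandIndex, dst, count): hypothetical stand-in for
//   band->RasterIO(GF_Read, ...); fills `dst`, which holds `count` elements.
// consume(bandIndex, buffer): called with the filled buffer after each read.
template <typename T, typename Reader, typename Consumer>
void readAllBands(int bandCount, std::size_t xSize, std::size_t ySize,
                  Reader readBand, Consumer consume)
{
    // Single allocation, released automatically when the function returns.
    std::vector<T> buffer(xSize * ySize);
    for (int b = 0; b < bandCount; ++b) {
        readBand(b, buffer.data(), buffer.size());
        consume(b, buffer);
    }
}
```

The buffer is overwritten on every iteration, so anything that needs to outlive the loop has to be copied out inside consume.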

On the Python side, dataTmp is initialised by ReadAsArray.

I have taken a look at rasterio, and was of a mind to use it, but we want to keep additional dependencies to an absolute minimum. The C++ extension is therefore being implemented as optional: the program checks whether it is available and, if it is not, uses the current method.


>Hi Gareth,
>
>How are you initializing your dataTmp array? In my Rasterio project, I've
>found that numpy.empty() is the fastest array allocator and use it whenever
>possible. I also use the GDAL C API and Cython (in case you're interested:
>https://github.com/mapbox/rasterio/blob/c80b568903ef7b902ce6254a42c73af9ddcc8362/rasterio/_io.pyx#L58-L69)
>and find the performance to be as good as ReadAsArray.

On Wed, May 11, 2016 at 10:53 AM, Gareth James Jones [gjj12] <
gjj12 at aber.ac.uk> wrote:

>> I'm currently writing optimisations for a raster viewer program which uses
>> GDAL as its base. It's currently written purely in Python, and has some
>> major speed issues which cause problems when we are reading many files at a
>> time. After making some optimisations in the python, and getting quite a
>> minimal speed increase, I proceeded to profile the program quite heavily
>> and found that our getImage method was our slowest call. I had already
>> performed some optimisations on this function so decided to write a
>> C-Extension so that we could get some speed increases through a lower level
>> language.
>>
>> This has worked for the most part; however, there is still one issue. We
>> have found a speed increase of ~2s for some of our larger files in the bulk
>> of the code, but this is negated by the GDALRasterIO call, which is
>> actually about 3s slower than the Python ReadAsArray.
>>
>> This doesn't make any sense to me as ReadAsArray is a wrapper around a C++
>> call to GDALRasterIO, and thus should be slower than having a call straight
>> to GDALRasterIO.
>>
>> I was hoping someone here might know of a way to read the rasters more
>> efficiently. I have tried to implement a method using ReadBlock rather than
>> RasterIO, but due to the replication that RasterIO does it didn't work at
>> all. (I'm currently trying to figure out a way to do that replication
>> without losing too much speed).
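
When the buffer size differs from the window size, RasterIO resamples the window, by default with nearest-neighbour, so pixels are replicated when zooming in and skipped when zooming out. A minimal sketch of that replication is below, for the case where a display buffer is assembled by hand. It works on a plain in-memory array rather than on GDAL blocks, resampleNearest is a hypothetical helper, and its index mapping may not match GDAL's own rounding exactly.

```cpp
#include <cstddef>
#include <vector>

// Nearest-neighbour resampling of a srcW x srcH window (row-major) into a
// dstW x dstH buffer. Hypothetical helper; GDAL's exact rounding may differ.
template <typename T>
std::vector<T> resampleNearest(const std::vector<T>& src,
                               std::size_t srcW, std::size_t srcH,
                               std::size_t dstW, std::size_t dstH)
{
    std::vector<T> dst(dstW * dstH);
    for (std::size_t y = 0; y < dstH; ++y) {
        // Source row that contains the centre of output row y.
        const std::size_t sy = (2 * y + 1) * srcH / (2 * dstH);
        for (std::size_t x = 0; x < dstW; ++x) {
            // Source column that contains the centre of output column x.
            const std::size_t sx = (2 * x + 1) * srcW / (2 * dstW);
            dst[y * dstW + x] = src[sy * srcW + sx];
        }
    }
    return dst;
}
```

The two index computations depend only on the sizes, so they could be precomputed once per axis and reused for every band.
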
>>
>> The RasterIO call I'm using is:
>>
>> band->RasterIO(GF_Read, this->ovleft, this->ovtop, this->ovxsize,
>>                this->ovysize, dataTmp, this->ovxsize, this->ovysize,
>>                band->GetRasterDataType(), 0, 0);
>>
>> The old python call was:
>>
>> dataTmp = band.ReadAsArray(ovleft, ovtop,
>>     ovxsize, ovysize,
>>     dspRastXSize, dspRastYSize)
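
One difference between the two calls above is worth checking. The C++ call passes ovxsize and ovysize as the buffer dimensions, so it asks for the window at full resolution, whereas the Python call passes dspRastXSize and dspRastYSize and gets back a resampled array. When the window is larger than the display, the C++ call has far more pixels to fill, and it writes them into a buffer that was allocated for dspRastXSize * dspRastYSize elements. The sketch below only does that arithmetic; Request, pixelsRequested and fitsIn are hypothetical names, not GDAL API.

```cpp
#include <cstddef>

// The size arguments of a RasterIO-style read: the source window and the
// buffer it is resampled into. Hypothetical struct, for illustration only.
struct Request {
    std::size_t winXSize, winYSize;  // window read from the raster
    std::size_t bufXSize, bufYSize;  // dimensions of the output buffer
};

// Number of pixels the call has to write into the buffer.
constexpr std::size_t pixelsRequested(const Request& r)
{
    return r.bufXSize * r.bufYSize;
}

// True if a buffer of `allocated` elements can hold the result.
constexpr bool fitsIn(const Request& r, std::size_t allocated)
{
    return pixelsRequested(r) <= allocated;
}
```

With a made-up 8000 x 6000 window shown at 800 x 600, the C++ form requests 48,000,000 pixels against 480,000 for the Python form, and the result would not fit in a buffer allocated for 800 * 600 elements. Passing dspRastXSize and dspRastYSize as the buffer dimensions in the C++ call may be enough to make the two requests equivalent.
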
>>
>>
>> Thanks in advance
>>
>> Gareth Jones
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>



>--
>Sean Gillies

Gareth Jones