[gdal-dev] Multi-threading GDALRasterBand::RasterIO() ?

Simon Eves simon.eves at omnisci.com
Mon Nov 1 14:45:50 PDT 2021


More specifically, I'm guessing the MOST efficient approach is to read whole
blocks from one band, which presumably avoids crossing compression boundaries etc.

A block may be a scanline, or it may be a tile, depending on the format.
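
To make the "whole blocks from one band" idea concrete, here is a minimal sketch (not GDAL code) of how the block-aligned read windows for a single band could be enumerated. It assumes the block dimensions are already known; in real code they would come from GDALRasterBand::GetBlockSize(), and each window would then be passed to RasterIO() on a dataset handle opened by the calling thread. The function name block_windows and the 256x256 tile size are hypothetical, and the 1800x1000 raster size is just the approximate figure quoted further down the thread.

```python
from typing import Iterator, Tuple

# (x_off, y_off, x_size, y_size), all in pixels
Window = Tuple[int, int, int, int]


def block_windows(raster_x: int, raster_y: int,
                  block_x: int, block_y: int) -> Iterator[Window]:
    """Yield one read window per natural block, in row-major order.

    Blocks on the right and bottom edges are clipped to the raster
    extent, so no window ever reaches outside the band.
    """
    if min(raster_x, raster_y, block_x, block_y) <= 0:
        raise ValueError("raster and block dimensions must be positive")
    for y_off in range(0, raster_y, block_y):
        y_size = min(block_y, raster_y - y_off)
        for x_off in range(0, raster_x, block_x):
            x_size = min(block_x, raster_x - x_off)
            yield (x_off, y_off, x_size, y_size)


# Scanline-organised band: each block is one full row of the raster.
scanline_blocks = list(block_windows(1800, 1000, 1800, 1))

# Tiled band, with a hypothetical 256x256 block size.
tiled_blocks = list(block_windows(1800, 1000, 256, 256))
```

Whichever layout the format uses, the importer would walk every window of one band before moving on to the next band, rather than visiting all bands for each line.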


On Mon, Nov 1, 2021 at 2:29 PM Simon Eves <simon.eves at omnisci.com> wrote:

> Band by band makes sense. I shall do that instead. Thank you! :)
>
> On Mon, Nov 1, 2021 at 2:05 PM Even Rouault <even.rouault at spatialys.com>
> wrote:
>
>> Yes regarding multithreading. Regarding GRIB and performance issues, you
>> must be aware that the GRIB driver, when accessing a single pixel of a
>> band, needs to decompress the data for the whole band. Hence there's a
>> per-dataset cache of band data, which defaults to 100 MB (you can increase
>> it by setting the GRIB_CACHEMAX config option to a number in megabytes).
>> So the most performant access pattern for GRIB is to read band by band,
>> and not all bands of a line.
>>
>> On 01/11/2021 at 21:58, Simon Eves wrote:
>>
>> You can ignore this.
>>
>> I have rather belatedly found the documentation that says that one must
>> open a GDALDataset per thread, even if it's on the same file.
>>
>> The multi-threading now works just fine.
>>
>> Interestingly, we're not actually doing that with our existing geo
>> importer. I guess it's OK because we're pulling the OGRFeatures out with
>> the process thread, and only converting and loading them with the child
>> threads. I guess I really ought to rewrite that code too now. Sigh.
>>
>> As you were...
>>
>> Simon
>>
>> On Sun, Oct 31, 2021 at 4:27 PM Simon Eves <simon.eves at omnisci.com>
>> wrote:
>>
>>> We are writing a raster importer, and finding that
>>> GDALRasterBand::RasterIO() is unexpectedly slow for some GRIB2 files.
>>>
>>> We have a file which is about 1800x1000 pixels, with 49 bands of type
>>> DOUBLE. The file is about 47MB on disc.
>>>
>>> Reading all the bands of a single scanline from this file takes about
>>> 1300ms, which is about 26ms per band, hence the entire file takes around 20
>>> minutes to import. All the time seems to be spent in the RasterIO() call,
>>> even though it's not doing any raster rescaling or data format conversion
>>> (1:1 pixels, fetching as GDT_Float64).
>>>
>>> So, I figured we'd try multi-threading it, but evidently the call is not
>>> thread-safe. Here is just one of various stack traces it will throw.
>>>
>>> libc.so.6!raise (Unknown Source:0)
>>> libc.so.6!abort (Unknown Source:0)
>>> libc.so.6![Unknown/Just-In-Time compiled code] (Unknown Source:0)
>>> libgdal.so.28!GRIBRasterBand::UncacheData(GRIBRasterBand * const this)
>>> (/build/scripts/gdal-3.2.2/frmts/grib/gribdataset.cpp:948)
>>> libgdal.so.28!GRIBRasterBand::LoadData(GRIBRasterBand * const this)
>>> (/build/scripts/gdal-3.2.2/frmts/grib/gribdataset.cpp:730)
>>> libgdal.so.28!GRIBRasterBand::LoadData(GRIBRasterBand * const this)
>>> (/build/scripts/gdal-3.2.2/frmts/grib/gribdataset.cpp:697)
>>> libgdal.so.28!GRIBRasterBand::IReadBlock(GRIBRasterBand * const this,
>>> int nBlockYOff, void * pImage)
>>> (/build/scripts/gdal-3.2.2/frmts/grib/gribdataset.cpp:803)
>>> libgdal.so.28!GDALRasterBand::GetLockedBlockRef(int bJustInitialize, int
>>> nYBlockOff, int nXBlockOff, GDALRasterBand * const this)
>>> (/build/scripts/gdal-3.2.2/gcore/gdal_priv.h:963)
>>> libgdal.so.28!GDALRasterBand::GetLockedBlockRef(GDALRasterBand * const
>>> this, int nXBlockOff, int nYBlockOff, int bJustInitialize)
>>> (/build/scripts/gdal-3.2.2/gcore/gdalrasterband.cpp:1238)
>>> libgdal.so.28!GDALRasterBand::IRasterIO(GDALRasterBand * const this,
>>> GDALRWFlag eRWFlag, int nXOff, int nYOff, int nXSize, int nYSize, void *
>>> pData, int nBufXSize, int nBufYSize, GDALDataType eBufType, GSpacing
>>> nPixelSpace, GSpacing nLineSpace, GDALRasterIOExtraArg * psExtraArg)
>>> (/build/scripts/gdal-3.2.2/gcore/rasterio.cpp:149)
>>> libgdal.so.28!GDALRasterBand::RasterIO(GDALRasterBand * const this,
>>> GDALRWFlag eRWFlag, int nXOff, int nYOff, int nXSize, int nYSize, void *
>>> pData, int nBufXSize, int nBufYSize, GDALDataType eBufType, GSpacing
>>> nPixelSpace, GSpacing nLineSpace, GDALRasterIOExtraArg * psExtraArg)
>>> (/build/scripts/gdal-3.2.2/gcore/gdalrasterband.cpp:372)
>>> import_export::Importer::<lambda(size_t, int)>::operator()(size_t, int)
>>> const(const import_export::Importer::<lambda(size_t, int)> * const
>>> __closure, const size_t thread_id, const int y)
>>> (/home/simon.eves/work/omniscidb-internal/ImportExport/Importer.cpp:5721)
>>> ...
>>>
>>> All of the parameters to the call are either constant or uncontended
>>> simple variables, and obviously there is a unique data buffer (pData) per
>>> thread.
>>>
>>> Is there anything we can do to make this work?
>>>
>>> I was intending to look into the lower level block-based API, in the
>>> hope that it will be faster, but have not yet done so.
>>>
>>> This is all with a local static build of GDAL 3.2.2 on Ubuntu 20.04 with
>>> GCC 9.
>>>
>>> Yours,
>>>
>>> Simon Eves
>>>
>>> --
>>> <http://www.omnisci.com/>
>>> Simon Eves
>>> Senior Graphics Engineer, Rendering Group
>>> 100 Montgomery St (5th Floor), San Francisco, CA 94104, USA
>>>
>>>
>>> Email: simon.eves at omnisci.com | Cell:  +1 (415) 902-1996
>>>
>>>
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>> -- http://www.spatialys.com
>> My software is free, but my time generally not.
>>
>>
>
