[gdal-dev] Geoid read performance

Even Rouault even.rouault at spatialys.com
Fri Oct 30 08:02:44 PDT 2020


On Tuesday 27 October 2020 17:17:28 CET Vautour, André (INT) wrote:
> Hi all,
> 
> I am using GDAL to read some geoid files in order to do some vertical datum
> transformations. The transformation engine I am using does point-by-point
> transformations with a grid lookup for each point, so transforming, say, a
> million points means two million grid lookups (one for the source and one
> for the destination).
> 
> After a performance profiling run, it became clear that much of the slowdown
> is caused by mutexing. Most of the geoid formats are based on RawDataset,
> which takes a mutex on every read for two different reasons. One is access
> to the block cache, which for obvious reasons cannot be avoided, but the
> biggest bottleneck was the CPLGetConfigOption("GDAL_ONE_BIG_READ") call in
> RawRasterBand::CanUseDirectIO(). Would it make sense to move that call to
> the constructor and store the result for future use? Or would we expect
> that setting to change during the lifetime of the raster, and the raster
> to react dynamically to those option changes?
> 
> Since most geoids are really small grids, I opted to try to copy the geoid
> to a MEMDataset raster. That had the benefit of avoiding the
> CPLGetConfigOption() bottleneck and also avoiding the block cache. That
> being said, the memory raster is always set to an access mode of GA_Update.
> That means that reading will also try to acquire a mutex in
> GDALDataset::EnterReadWrite. Would it make sense to be able to either
> specify the access mode of a MEMDataset to GA_ReadOnly or to add a
> SetAccess() method to it similar to RawRasterBand so that it can be changed
> to read-only after the initial copy was done?
> 
> Keep in mind that I am willing to make and contribute the necessary changes.
> I just want to get a feel as to what would make sense at the general level
> if any such changes are required. Also, do you have any other suggestions
> on how to avoid such mutexes when reading what is essentially static data?
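For illustration only, the "resolve the option once in the constructor" idea from the question can be sketched as below. This is not GDAL code: CachedOptionBand, GetOptionStandIn() and TestBoolStandIn() are hypothetical names. The stand-in reads only the process environment, whereas the real CPLGetConfigOption() also consults options set with CPLSetConfigOption() (the global-option lookup is where it takes a mutex). The boolean parsing mimics CPLTestBool(), where NO, FALSE, OFF and 0 are false regardless of case.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Stand-in for CPLGetConfigOption(). ASSUMPTION: only the process
// environment is read here; the real GDAL function also looks at options
// set through CPLSetConfigOption().
static const char* GetOptionStandIn(const char* key) {
    return std::getenv(key);
}

// Mimics CPLTestBool(): NO, FALSE, OFF and 0 (any case) are false,
// everything else is true.
static bool TestBoolStandIn(std::string v) {
    for (char& c : v)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return !(v == "NO" || v == "FALSE" || v == "OFF" || v == "0");
}

// Hypothetical band class: GDAL_ONE_BIG_READ is resolved once, at
// construction time, so the per-read path is a plain member read.
class CachedOptionBand {
public:
    CachedOptionBand() {
        if (const char* v = GetOptionStandIn("GDAL_ONE_BIG_READ"))
            one_big_read_ = TestBoolStandIn(v);
    }

    // Empty optional means "option not set", so the caller would fall back
    // to its own heuristic. Later changes to the option are NOT seen here.
    std::optional<bool> OneBigRead() const { return one_big_read_; }

private:
    std::optional<bool> one_big_read_;
};
```

This makes the trade-off raised in the question concrete: a band constructed before the option is changed keeps the old value for its whole lifetime.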

Clearly the GDAL API isn't made to deliver the ultimate performance when 
extracting points one by one, due to the many layers traversed, checks, etc.

If you're at that level of performance tuning, instead of ingesting into a 
MEMDataset, you could probably just ingest into your favorite C++ array 
structure and read directly from it.
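A minimal sketch of that approach, assuming a north-up grid that has already been copied into memory in one go (for example with a single GDALRasterBand::RasterIO() call covering the whole band). GeoidGrid is a hypothetical name, and bilinear interpolation is an assumption about how the lookup should be done; a nearest-node lookup would simply return At() for the closest node.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical in-memory geoid grid. Point lookups touch only this struct:
// no GDAL call, no block cache, no mutex.
// ASSUMPTIONS: north-up grid, node coordinates (pixel-is-point), no nodata.
// If filled from a north-up GDAL geotransform gt, the node of column 0 is at
// gt[0] + gt[1] / 2 and the node of row 0 at gt[3] + gt[5] / 2.
struct GeoidGrid {
    double origin_x;  // x (longitude) of column 0
    double origin_y;  // y (latitude) of row 0, the northernmost row
    double res_x;     // column spacing, > 0
    double res_y;     // row spacing, > 0 (rows go southwards)
    std::size_t width;
    std::size_t height;
    std::vector<float> values;  // row-major, width * height samples

    float At(std::size_t col, std::size_t row) const {
        assert(col < width && row < height);
        return values[row * width + col];
    }

    // Bilinear interpolation between the four surrounding nodes.
    // Returns NaN for points outside the grid.
    double Interpolate(double x, double y) const {
        assert(width >= 2 && height >= 2 && values.size() == width * height);
        const double fx = (x - origin_x) / res_x;  // fractional column
        const double fy = (origin_y - y) / res_y;  // fractional row
        if (!(fx >= 0 && fy >= 0 && fx <= double(width - 1) &&
              fy <= double(height - 1)))
            return std::nan("");
        // Top-left node of the enclosing cell, clamped so that c0 + 1 and
        // r0 + 1 stay inside the grid on the last column/row.
        const std::size_t c0 = std::min(static_cast<std::size_t>(fx), width - 2);
        const std::size_t r0 = std::min(static_cast<std::size_t>(fy), height - 2);
        const double tx = fx - static_cast<double>(c0);
        const double ty = fy - static_cast<double>(r0);
        const double top = At(c0, r0) * (1 - tx) + At(c0 + 1, r0) * tx;
        const double bottom =
            At(c0, r0 + 1) * (1 - tx) + At(c0 + 1, r0 + 1) * tx;
        return top * (1 - ty) + bottom * ty;
    }
};
```

Geoid grids are usually small enough that keeping the whole band in a std::vector is cheap, which is what makes this simpler than going through a MEMDataset.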

An alternative for Linux/Unix users is to use GDALRasterBand::GetVirtualMemAuto(), which 
for RAW datasets will basically use mmap().

See
https://gdal.org/api/gdalrasterband_cpp.html?_CPPv4N14GDALRasterBand17GetVirtualMemAutoE10GDALRWFlagPiP7GIntBigPPc

and

https://gdal.org/api/cpl.html#_CPPv420CPLVirtualMemGetAddrP13CPLVirtualMem


Even
-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

