[gdal-dev] Geoid read performance

Vautour, André (INT) Andre.Vautour at Teledyne.com
Fri Oct 30 08:52:48 PDT 2020



> -----Original Message-----
> From: Even Rouault <even.rouault at spatialys.com>
> Sent: October 30, 2020 12:03
> To: gdal-dev at lists.osgeo.org
> Cc: Vautour, André (INT) <Andre.Vautour at Teledyne.com>
> Subject: Re: [gdal-dev] Geoid read performance
> 
> On Tuesday, 27 October 2020 17:17:28 CET Vautour, André (INT) wrote:
> > Hi all,
> >
> > I am using GDAL to read some geoid files in order to do some vertical
> > datum transformations. The transformation engine I am using does
> > point-by-point transformations with a grid lookup for each point,
> > so transforming, say, a million points means doing two million
> > grid lookups (one for the source, and one for the destination).
> >
> > After a performance profiling run, it became clear that much of the
> > slowdown is caused by mutexing. Most of the geoid formats are based
> > on RawDataset, which locks a mutex on every read for two different
> > reasons.
> > One is access to the block cache, which for obvious reasons cannot
> > be avoided, but the biggest bottleneck was the
> > CPLGetConfigOption("GDAL_ONE_BIG_READ") call in
> > RawRasterBand::CanUseDirectIO(). I am wondering if it would make sense
> > to move that call to the constructor and store the result for future
> > use. Or would we expect that setting to change during the lifetime of
> > the raster, and the raster to react dynamically to those option changes?
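A minimal sketch of the first idea, resolving the option once at construction, assuming a lock-protected global store like the one described for CPLGetConfigOption(). It does not use GDAL: `ConfigStore` and `CachedOptionBand` are hypothetical stand-ins, and the truthiness rule is an assumption. It also makes the trade-off in the question concrete: a band built before the option changes keeps the old value.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a global configuration store: every lookup
// takes a process-wide lock, which is what makes per-read lookups costly.
class ConfigStore {
public:
    void Set(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_[key] = value;
    }

    std::string Get(const std::string& key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = values_.find(key);
        return it == values_.end() ? std::string() : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::string> values_;
};

// Hypothetical band that resolves the option once, when it is constructed,
// so the per-read check is a plain member read with no lock.
class CachedOptionBand {
public:
    explicit CachedOptionBand(const ConfigStore& config)
        : oneBigRead_(IsTrue(config.Get("GDAL_ONE_BIG_READ"))) {}

    // Later changes to the store are NOT seen here: that is the trade-off.
    bool OneBigReadRequested() const { return oneBigRead_; }

private:
    // Simplified truthiness rule (an assumption; GDAL applies its own).
    static bool IsTrue(const std::string& v) {
        return v == "YES" || v == "ON" || v == "TRUE" || v == "1";
    }

    const bool oneBigRead_;
};
```

If dynamic reaction to option changes were required, the cached flag would have to be refreshed explicitly (or the lookup kept per call), which reintroduces the lock.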
> >
> > Since most geoids are really small grids, I tried copying the
> > geoid to a MEMDataset raster. That had the benefit of avoiding the
> > CPLGetConfigOption() bottleneck and also avoiding the block cache.
> > That being said, the memory raster is always set to an access mode of
> > GA_Update, which means that reading will also try to acquire a mutex in
> > GDALDataset::EnterReadWrite(). Would it make sense to be able either to
> > specify GA_ReadOnly as the access mode of a MEMDataset, or to add a
> > SetAccess() method to it, similar to RawRasterBand, so that it can be
> > changed to read-only after the initial copy is done?
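The read-only idea can be modelled in a few lines, assuming a raster that only needs its lock while it is writable. `Access`, `InMemoryRaster` and its `SetAccess()` are hypothetical and do not reflect how MEMDataset or GDALDataset::EnterReadWrite() are actually implemented.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical access modes standing in for GA_ReadOnly / GA_Update.
enum class Access { ReadOnly, Update };

// Hypothetical in-memory raster that only locks while it is writable.
// SetAccess() is meant to be called once, after the initial copy and
// before the raster is shared between threads; flipping it while other
// threads are reading would itself be a data race.
class InMemoryRaster {
public:
    explicit InMemoryRaster(std::vector<float> values)
        : values_(std::move(values)) {}

    void SetAccess(Access access) { access_ = access; }
    Access GetAccess() const { return access_; }

    // Writes are refused once the raster has been made read-only.
    bool Write(std::size_t index, float value) {
        if (access_ == Access::ReadOnly) return false;
        std::lock_guard<std::mutex> lock(mutex_);
        values_.at(index) = value;
        return true;
    }

    float Read(std::size_t index) const {
        if (access_ == Access::ReadOnly) {
            return values_.at(index);  // immutable data: no lock needed
        }
        std::lock_guard<std::mutex> lock(mutex_);
        return values_.at(index);
    }

private:
    Access access_ = Access::Update;
    mutable std::mutex mutex_;
    std::vector<float> values_;
};
```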
> >
> > Keep in mind that I am willing to make and contribute the necessary
> > changes. I just want to get a feel for what would make sense at the
> > general level, if any such changes are required. Also, do you have any
> > other suggestions on how to avoid such mutexes when reading what is
> > essentially static data?
> 
> Clearly the GDAL API isn't made to deliver the ultimate performance when
> extracting points one by one, due to the many layers traversed, checks, etc.
> 
> If you're at that level of performance tuning, instead of ingesting into a
> MEMDataset, you could probably just ingest into your favorite C++ array
> structure and read directly from it.
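As a sketch of that suggestion, assuming the geoid has already been copied into memory and its samples are points on a regular north-up grid, a lookup can be a plain bilinear interpolation over a `std::vector<float>`. `GeoidGrid` and its fields are hypothetical; real geoid formats differ in origin convention, units and edge handling.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical geoid grid copied once into a plain array (for example with
// a single RasterIO() call). Assumes a north-up, row-major grid whose
// sample (col, row) lies at (west + col * dx, north - row * dy).
struct GeoidGrid {
    std::size_t width = 0;
    std::size_t height = 0;
    double west = 0.0;   // longitude of column 0
    double north = 0.0;  // latitude of row 0
    double dx = 1.0;     // column spacing, degrees (positive)
    double dy = 1.0;     // row spacing, degrees (positive)
    std::vector<float> undulation;  // width * height values, row-major

    float At(std::size_t col, std::size_t row) const {
        return undulation[row * width + col];
    }

    // Bilinear interpolation: no mutex, no block cache, no config lookup.
    double Interpolate(double lon, double lat) const {
        if (width < 2 || height < 2 || undulation.size() != width * height)
            throw std::logic_error("invalid geoid grid");
        const double x = (lon - west) / dx;
        const double y = (north - lat) / dy;
        // Written as a negated range test so that NaN inputs are rejected too.
        if (!(x >= 0.0 && y >= 0.0 && x <= width - 1.0 && y <= height - 1.0))
            throw std::out_of_range("point outside geoid grid");
        std::size_t c0 = static_cast<std::size_t>(std::floor(x));
        std::size_t r0 = static_cast<std::size_t>(std::floor(y));
        // Keep the 2x2 neighbourhood inside the grid on the last column/row.
        if (c0 + 1 >= width) c0 = width - 2;
        if (r0 + 1 >= height) r0 = height - 2;
        const double fx = x - static_cast<double>(c0);
        const double fy = y - static_cast<double>(r0);
        const double top = At(c0, r0) * (1.0 - fx) + At(c0 + 1, r0) * fx;
        const double bottom = At(c0, r0 + 1) * (1.0 - fx) + At(c0 + 1, r0 + 1) * fx;
        return top * (1.0 - fy) + bottom * fy;
    }
};
```

For the vertical transformation described at the start of the thread, each point would then cost two such calls (source and destination grids), neither of which touches a lock.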
> 

Yeah, I had a feeling you might come back with that. That makes sense.

> An alternative for Linux/Unix users is to use GDALRasterBand::GetVirtualMemAuto(),
> which for RAW datasets will basically use mmap().
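For illustration, this sketch memory-maps a raw grid directly with POSIX mmap(), the mechanism Even says GetVirtualMemAuto() basically relies on for RAW datasets. It bypasses GDAL entirely; `MappedFloatGrid` is hypothetical and assumes a known header size followed by native-endian float32 samples, which will not match every geoid format.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of a raw grid file laid out as `headerBytes`
// bytes of header followed by width * height native-endian float32
// samples in row-major order.
class MappedFloatGrid {
public:
    MappedFloatGrid(const std::string& path, std::size_t headerBytes,
                    std::size_t width, std::size_t height)
        : headerBytes_(headerBytes), width_(width), height_(height) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ == 0 || size_ < headerBytes + width * height * sizeof(float)) {
            ::close(fd);
            throw std::runtime_error("file too small for the given grid size");
        }
        void* addr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<const unsigned char*>(addr);
    }

    ~MappedFloatGrid() { ::munmap(const_cast<unsigned char*>(base_), size_); }

    MappedFloatGrid(const MappedFloatGrid&) = delete;
    MappedFloatGrid& operator=(const MappedFloatGrid&) = delete;

    // Pages are faulted in by the OS on first touch; no GDAL locks involved.
    // No bounds checking: col < Width() and row < Height() are the caller's job.
    float At(std::size_t col, std::size_t row) const {
        float value;
        // memcpy avoids misaligned loads when the header size is not a multiple of 4.
        std::memcpy(&value, base_ + headerBytes_ + (row * width_ + col) * sizeof(float),
                    sizeof value);
        return value;
    }

    std::size_t Width() const { return width_; }
    std::size_t Height() const { return height_; }

private:
    std::size_t headerBytes_;
    std::size_t width_;
    std::size_t height_;
    std::size_t size_ = 0;
    const unsigned char* base_ = nullptr;
};
```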

That is a good point; ideally I would be using a memory-mapped file for such a performance-critical endeavor. I am not sure I will have the time to take that on right now, but regardless, thanks for the suggestions.

André

> 
> See
> https://gdal.org/api/gdalrasterband_cpp.html#_CPPv4N14GDALRasterBand17GetVirtualMemAutoE10GDALRWFlagPiP7GIntBigPPc
> 
> and
> 
> https://gdal.org/api/cpl.html#_CPPv420CPLVirtualMemGetAddrP13CPLVirtualMem
> 
> 
> Even
> --
> Spatialys - Geospatial professional services http://www.spatialys.com


