[gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

Even Rouault even.rouault at mines-paris.org
Wed Dec 18 02:10:19 PST 2013


On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> Even,
> 
> Very impressive work, I am supportive.
> 
> IMHO it would be wonderful if there was also an mmap() based mechanism
> where you could ask for the virtual memory chunk and you get it back (if it
> works) along with stride values to access in it.  This could likely be made
> to work for most "raw" based formats and a few others too.  It might also
> allow non-mmap() based files to return an organization based more on their
> actual organization for efficiency.

Hi Frank,

I'm not completely sure I have understood your idea. Would it be something 
like:

CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
                                         GDALRWFlag eRWFlag,
                                         int nXOff, int nYOff,
                                         int nXSize, int nYSize,
                                         int nBufXSize, int nBufYSize,
                                         GDALDataType eBufType,
                                         int nBandCount, int* panBandMap,
                                         int *pnPixelSpace,
                                         GIntBig *pnLineSpace,
                                         GIntBig *pnBandSpace,
                                         size_t nCacheSize,
                                         int bSingleThreadUsage,
                                         char **papszOptions );

The difference from GDALDatasetGetVirtualMem() is that the stride values are 
now output values, and there is no longer an nPageSizeHint parameter.
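For illustration, here is a minimal C sketch of how a caller could address a 
sample once the pixel, line and band spacings come back as outputs. The helper 
names are hypothetical (not GDAL API), and a plain array stands in for the 
memory returned by the mapping:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pointer to the sample of band iBand (0-based) at
 * pixel (x, y), given the base address of the mapping and the pixel,
 * line and band spacings (in bytes) reported for it. */
static void *sample_ptr(void *base, int x, int y, int iBand,
                        int nPixelSpace, int64_t nLineSpace,
                        int64_t nBandSpace)
{
    return (unsigned char *)base
           + (int64_t)x * nPixelSpace
           + (int64_t)y * nLineSpace
           + (int64_t)iBand * nBandSpace;
}

/* Hypothetical helper: reads a 16-bit sample through the strides.
 * memcpy avoids assuming the spacings keep the address aligned. */
static uint16_t read_u16(void *base, int x, int y, int iBand,
                         int nPixelSpace, int64_t nLineSpace,
                         int64_t nBandSpace)
{
    uint16_t v;
    memcpy(&v, sample_ptr(base, x, y, iBand, nPixelSpace, nLineSpace,
                          nBandSpace), sizeof v);
    return v;
}
```

With a 4x3, 2-band, pixel-interleaved UInt16 buffer, the spacings would be 4, 
16 and 2 bytes, so read_u16(buf, 3, 2, 1, 4, 16, 2) reads element 23 of the 
buffer. The same code works unchanged for a band-sequential layout; only the 
three spacings differ.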

In your view, would the spacings be determined in a generic way from the 
dataset properties (block size and INTERLEAVE=PIXEL/BAND metadata item), or 
would that require some direct cooperation from the driver?
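If the spacings were derived generically, the rule might look like the 
following sketch. This is only an assumption about how it could work, for a 
file-like layout of the full raster: the function is hypothetical, and tiled 
layouts are rejected because a single set of pixel/line/band strides cannot 
describe them:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: derive pixel/line/band spacings (in bytes) for a
 * raster of nRasterXSize x nRasterYSize pixels, nBands bands of
 * nDTSize-byte samples, from an INTERLEAVE value of "PIXEL" or "BAND".
 * Returns 0 on success, -1 when no single set of strides applies. */
static int compute_spacings(const char *pszInterleave, int nRasterXSize,
                            int nRasterYSize, int nBlockXSize, int nBands,
                            int nDTSize, int *pnPixelSpace,
                            int64_t *pnLineSpace, int64_t *pnBandSpace)
{
    /* Tiles narrower than the raster break the linear layout. */
    if (nBlockXSize != nRasterXSize)
        return -1;

    if (strcmp(pszInterleave, "PIXEL") == 0) {
        /* All bands of one pixel are stored contiguously. */
        *pnPixelSpace = nBands * nDTSize;
        *pnLineSpace = (int64_t)nRasterXSize * *pnPixelSpace;
        *pnBandSpace = nDTSize;
        return 0;
    }
    if (strcmp(pszInterleave, "BAND") == 0) {
        /* Each band is a complete plane, one after the other. */
        *pnPixelSpace = nDTSize;
        *pnLineSpace = (int64_t)nRasterXSize * nDTSize;
        *pnBandSpace = *pnLineSpace * nRasterYSize;
        return 0;
    }
    /* Anything else would need the driver's cooperation. */
    return -1;
}
```

The rejected cases are exactly where driver cooperation would come in.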

Since you mention raw formats, perhaps you are thinking more of a file-based 
mmap() rather than an anonymous mmap() combined with RasterIO(), as currently 
proposed? This is something I've mentioned in the "Related thoughts" 
paragraph, but there are practical annoyances with how Linux manages memory with 
file-based mmap(). By the way, I'd be happy to hear from anyone who has had 
successful experience with that (ideally without requiring an explicit 
madvise() each time you're done with a range of memory).
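For reference, a minimal POSIX/Linux sketch of the file-based alternative, 
including the explicit madvise(MADV_DONTNEED) calls I'd like to avoid: the 
file is mapped read-only, processed chunk by chunk, and each finished chunk is 
handed back to the kernel. The function name is hypothetical and error 
handling is minimal:

```c
#define _DEFAULT_SOURCE /* for MADV_DONTNEED under strict -std modes */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example: sums all bytes of a file through a read-only,
 * file-backed mapping. After each chunk, madvise(MADV_DONTNEED) tells
 * the kernel those pages are no longer needed. nChunk must be a
 * multiple of the page size. Returns -1 on error. */
static int64_t sum_file_mmap(const char *path, size_t nChunk)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t n = (size_t)st.st_size;
    if (n == 0) { /* mmap() of length 0 is invalid */
        close(fd);
        return 0;
    }
    unsigned char *p = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return -1;

    int64_t sum = 0;
    for (size_t off = 0; off < n; off += nChunk) {
        size_t len = (n - off < nChunk) ? n - off : nChunk;
        for (size_t i = 0; i < len; i++)
            sum += p[off + i];
        /* Done with this range: let the kernel drop these pages. */
        madvise(p + off, len, MADV_DONTNEED);
    }
    munmap(p, n);
    return sum;
}
```

The chunk size has to be a multiple of the page size because madvise() 
requires a page-aligned start address.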

---------------------------

Rereading your words, I wonder: are you perhaps thinking of a Dataset / 
RasterBand virtual method that could be implemented by drivers?

virtual CPLVirtualMem* GetVirtualMem(.......)

Drivers would directly use the low-level CPLVirtualMem to create the mapping 
and provide their own callback to fill pages when a page fault occurs. Would 
that let them bypass the block cache layer and do direct file I/O?
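To picture the driver-side variant, here is a C sketch of a driver-supplied 
page-fill callback doing direct file I/O for a raw file with a fixed-size 
header. Everything here (PageFillFunc, RawFile, LazyMem, lazy_get()) is 
hypothetical. In particular, the real mechanism would invoke the callback from 
the page-fault handler, whereas this toy version fills a page on the first 
lazy_get() that touches it:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type: copy nToFill bytes of the mapping,
 * starting at nOffset, into pPage. Returns 0 on success. */
typedef int (*PageFillFunc)(size_t nOffset, void *pPage, size_t nToFill,
                            void *pUserData);

/* Hypothetical raw-driver state: image bytes start at nHeaderSize. */
typedef struct {
    FILE *fp;
    long nHeaderSize;
} RawFile;

/* Driver callback: direct file I/O, no block cache involved. */
static int raw_fill_page(size_t nOffset, void *pPage, size_t nToFill,
                         void *pUserData)
{
    RawFile *raw = pUserData;
    if (fseek(raw->fp, raw->nHeaderSize + (long)nOffset, SEEK_SET) != 0)
        return -1;
    return fread(pPage, 1, nToFill, raw->fp) == nToFill ? 0 : -1;
}

/* Toy lazily filled buffer standing in for the virtual memory mapping. */
typedef struct {
    unsigned char *data;
    unsigned char *pabyFilled; /* one flag per page */
    size_t nSize, nPageSize, nFills;
    PageFillFunc pfnFill;
    void *pUserData;
} LazyMem;

static int lazy_init(LazyMem *m, size_t nSize, size_t nPageSize,
                     PageFillFunc pfnFill, void *pUserData)
{
    m->nSize = nSize;
    m->nPageSize = nPageSize;
    m->nFills = 0;
    m->pfnFill = pfnFill;
    m->pUserData = pUserData;
    m->data = malloc(nSize);
    m->pabyFilled = calloc((nSize + nPageSize - 1) / nPageSize, 1);
    if (m->data == NULL || m->pabyFilled == NULL) {
        free(m->data);
        free(m->pabyFilled);
        return -1;
    }
    return 0;
}

/* Returns a pointer to byte nOffset, filling its page on first access
 * (the point where a real page fault would call the driver). */
static unsigned char *lazy_get(LazyMem *m, size_t nOffset)
{
    if (nOffset >= m->nSize)
        return NULL;
    size_t iPage = nOffset / m->nPageSize;
    if (!m->pabyFilled[iPage]) {
        size_t nStart = iPage * m->nPageSize;
        size_t nLeft = m->nSize - nStart;
        size_t nToFill = nLeft < m->nPageSize ? nLeft : m->nPageSize;
        if (m->pfnFill(nStart, m->data + nStart, nToFill, m->pUserData) != 0)
            return NULL;
        m->pabyFilled[iPage] = 1;
        m->nFills++;
    }
    return m->data + nOffset;
}

static void lazy_free(LazyMem *m)
{
    free(m->data);
    free(m->pabyFilled);
}
```

Only the pages actually touched are read from the file, each with a single 
direct read.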

Looking at RawRasterBand::IRasterIO(), I can see that it can (under some 
circumstances, based on non-obvious heuristics) use direct file I/O without 
going through the block cache. So the currently proposed implementation could 
potentially already benefit from that. Perhaps we would need a RasterIO flag 
asking it to avoid the block cache when possible. Or we could just call 
CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in 
GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved().
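The GDAL_ONE_BIG_READ idea relies on setting a thread-local option around the 
I/O call and restoring the previous value afterwards, so that neither other 
threads nor any value the caller had set are affected. The sketch below only 
models that set-then-restore pattern with a C11 _Thread_local variable; the 
names are hypothetical and it does not call GDAL (in GDAL the option would be 
set with CPLSetThreadLocalConfigOption() as mentioned above):

```c
#include <stddef.h>

/* Hypothetical per-thread option slot standing in for a thread-local
 * configuration option; NULL means "not set". */
static _Thread_local const char *tls_one_big_read = NULL;

/* Hypothetical stand-in for a RasterIO-like read consulting the option:
 * returns 1 if it would take the direct "one big read" path. Only the
 * first character is checked, a simplification of real boolean parsing. */
static int would_use_one_big_read(void)
{
    return tls_one_big_read != NULL && tls_one_big_read[0] == 'Y';
}

/* Sketch of a DoIO-style function: force the option on this thread for
 * the duration of the call, then restore whatever was there before. */
static int do_io_with_one_big_read(void)
{
    const char *pszOld = tls_one_big_read;
    tls_one_big_read = "YES";
    int bDirect = would_use_one_big_read(); /* the actual I/O goes here */
    tls_one_big_read = pszOld;
    return bDirect;
}
```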

Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html

