[gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

Even Rouault even.rouault at mines-paris.org
Wed Dec 18 11:46:43 PST 2013


On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
> Even,
> 
> Sorry, I was thinking of mmap()ing the file directly, and having something
> like:
> 
> CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
>                                          int *pnPixelSpace,
>                                          GIntBig *pnLineSpace,
>                                          char **papszOptions );
> 
> I imagined a virtual method on the band that could be implemented,
> primarily by the RawBand class, to try to mmap() the data and return the
> layout. When that fails, or is unavailable, it could fall back to your
> existing methodology, with a layout that seems well tuned to the
> underlying data organization.
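For a raw file, the band method Frank describes could amount to little more than mapping the file region that holds the band and reporting its strides. The following is a minimal POSIX C sketch under stated assumptions: the names RawLayout, RawMapping and map_raw_band() are hypothetical (not GDAL API), offsets are assumed non-negative (no bottom-up layouts), and a -1 return marks where a caller would fall back to the RasterIO()-based mapping.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of where one raw band lives in its file. */
typedef struct {
    off_t image_offset;   /* byte offset of pixel (0,0) */
    int pixel_offset;     /* bytes between horizontally adjacent pixels */
    int64_t line_offset;  /* bytes between vertically adjacent pixels */
    int x_size, y_size;   /* raster size in pixels */
    int dt_size;          /* bytes per sample */
} RawLayout;

/* What the method would hand back: pointer to pixel (0,0), the strides
   (copied from the layout), and what munmap() needs later. */
typedef struct {
    const unsigned char *data;
    int pixel_space;
    int64_t line_space;
    void *map_base;
    size_t map_len;
} RawMapping;

/* Returns 0 on success, -1 if the band cannot be mapped directly. */
static int map_raw_band(int fd, const RawLayout *l, RawMapping *m)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    if (page <= 0 || fstat(fd, &st) != 0)
        return -1;
    /* mmap() needs a page-aligned file offset: start at the preceding
       page boundary and remember how many bytes to skip. */
    off_t aligned = l->image_offset - l->image_offset % page;
    size_t slack = (size_t)(l->image_offset - aligned);
    /* Bytes from pixel (0,0) through the last byte of the last pixel. */
    int64_t extent = (int64_t)(l->y_size - 1) * l->line_offset
                   + (int64_t)(l->x_size - 1) * l->pixel_offset + l->dt_size;
    m->map_len = slack + (size_t)extent;
    /* Refuse mappings past end of file: touching such pages raises SIGBUS. */
    if (aligned + (off_t)m->map_len > st.st_size)
        return -1;
    m->map_base = mmap(NULL, m->map_len, PROT_READ, MAP_SHARED, fd, aligned);
    if (m->map_base == MAP_FAILED)
        return -1;
    m->data = (const unsigned char *)m->map_base + slack;
    m->pixel_space = l->pixel_offset;
    m->line_space = l->line_offset;
    return 0;
}
```

Sample (x, y) is then read as data[y * line_space + x * pixel_space], which is the contract the pnPixelSpace / pnLineSpace outputs would express.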

Yes, that should be doable, but with the limitation I raised about memory
management with file-based mmap(): if you mmap() a file larger than RAM and
read it entirely, without an explicit madvise() to discard regions that are no
longer needed, it will fill RAM and cause disk swapping. I should retest to
confirm. Perhaps there is some OS-level tuning to avoid that?
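For reference, the explicit discarding mentioned above might look like this minimal sketch, which assumes Linux/glibc semantics for madvise(MADV_DONTNEED): a file is read sequentially through a read-only mapping, and each window is released once it has been processed. sum_file_bytes() and its window_pages parameter are illustrative names, not part of GDAL.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sums every byte of a file through a file-backed mapping, advising the
   kernel after each window that those pages will not be needed again. */
static int sum_file_bytes(const char *path, size_t window_pages, uint64_t *out_sum)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    *out_sum = 0;
    if (size == 0) {
        close(fd);
        return 0;
    }
    const unsigned char *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return -1;
    if (window_pages == 0)
        window_pages = 1;
    size_t window = window_pages * (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < size; off += window) {
        size_t len = size - off < window ? size - off : window;
        for (size_t i = 0; i < len; i++)
            *out_sum += p[off + i];
        /* off is a multiple of the page size, as madvise() requires.
           This drops the window's pages from this process's mapping so the
           kernel can reclaim them instead of treating them as in use. */
        madvise((void *)(p + off), len, MADV_DONTNEED);
    }
    munmap((void *)p, size);
    return 0;
}
```

Whether this alone prevents the swapping described above is exactly what would need retesting; the sketch only shows where the calls go.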

> 
> Certainly there is no need to hold things up for this.  What you are
> proposing is already wonderfully useful. 

I've no particular timetable for this. This started as an experiment. So I'm 
happy to explore complementary ideas.

> I'm wondering if there would be
> ways of making what you propose work with Python Numpy in such a way that a
> numpy array could be requested which is of this virtual memory.  That would
> also be a nice extension.

Hmm, how would that differ from what is proposed in the SWIG bindings
section of the RFC?

> 
> Best regards,
> Frank
> 
> On Wed, Dec 18, 2013 at 2:10 AM, Even Rouault <even.rouault at mines-paris.org> wrote:
> > On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> > > Even,
> > > 
> > > Very impressive work, I am supportive.
> > > 
> > > IMHO it would be wonderful if there were also an mmap()-based mechanism
> > > where you could ask for the virtual memory chunk and get it back (if it
> > > works) along with stride values to access it. This could likely be made
> > > to work for most "raw" formats and a few others too. It might also allow
> > > non-mmap()-based files to return a layout based more on their actual
> > > organization, for efficiency.
> > 
> > Hi Frank,
> > 
> > I'm not completely sure I have understood your idea. Would it be
> > something like:
> > 
> > CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
> >                                          GDALRWFlag eRWFlag,
> >                                          int nXOff, int nYOff,
> >                                          int nXSize, int nYSize,
> >                                          int nBufXSize, int nBufYSize,
> >                                          GDALDataType eBufType,
> >                                          int nBandCount, int* panBandMap,
> >                                          int *pnPixelSpace,
> >                                          GIntBig *pnLineSpace,
> >                                          GIntBig *pnBandSpace,
> >                                          size_t nCacheSize,
> >                                          int bSingleThreadUsage,
> >                                          char **papszOptions );
> > 
> > The difference from GDALDatasetGetVirtualMem() is that the stride values
> > are now output values, and there is no longer an nPageSizeHint parameter.
> > 
> > In your mind, would the spacings be determined in a generic way from the
> > dataset properties (block size and INTERLEAVED=PIXEL/BAND metadata item),
> > or would that require some direct cooperation from the driver?
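To make the generic option concrete, here is a small sketch of how the spacings could be derived from the interleaving alone, assuming the interleave value is available as the string "PIXEL" or "BAND" (line interleaving and block size are ignored) and using the RasterIO() buffer convention: sample (x, y) of band b sits at b * band_space + y * line_space + x * pixel_space bytes from the start. Strides, strides_for_layout() and byte_offset() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Spacings in bytes, following the RasterIO() buffer convention. */
typedef struct {
    int pixel_space;    /* between horizontally adjacent samples of a band */
    int64_t line_space; /* between vertically adjacent samples of a band */
    int64_t band_space; /* between the same pixel in consecutive bands */
} Strides;

/* Hypothetical helper: interleave is assumed to be "PIXEL" or "BAND";
   anything else is treated as band-sequential. */
static Strides strides_for_layout(const char *interleave, int x_size, int y_size,
                                  int band_count, int dt_size)
{
    Strides s;
    if (interleave != NULL && strcmp(interleave, "PIXEL") == 0) {
        /* RGBRGB...: all bands of one pixel are stored together. */
        s.pixel_space = band_count * dt_size;
        s.line_space = (int64_t)x_size * s.pixel_space;
        s.band_space = dt_size;
    } else {
        /* RRR...GGG...BBB...: each band is one contiguous image. */
        s.pixel_space = dt_size;
        s.line_space = (int64_t)x_size * dt_size;
        s.band_space = s.line_space * y_size;
    }
    return s;
}

/* Byte offset of sample (x, y) of band b (0-based) from the mapping start. */
static int64_t byte_offset(const Strides *s, int b, int x, int y)
{
    return (int64_t)b * s->band_space + (int64_t)y * s->line_space
         + (int64_t)x * s->pixel_space;
}
```

A tiled file cannot be described by these three strides alone, since the jump from one tile to the next breaks the constant line spacing.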
> > 
> > Since you mention raw formats, perhaps you are thinking more of a
> > file-based mmap() rather than an anonymous mmap() combined with
> > RasterIO(), as currently proposed? This is something I mentioned in the
> > "Related thoughts" paragraph, but there are practical annoyances with how
> > Linux manages memory with file-based mmap(). I'd be happy to hear if
> > someone has had successful experience with that, by the way (one that
> > doesn't require an explicit madvise() each time you're done with a range
> > of memory).
> > 
> > ---------------------------
> > 
> > Reading your words again, I'm now wondering whether you are thinking of a
> > Dataset / RasterBand virtual method that could be implemented by drivers?
> > 
> > virtual CPLVirtualMem* GetVirtualMem(.......)
> > 
> > They would directly use the low-level CPLVirtualMem to create the mapping
> > and provide their own callback to fill pages when a page fault occurs. So
> > they could potentially avoid the block cache layer and do direct file I/O?
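The page-fault-driven filling described above can be illustrated with plain POSIX calls. This is a simplified, single-threaded sketch assuming Linux, where touching a PROT_NONE page raises SIGSEGV with si_addr set to the faulting address; it is not CPLVirtualMem's actual implementation, and lazy_map_create() and fill_page_fn are hypothetical names. A driver-supplied callback would take the place of the fill argument.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical callback: fills one page the first time it is touched.
   page_offset is the page's byte offset from the start of the mapping. */
typedef void (*fill_page_fn)(size_t page_offset, unsigned char *page,
                             size_t page_size);

/* One mapping, one thread: enough to show the mechanism. */
static unsigned char *g_base;
static size_t g_len;
static size_t g_page;
static fill_page_fn g_fill;

static void on_segv(int sig, siginfo_t *si, void *uctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)g_base;
    (void)uctx;
    if (g_base == NULL || addr < base || addr >= base + g_len) {
        /* A genuine crash elsewhere: restore the default action so the
           faulting instruction terminates the process when re-executed. */
        signal(sig, SIG_DFL);
        return;
    }
    size_t off = (size_t)(addr - base) / g_page * g_page;
    /* Open up the page, fill it, and return; the kernel re-executes the
       faulting access, which now succeeds. mprotect() is not on the POSIX
       async-signal-safe list, but this works in practice on Linux. */
    if (mprotect(g_base + off, g_page, PROT_READ | PROT_WRITE) != 0) {
        signal(sig, SIG_DFL);
        return;
    }
    g_fill(off, g_base + off, g_page);
}

/* Reserves npages of address space with no access rights and installs
   the fault handler. Returns NULL on failure. */
static unsigned char *lazy_map_create(size_t npages, fill_page_fn fill)
{
    struct sigaction sa;
    void *p;
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    g_len = npages * g_page;
    p = mmap(NULL, g_len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    g_base = p;
    g_fill = fill;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) {
        munmap(p, g_len);
        g_base = NULL;
        return NULL;
    }
    return g_base;
}
```

Each page is filled at most once, on first touch. Handling several threads, writing dirty pages back, and evicting pages once a cache size (the nCacheSize parameter in the prototype above) is exceeded are all left out here.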
> > 
> > Looking at RawRasterBand::IRasterIO(), I can see that it can use (under
> > some circumstances, with a non-obvious heuristic) direct file I/O without
> > going through the block cache. So the currently proposed implementation
> > could potentially already benefit from that. Perhaps we would need a flag
> > to RasterIO to ask it to avoid the block cache when possible. Or just call
> > CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in
> > GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved().
> > 
> > Even
> > 
> > --
> > Geospatial professional services
> > http://even.rouault.free.fr/services.html


