[gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

Frank Warmerdam warmerdam at pobox.com
Wed Dec 18 10:53:37 PST 2013


Even,

Sorry, I was thinking of mmap() directly to the file, and having something
like:

CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
                                         int *pnPixelSpace,
                                         GIntBig *pnLineSpace,
                                         char **papszOptions );

I imagined a virtual method on the band which could be implemented,
primarily by the RawBand class, to try to mmap() the data and return the
layout.  When that fails, or is unavailable, it could fall back to your
existing methodology with a layout that is well tuned to the underlying
data organization.
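
To make the idea a bit more concrete, here is a rough sketch in plain POSIX C
of the "try mmap(), otherwise fall back" logic. It does not use GDAL at all:
BandView, band_view_open(), band_view_pixel(), band_view_close() and the
try_mmap flag are hypothetical names invented for illustration, and the
fallback here is a simple read into a heap buffer rather than the
CPLVirtualMem machinery from the RFC. It also assumes an uncompressed,
byte-typed raw file whose header size and strides are known to the caller.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict ISO C modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: a view on one band plus the strides to walk it. */
typedef struct {
    unsigned char *base;        /* start of the mapping or heap buffer */
    unsigned char *data;        /* first pixel of the band */
    size_t         length;      /* size in bytes of the mapping or buffer */
    int            pixel_space; /* bytes between horizontally adjacent pixels */
    long long      line_space;  /* bytes between the starts of two lines */
    int            is_mmap;     /* 1 if backed by mmap(), 0 if heap fallback */
} BandView;

/* Open a raw band. Returns 0 on success, -1 on failure. */
static int band_view_open(const char *path, long long header_bytes,
                          int width, int height,
                          int pixel_space, long long line_space,
                          int try_mmap, BandView *out)
{
    struct stat st;
    size_t length;
    void *p = MAP_FAILED;
    int fd;

    assert(path != NULL && out != NULL);
    if (header_bytes < 0 || width <= 0 || height <= 0 || pixel_space <= 0 ||
        line_space < (long long)width * pixel_space)
        return -1;
    length = (size_t)header_bytes + (size_t)line_space * (size_t)height;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* Refuse files shorter than the described layout: touching pages past
       the end of a mapped file would raise SIGBUS. */
    if (fstat(fd, &st) != 0 || (long long)st.st_size < (long long)length) {
        close(fd);
        return -1;
    }
    /* Map from offset 0 so the offset is trivially page aligned. */
    if (try_mmap)
        p = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    out->is_mmap = (p != MAP_FAILED);
    if (!out->is_mmap) {
        /* Fallback: copy the file into memory, keeping the same strides.
           A short read is conservatively treated as a failure. */
        p = malloc(length);
        if (p == NULL || pread(fd, p, length, 0) != (ssize_t)length) {
            free(p);
            close(fd);
            return -1;
        }
    }
    close(fd); /* an established mapping stays valid after close() */

    out->base = (unsigned char *)p;
    out->data = out->base + header_bytes;
    out->length = length;
    out->pixel_space = pixel_space;
    out->line_space = line_space;
    return 0;
}

/* Read the byte at column x, line y using the returned strides. */
static unsigned char band_view_pixel(const BandView *v, int x, int y)
{
    return v->data[(size_t)y * (size_t)v->line_space +
                   (size_t)x * (size_t)v->pixel_space];
}

/* Release the view with whichever mechanism created it. */
static void band_view_close(BandView *v)
{
    if (v->is_mmap)
        munmap(v->base, v->length);
    else
        free(v->base);
    v->base = v->data = NULL;
}
```

A caller would pass try_mmap = 1 and then index pixels through
band_view_pixel() without caring which of the two paths was taken, which is
the property I would want from the band-level call above.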

Certainly there is no need to hold things up for this.  What you are
proposing is already wonderfully useful.  I'm wondering if there would be a
way of making what you propose work with Python NumPy, so that one could
request a numpy array backed by this virtual memory.  That would also be a
nice extension.

Best regards,
Frank



On Wed, Dec 18, 2013 at 2:10 AM, Even Rouault
<even.rouault at mines-paris.org> wrote:

> On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> > Even,
> >
> > Very impressive work, I am supportive.
> >
> > IMHO it would be wonderful if there was also an mmap() based mechanism
> > where you could ask for the virtual memory chunk and you get it back (if it
> > works) along with stride values to access in it.  This could likely be made
> > to work for most "raw" based formats and a few others too.  It might also
> > allow non-mmap() based files to return an organization based more on their
> > actual organization for efficiency.
>
> Hi Frank,
>
> I'm not completely sure I have understood your idea. Would that be something
> like:
>
> CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
>                                          GDALRWFlag eRWFlag,
>                                          int nXOff, int nYOff,
>                                          int nXSize, int nYSize,
>                                          int nBufXSize, int nBufYSize,
>                                          GDALDataType eBufType,
>                                          int nBandCount, int* panBandMap,
>                                          int *pnPixelSpace,
>                                          GIntBig *pnLineSpace,
>                                          GIntBig *pnBandSpace,
>                                          size_t nCacheSize,
>                                          int bSingleThreadUsage,
>                                          char **papszOptions );
>
> Difference with GDALDatasetGetVirtualMem(): the stride values are now output
> values, and there is no longer an nPageSizeHint parameter.
>
> In your mind, would the spacings be determined in a generic way from the
> dataset properties (block size and INTERLEAVE=PIXEL/BAND metadata item), or
> would that require some direct cooperation from the driver?
>
> Since you mention raw formats, perhaps you are thinking more of a file-based
> mmap() rather than an anonymous mmap() combined with RasterIO(), as currently
> proposed? This is something I've mentioned in the "Related thoughts"
> paragraph, but there are practical annoyances with how Linux manages memory
> with file-based mmap(). I'd be happy to hear from anyone who has had
> successful experience with that, by the way (and that doesn't require an
> explicit madvise() each time you're done with a range of memory).
>
> ---------------------------
>
> Reading your words again, I'm now wondering whether you are thinking of a
> Dataset / RasterBand virtual method that could be implemented by drivers?
>
> virtual CPLVirtualMem* GetVirtualMem(.......)
>
> They would directly use the low-level CPLVirtualMem to create the mapping and
> provide their own callback to fill pages when a page fault occurs. So they
> could potentially avoid using the block cache layer and do direct file I/O?
>
> Looking at RawRasterBand::IRasterIO(), I can see that it can use (under some
> circumstances, with a non-obvious heuristic) direct file I/O without going
> through the block cache. So the currently proposed implementation could
> potentially already benefit from that. Perhaps we would need a flag to
> RasterIO to ask it to avoid the block cache when possible. Or just call
> CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in
> GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved().
>
> Even
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html
>



-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer