[gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"
Even Rouault
even.rouault at mines-paris.org
Wed Dec 18 11:46:43 PST 2013
On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
> Even,
>
> Sorry, I was thinking of mmap() directly to the file, and having something
> like:
>
> CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
>                                                   int *pnPixelSpace,
>                                                   GIntBig *pnLineSpace,
>                                                   char **papszOptions );
>
> I imagined an available virtual method on the band which could be
> implemented - primarily by the RawBand class to try and mmap() the data and
> return the layout. But when that fails, or is unavailable it could use
> your existing methodology with a layout that seems well tuned to the
> underlying data organization.
Yes, that should be doable, but with the limitation I raised about the memory
management of file-based mmap(): if you mmap() a file larger than RAM and read
it entirely, without an explicit madvise() to discard regions that are no
longer needed, it will fill RAM and cause disk swapping. I should retest to
confirm. Perhaps there is some OS-level tuning to avoid that?
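
To make the madvise() point concrete, here is a minimal sketch of the pattern, assuming a POSIX/Linux system. The helper name SumFileWithMadvise and its parameters are hypothetical and are not part of GDAL or of the RFC. It walks a read-only file mapping chunk by chunk and tells the kernel each chunk is no longer needed once it has been consumed. Whether this is enough to avoid the memory pressure described above is exactly what would need retesting.

```c
#define _DEFAULT_SOURCE /* exposes madvise() under strict -std= flags */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a GDAL function): sum all bytes of a file through
 * a read-only file-backed mapping, releasing each chunk after use.
 * nChunkSize must be a multiple of the page size. Returns 0 on success. */
int SumFileWithMadvise(const char *pszFilename, size_t nChunkSize,
                       uint64_t *pnSum)
{
    assert(pszFilename != NULL && pnSum != NULL && nChunkSize > 0);
    long nPageSize = sysconf(_SC_PAGESIZE);
    if (nPageSize <= 0 || nChunkSize % (size_t)nPageSize != 0)
        return -1;

    int fd = open(pszFilename, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat sStat;
    if (fstat(fd, &sStat) != 0 || sStat.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t nSize = (size_t)sStat.st_size;
    unsigned char *pabyData =
        mmap(NULL, nSize, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (pabyData == MAP_FAILED)
        return -1;

    uint64_t nSum = 0;
    for (size_t nOff = 0; nOff < nSize; nOff += nChunkSize) {
        size_t nLen = nSize - nOff < nChunkSize ? nSize - nOff : nChunkSize;
        for (size_t i = 0; i < nLen; i++)
            nSum += pabyData[nOff + i];
        /* Hint that this range is no longer needed, so its pages can be
         * dropped from the mapping; a later access would simply fault the
         * data back in from the file. The result is advisory, so it is
         * deliberately ignored here. */
        (void)madvise(pabyData + nOff, nLen, MADV_DONTNEED);
    }
    munmap(pabyData, nSize);
    *pnSum = nSum;
    return 0;
}
```

The annoyance is visible in the loop: the caller has to know when it is done with a range, which is what a generic "give me a pointer to the whole raster" API cannot know on the user's behalf.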
>
> Certainly there is no need to hold things up for this. What you are
> proposing is already wonderfully useful.
I've no particular timetable for this. This started as an experiment. So I'm
happy to explore complementary ideas.
> I'm wondering if there would be
> ways of making what you propose work with Python Numpy in such a way that a
> numpy array could be requested which is of this virtual memory. That would
> also be a nice extension.
Hmm, how would that be different from what is proposed in the SWIG bindings
section of the RFC?
>
> Best regards,
> Frank
>
>
>
> On Wed, Dec 18, 2013 at 2:10 AM, Even Rouault
> <even.rouault at mines-paris.org> wrote:
> > On Wednesday 18 December 2013 06:55:50, Frank Warmerdam wrote:
> > > Even,
> > >
> > > Very impressive work, I am supportive.
> > >
> > > IMHO it would be wonderful if there was also an mmap() based mechanism
> > > where you could ask for the virtual memory chunk and you get it back
> > > (if it works) along with stride values to access in it. This could
> > > likely be made to work for most "raw" based formats and a few others
> > > too. It might also allow non-mmap() based files to return an
> > > organization based more on their actual organization for efficiency.
> >
> > Hi Frank,
> >
> > I'm not completely sure I have understood your idea. Would that be
> > something like:
> >
> > CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
> >                                  GDALRWFlag eRWFlag,
> >                                  int nXOff, int nYOff,
> >                                  int nXSize, int nYSize,
> >                                  int nBufXSize, int nBufYSize,
> >                                  GDALDataType eBufType,
> >                                  int nBandCount, int* panBandMap,
> >                                  int *pnPixelSpace,
> >                                  GIntBig *pnLineSpace,
> >                                  GIntBig *pnBandSpace,
> >                                  size_t nCacheSize,
> >                                  int bSingleThreadUsage,
> >                                  char **papszOptions );
> >
> > The difference from GDALDatasetGetVirtualMem(): the stride values are now
> > output values, and there is no longer an nPageSizeHint parameter.
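
As an illustration of what a caller would do with strides returned as output values, here is a minimal sketch of the addressing arithmetic. GetPixelPtr is a hypothetical helper, not a GDAL function, and int64_t stands in for GIntBig so that the snippet compiles without GDAL headers. The same function covers both band-sequential and pixel-interleaved layouts; only the three spacing values differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GDAL): return the address of one sample
 * inside a mapped buffer, given the three byte spacings reported by the
 * mapping. int64_t is used here in place of GDAL's GIntBig. */
const unsigned char *GetPixelPtr(const unsigned char *pabyBase,
                                 int iBand, int iLine, int iPixel,
                                 int nPixelSpace, int64_t nLineSpace,
                                 int64_t nBandSpace)
{
    assert(pabyBase != NULL && iBand >= 0 && iLine >= 0 && iPixel >= 0);
    /* Widen before multiplying so large rasters do not overflow int. */
    return pabyBase + (int64_t)iBand * nBandSpace
                    + (int64_t)iLine * nLineSpace
                    + (int64_t)iPixel * nPixelSpace;
}
```

For a 3-band, 2-line, 4-pixel Byte raster, a band-sequential layout would report spacings (pixel, line, band) of (1, 4, 8) and a pixel-interleaved one (3, 12, 1); the caller's loop is identical in both cases.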
> >
> > In your mind, would the spacings be determined in a generic way from the
> > dataset properties (block size and INTERLEAVE=PIXEL/BAND metadata item),
> > or would that require some direct cooperation of the driver?
> >
> > Since you mention raw formats, perhaps you are thinking more of a
> > file-based mmap() rather than an anonymous mmap() combined with
> > RasterIO(), as currently proposed? This is something I've mentioned in
> > the "Related thoughts" paragraph, but there are practical annoyances with
> > how Linux manages memory with file-based mmap(). I'd be happy to hear
> > from anyone who has had successful experience with that, by the way (and
> > in a way that doesn't require an explicit madvise() each time you're done
> > with a range of memory).
> >
> > ---------------------------
> >
> > Reading your words again, I'm now wondering whether you are thinking of a
> > Dataset / RasterBand virtual method that could be implemented by drivers?
> >
> > virtual CPLVirtualMem* GetVirtualMem(.......)
> >
> > They would directly use the low-level CPLVirtualMem to create the mapping
> > and provide their own callback to fill pages when a page fault occurs. So
> > they could potentially avoid using the block cache layer and do direct
> > file I/O?
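
For readers unfamiliar with the mechanism, the following is a minimal, Linux-only sketch of "fill pages from a callback when a page fault occurs". It is not the CPLVirtualMem implementation, and every name in it (LazyMapCreate, LazyMapDestroy, FillPageFunc, DemoFillPage, g_nFillCalls) is hypothetical. It assumes a single mapping, a single thread, and that the fault is reported as SIGSEGV; the callback runs in signal-handler context, so it must restrict itself to async-signal-safe operations.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= flags */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical callback type: populate the page located at byte nOffset. */
typedef void (*FillPageFunc)(size_t nOffset, unsigned char *pabyPage,
                             size_t nPageSize);

static unsigned char *g_pabyBase = NULL;
static size_t g_nMapSize = 0;
static size_t g_nPageSize = 0;
static FillPageFunc g_pfnFill = NULL;
static struct sigaction g_sOldAction;
volatile sig_atomic_t g_nFillCalls = 0; /* pages filled so far */

static void FaultHandler(int nSig, siginfo_t *psInfo, void *pUnused)
{
    (void)pUnused;
    uintptr_t nAddr = (uintptr_t)psInfo->si_addr;
    uintptr_t nBase = (uintptr_t)g_pabyBase;
    if (g_pabyBase == NULL || nAddr < nBase || nAddr >= nBase + g_nMapSize) {
        /* Not our mapping: restore the previous handler and return, so the
         * faulting instruction re-executes and fails the usual way. */
        sigaction(nSig, &g_sOldAction, NULL);
        return;
    }
    size_t nOffset = (size_t)(nAddr - nBase) / g_nPageSize * g_nPageSize;
    /* Make the page accessible, then let the callback populate it. On
     * return, the faulting instruction is retried and now succeeds. */
    mprotect(g_pabyBase + nOffset, g_nPageSize, PROT_READ | PROT_WRITE);
    g_pfnFill(nOffset, g_pabyBase + nOffset, g_nPageSize);
    g_nFillCalls++;
}

/* Demo callback standing in for a driver doing direct file I/O:
 * every byte of page N is set to N + 1. */
void DemoFillPage(size_t nOffset, unsigned char *pabyPage, size_t nPageSize)
{
    unsigned char byVal = (unsigned char)(nOffset / nPageSize + 1);
    for (size_t i = 0; i < nPageSize; i++)
        pabyPage[i] = byVal;
}

/* Reserve nPages of inaccessible anonymous memory and install the handler. */
unsigned char *LazyMapCreate(size_t nPages, FillPageFunc pfnFill)
{
    assert(nPages > 0 && pfnFill != NULL && g_pabyBase == NULL);
    long nPageSize = sysconf(_SC_PAGESIZE);
    if (nPageSize <= 0)
        return NULL;
    size_t nSize = nPages * (size_t)nPageSize;
    void *pMap = mmap(NULL, nSize, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pMap == MAP_FAILED)
        return NULL;
    g_pabyBase = pMap;
    g_nMapSize = nSize;
    g_nPageSize = (size_t)nPageSize;
    g_pfnFill = pfnFill;

    struct sigaction sAct;
    memset(&sAct, 0, sizeof(sAct));
    sAct.sa_sigaction = FaultHandler;
    sAct.sa_flags = SA_SIGINFO;
    sigemptyset(&sAct.sa_mask);
    if (sigaction(SIGSEGV, &sAct, &g_sOldAction) != 0) {
        munmap(pMap, nSize);
        g_pabyBase = NULL;
        return NULL;
    }
    return g_pabyBase;
}

void LazyMapDestroy(void)
{
    sigaction(SIGSEGV, &g_sOldAction, NULL);
    munmap(g_pabyBase, g_nMapSize);
    g_pabyBase = NULL;
    g_nMapSize = 0;
}
```

A driver-provided callback would take the place of DemoFillPage and read the relevant bytes from its file, which is where the block cache could be bypassed.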
> >
> > Looking at RawRasterBand::IRasterIO(), I can see that it can use (under
> > some circumstances, with a non-obvious heuristic) direct file I/O without
> > going through the block cache. So the currently proposed implementation
> > could potentially already benefit from that. Perhaps we would need a flag
> > for RasterIO to ask it to avoid the block cache when possible. Or just
> > call CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in
> > GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved().
> >
> > Even
> >
> > --
> > Geospatial professional services
> > http://even.rouault.free.fr/services.html
--
Geospatial professional services
http://even.rouault.free.fr/services.html