[gdal-dev] kerchunk
Michael Sumner
mdsumner at gmail.com
Tue Jul 23 23:10:15 PDT 2024
On Wed, Jul 24, 2024 at 8:37 AM Michael Sumner <mdsumner at gmail.com> wrote:
> Hi, is there any effort or thought into something like Python's kerchunk
> in GDAL? (my summary of kerchunk is below)
>
> https://github.com/fsspec/kerchunk
>
> I'll be exploring the Python outputs in detail and looking for hooks
> where we might bring some of this more tightly into GDAL. This would work
> nicely inside the GTI driver, for example. But a *kerchunk driver*? That
> would be in the family of raw/ drivers. My skillset won't have much to
> offer there, but I'm going to explore with some simpler examples. It could
> even bring old HDF4 files into the fold, I think.
>
> It's a bit weird from a GDAL perspective to map the chunks in a format for
> which we have a driver, but there are definite performance advantages, and
> it is convenient for virtualizing huge disparate collections (even the
> simplest time-series-of-files in NetCDF is nicely abstracted here, a
> super-charged VRT for xarray).
>
>
I realized after posting that the ZARR driver is already geared to this
(!). I don't know whether it can work with "references to byte ranges
in remote files", but I'll recast and explore what's there.
Cheers, Mike
> Interested in any thoughts, feedback, pointers to related efforts ...
> thanks!
>
> (my take on) A description of kerchunk:
>
> kerchunk replaces the actual binary blobs on file in a Zarr with JSON
> references to a file/URI/object plus the byte offset and length of each
> chunk. In this way kerchunk brings formats like HDF/NetCDF/GRIB into the
> fold of "cloud readiness" by completely separating the metadata from the
> actual storage. The information about those chunks (compression, type,
> orientation, etc.) is stored in JSON as well.
>
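
To make the reference idea concrete, here is a minimal sketch in plain
Python. It is not kerchunk's own code: the helper `read_reference`, the file
`legacy.bin` and the array name `temp` are all made up for illustration, and
the "remote" file is just a local temporary file. The only part taken from
kerchunk is the shape of the reference set, where a key maps either to an
inline string or to a `[url, offset, length]` triple.

```python
import json
import os
import tempfile


def read_reference(refs, key):
    """Return the bytes that a kerchunk-style reference entry points to."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # Inline entry: metadata stored directly in the JSON.
        # (Base64-prefixed inline entries are not handled in this sketch.)
        return entry.encode()
    url, offset, length = entry
    # Assumption: `url` is a local path. A real reader would issue a byte
    # range request against an object store or HTTP server instead.
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A stand-in for a legacy file: an 8-byte header followed by two 8-byte chunks.
path = os.path.join(tempfile.mkdtemp(), "legacy.bin")
with open(path, "wb") as f:
    f.write(b"HEADER--" + b"chunk-00" + b"chunk-01")

# Hypothetical reference set: Zarr metadata inline, chunks as byte ranges.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": json.dumps({
            "shape": [1, 16], "chunks": [1, 8], "dtype": "|u1",
            "compressor": None, "filters": None, "fill_value": 0,
            "order": "C", "zarr_format": 2,
        }),
        "temp/0.0": [path, 8, 8],   # skip the header, read chunk 0
        "temp/0.1": [path, 16, 8],  # read chunk 1
    },
}

first_chunk = read_reference(refs, "temp/0.0")
second_chunk = read_reference(refs, "temp/0.1")
```

The original file is never rewritten; only the small JSON index is new, which
is why the same trick can apply to HDF/NetCDF/GRIB files left where they are.
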
> A Zarr is a multidimensional version of a single-zoom-level image
> tiling: imagine every image tile as a potentially n-dimensional child block
> of a larger array. The blobs are stored like one zoom level of a z/y/x tile
> server, in a [[[v/]w/]y/]x layout, with a position for each dimension of
> the array (1, 2, 3, 4, or n), where z is not special, and with more general
> encoding possibilities than tif/png/jpeg provide. This scheme is extremely
> general, literally a virtualized array-like abstraction on any storage,
> and with kerchunk you can transcend many legacy issues with actual formats.
>
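
The tile-server analogy can be sketched as a small key calculation. This is
only an illustration: `chunk_keys` and `chunk_for_index` are hypothetical
helper names, and the "/" separator is chosen to match the z/y/x picture
(Zarr v2 stores default to "." between positions).

```python
import itertools
import math


def chunk_keys(shape, chunks, sep="/"):
    """List the key of every chunk for an array of `shape` split into `chunks`."""
    # Number of chunks along each dimension (edge chunks may be partial).
    counts = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    grid = itertools.product(*(range(n) for n in counts))
    return [sep.join(str(i) for i in idx) for idx in grid]


def chunk_for_index(index, chunks, sep="/"):
    """Return the key of the chunk that holds the array element at `index`."""
    return sep.join(str(i // c) for i, c in zip(index, chunks))


# A 2-D "image" of 4 x 6 cells in 2 x 3 tiles: one position per dimension.
keys_2d = chunk_keys((4, 6), (2, 3))

# A 3-D time/y/x array: the extra leading position is just another dimension.
keys_3d = chunk_keys((10, 100, 100), (1, 50, 50))
key_for_cell = chunk_for_index((7, 60, 10), (1, 50, 50))
```

Nothing here depends on the number of dimensions, which is the sense in which
no axis is special.
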
> Cheers, Mike
>
>
> --
> Michael Sumner
> Research Software Engineer
> Australian Antarctic Division
> Hobart, Australia
> e-mail: mdsumner at gmail.com
>