[gdal-dev] kerchunk
Michael Sumner
mdsumner at gmail.com
Tue Jul 23 15:37:18 PDT 2024
Hi, is there any effort or thought going into something like Python's kerchunk in
GDAL? (My summary of kerchunk is below.)
https://github.com/fsspec/kerchunk
I'll be exploring the Python outputs in detail and looking for hooks where
we might integrate some of this more tightly into GDAL. This would work
nicely inside the GTI driver, for example. But a *kerchunk-driver*? That
would belong in the family of raw/ drivers. My skill set won't have much to
offer there, but I'm going to explore with some simpler examples. It could
even bring old HDF4 files into the fold, I think.
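As one hedged illustration of such a hook: kerchunk locates each chunk with a `[url, offset, length]` reference, and GDAL already has virtual file systems that can address exactly that byte range. The sketch below assumes a driver (or a GTI tile index) would reuse `/vsisubfile/` chained with `/vsis3/`, `/vsigs/` or `/vsicurl/`. The helper `to_gdal_path` and the example bucket path are hypothetical, not an existing GDAL or kerchunk API, and the result only locates the bytes: decompression and data type would still have to come from the Zarr metadata.

```python
def to_gdal_path(url, offset, length):
    """Hypothetical helper: map one kerchunk-style [url, offset, length]
    reference onto a GDAL virtual file path for that byte range."""
    if url.startswith("s3://"):
        base = "/vsis3/" + url[len("s3://"):]
    elif url.startswith("gs://"):
        base = "/vsigs/" + url[len("gs://"):]
    elif url.startswith(("http://", "https://")):
        base = "/vsicurl/" + url
    elif url.startswith("file://"):
        base = url[len("file://"):]
    else:
        # Assume anything else is a plain local path.
        base = url
    # /vsisubfile/<offset>_<size>,<filename> exposes only that byte range.
    return f"/vsisubfile/{offset}_{length},{base}"


# Made-up bucket, file name and offsets, for illustration only.
print(to_gdal_path("s3://bucket/sst/2024-07-23.nc", 30000, 4096))
# → /vsisubfile/30000_4096,/vsis3/bucket/sst/2024-07-23.nc
```

With the GDAL Python bindings installed, a path like this could be opened with `gdal.VSIFOpenL(path, "rb")` to fetch the raw, still-compressed chunk bytes.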
It's a bit weird from a GDAL perspective to map the chunks of a format for
which we already have a driver, but there are definite performance advantages
and conveniences for virtualizing huge, disparate collections (even the
simplest time-series-of-files in NetCDF is nicely abstracted here for xarray:
a super-charged VRT for xarray).
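The time-series case can be sketched by hand: if every file holds one time step of the same variable with the same layout, their references can be merged into one virtual store by prefixing each chunk key with a new leading time index. This is a simplified, hypothetical helper (`stack_along_time`), not kerchunk's own `MultiZarrToZarr` combiner. It assumes Zarr v2 metadata, '.' as the chunk-key separator, and identical shape, chunks and dtype in every file; the file names and byte offsets are made up.

```python
import json


def stack_along_time(per_file_refs, var):
    """Hypothetical helper: stack per-file reference sets (one time step each)
    into one virtual store with a new leading time dimension."""
    prefix = f"{var}/"
    first = json.loads(per_file_refs[0][prefix + ".zarray"])
    zarray = dict(first)
    zarray["shape"] = [len(per_file_refs)] + first["shape"]
    zarray["chunks"] = [1] + first["chunks"]  # one time step per chunk
    out = {".zgroup": json.dumps({"zarr_format": 2}),
           prefix + ".zarray": json.dumps(zarray)}
    for t, refs in enumerate(per_file_refs):
        if json.loads(refs[prefix + ".zarray"]) != first:
            raise ValueError(f"file {t} does not match the first file's layout")
        for key, value in refs.items():
            if not key.startswith(prefix):
                continue
            name = key[len(prefix):]
            if name.startswith("."):
                continue  # metadata such as .zarray or .zattrs, not a chunk
            # Same byte range, new key: "sst/0.1" in file t becomes "sst/t.0.1".
            out[f"{prefix}{t}.{name}"] = value
    return out


def day_refs(path):
    """Made-up single-time-step references: a 4x4 float32 grid in two chunks."""
    meta = {"zarr_format": 2, "shape": [4, 4], "chunks": [2, 4], "dtype": "<f4"}
    return {"sst/.zarray": json.dumps(meta),
            "sst/0.0": [path, 1000, 64],
            "sst/1.0": [path, 1064, 64]}


combined = stack_along_time([day_refs("day1.nc"), day_refs("day2.nc")], "sst")
```

No bytes are copied: the combined store just points back into the original files. For xarray, a real combine would also need `_ARRAY_DIMENSIONS` in `.zattrs` and a time coordinate, which this sketch leaves out.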
Interested in any thoughts, feedback, pointers to related efforts ...
thanks!
(My take on) a description of kerchunk:
kerchunk replaces the binary chunk blobs of a Zarr store with JSON
references, each giving a file/URI/object plus a byte offset and length. In
this way kerchunk brings formats like HDF/NetCDF/GRIB into the fold of "cloud
readiness" by completely separating the metadata from the actual storage.
The information about those chunks (compression, data type, orientation,
etc.) is also stored in JSON.
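To make the reference idea concrete, here is a minimal sketch that resolves keys of a kerchunk-style version-1 reference set (`{"version": 1, "refs": {...}}`). A value is either inline text (raw bytes are `base64:`-prefixed) or a list: `[url]` for a whole file, `[url, offset, length]` for a byte range. The function name `read_reference` is made up, it only reads local files with plain Python I/O (kerchunk itself goes through fsspec), and the demo file is fake data.

```python
import base64
import json
import tempfile


def read_reference(refs, key):
    """Resolve one key of a kerchunk-style reference set to bytes (local files only)."""
    value = refs[key]
    if isinstance(value, str):
        # Inline value: JSON metadata (.zarray, .zattrs) or base64-encoded bytes.
        if value.startswith("base64:"):
            return base64.b64decode(value[len("base64:"):])
        return value.encode("utf-8")
    if len(value) == 1:
        # [url]: the whole file is the chunk.
        with open(value[0], "rb") as f:
            return f.read()
    # [url, offset, length]: a byte range inside an existing file.
    url, offset, length = value
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Fake "legacy" file: a 16-byte header followed by two raw 8-byte chunks.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"H" * 16 + bytes(range(8)) + bytes(range(100, 108)))
    path = tmp.name

refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "data/.zarray": json.dumps({
            "zarr_format": 2, "shape": [16], "chunks": [8], "dtype": "|u1",
            "compressor": None, "filters": None, "fill_value": 0, "order": "C",
        }),
        "data/0": [path, 16, 8],  # skips the header
        "data/1": [path, 24, 8],
    },
}

chunk = read_reference(refs["refs"], "data/1")
```

The header bytes are never read: the references jump straight to each chunk, which is where the "cloud readiness" comes from when the file sits behind HTTP range requests.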
(A Zarr is a multidimensional version of a single-zoom-level image tiling:
imagine every image tile as a potentially n-dimensional child block of a
larger array. The blobs are stored like one zoom level of a z/y/x tile
server, in a [[[v/]w/]y/]x layout with one position for each dimension of
the array, whether 1, 2, 3, 4, or n; z is not special, and the encoding
possibilities are more general than what TIFF/PNG/JPEG provide.) This scheme
is extremely general, literally a virtualized array-like abstraction over any
storage, and with kerchunk you can transcend many legacy issues with the
actual formats.
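The tiling analogy fits in a few lines: each element index is divided by that dimension's chunk size, and the quotients form the chunk key. Zarr v2 joins them with '.' by default (e.g. `5.1.1`) and with '/' when an array's `dimension_separator` is '/', which gives the nested [[[v/]w/]y/]x layout described in the paragraph above. The helper names `chunk_key` and `all_chunk_keys` are hypothetical; this is a sketch of the arithmetic, not the zarr-python API.

```python
import itertools


def chunk_key(index, chunks, separator="."):
    """Return the key of the chunk holding element `index`, plus the
    element's offset inside that chunk."""
    if len(index) != len(chunks):
        raise ValueError("index and chunks need the same number of dimensions")
    block = [i // c for i, c in zip(index, chunks)]
    within = tuple(i % c for i, c in zip(index, chunks))
    return separator.join(str(b) for b in block), within


def all_chunk_keys(shape, chunks, separator="."):
    """List every chunk key of an array; chunks on the trailing edges may be partial."""
    counts = [-(-s // c) for s, c in zip(shape, chunks)]  # ceil(s / c) per dimension
    return [separator.join(map(str, idx))
            for idx in itertools.product(*(range(n) for n in counts))]
```

For example, element (5, 130, 260) of a (time, y, x) array chunked as (1, 128, 256) lives in chunk `5.1.1` (or `5/1/1` with the nested separator), at position (0, 2, 4) inside it. Nothing here is specific to images: the same arithmetic applies for any number of dimensions.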
Cheers, Mike
--
Michael Sumner
Research Software Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsumner at gmail.com