[gdal-dev] kerchunk

Michael Sumner mdsumner at gmail.com
Tue Jul 23 23:02:51 PDT 2024


Thanks Joe, will check this out

On Wed, Jul 24, 2024 at 12:30 PM Joe Lee <hyoklee at hdfgroup.org> wrote:

> Hi, Michael!
>
> It's an interesting idea, since Kerchunk can't handle HDF4 yet [1].
> OPeNDAP DMR++ can now handle HDF4,
> so I think Kerchunk could, too.
>
> For GDAL, is there a C++ binding for Kerchunk?
> I think that will be the main blocker for GDAL driver development.
>
> [1] https://github.com/hyoklee/kerchunk/wiki
>
> ---
> Reality is hierarchical. Store scientific reality in HDF for Spatial
> Computing.
>
>
>
> ________________________________________
> From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> on behalf of Michael
> Sumner via gdal-dev <gdal-dev at lists.osgeo.org>
> Sent: Tuesday, July 23, 2024 17:37
> To: gdal-dev
> Subject: [gdal-dev] kerchunk
>
> Hi, is there any effort or thought going into something like Python's
> kerchunk in GDAL? (My summary of kerchunk is below.)
>
> https://github.com/fsspec/kerchunk
>
> I'll be exploring the Python outputs in detail and looking for hooks
> where we might bring some of this more tightly into GDAL. This would work
> nicely inside the GTI driver, for example. But a *kerchunk driver*? That
> would be in the family of raw/ drivers. My skillset won't have much to
> offer there, but I'm going to explore with some simpler examples. It could
> even bring old HDF4 files into the fold, I think.
>
> It's a bit weird from a GDAL perspective to map the chunks in a format for
> which we already have a driver, but there are definite performance
> advantages, and it is convenient for virtualizing huge disparate
> collections (even the simplest time-series-of-files in NetCDF is nicely
> abstracted here for xarray, like a super-charged VRT for xarray).
>
> Interested in any thoughts, feedback, pointers to related efforts ...
> thanks!
>
> (my take on) A description of kerchunk:
>
> kerchunk replaces the actual binary blobs stored in a Zarr with JSON
> references to a file/URI/object plus the byte offset and length of each
> chunk. In this way kerchunk brings formats like HDF/NetCDF/GRIB into the
> fold of "cloud readiness" by completely separating the metadata from the
> actual storage. The information about those chunks (compression, type,
> orientation, etc.) is also stored in JSON.
>
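
To make the reference idea concrete, here is a minimal sketch using only the Python standard library, not kerchunk itself. The file `legacy.bin`, its fake header, and the helper `read_chunk` are hypothetical and exist only for illustration. The dictionary is meant to follow the general shape of kerchunk's version 1 reference JSON, where a chunk key maps to `[url, offset, length]`.

```python
import json
import os
import tempfile
import zlib

# Hypothetical "legacy" file: a 16-byte header followed by two
# zlib-compressed chunks of a 16-element uint8 array.
header = b"FAKEHDR\0" * 2
chunk_a = zlib.compress(bytes(range(8)), 6)
chunk_b = zlib.compress(bytes(range(8, 16)), 6)

path = os.path.join(tempfile.mkdtemp(), "legacy.bin")
with open(path, "wb") as f:
    f.write(header + chunk_a + chunk_b)

# Reference set: Zarr metadata is stored inline as JSON strings, and each
# chunk key points at [url, byte offset, byte length] in the original file.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "data/.zarray": json.dumps({
            "shape": [16], "chunks": [8], "dtype": "|u1",
            "compressor": {"id": "zlib", "level": 6},
            "fill_value": 0, "filters": None, "order": "C",
            "zarr_format": 2,
        }),
        "data/0": [path, len(header), len(chunk_a)],
        "data/1": [path, len(header) + len(chunk_a), len(chunk_b)],
    },
}

def read_chunk(refs, key):
    """Resolve one chunk reference by reading only its byte range."""
    url, offset, length = refs["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Decode each chunk with the compressor named in the metadata (zlib here).
first = zlib.decompress(read_chunk(refs, "data/0"))
second = zlib.decompress(read_chunk(refs, "data/1"))
```

The original file is never rewritten; only the small JSON document is new, which is why the same approach can in principle point at chunks inside remote objects via byte-range requests.
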
> (A Zarr is a multidimensional version of a single-zoom-level image
> tiling: imagine every image tile as a potentially n-dimensional child block
> of a larger array. The blobs are stored like one zoom level of a z/y/x tile
> server, in a [[[v/]w/]y/]x layout, with a position for each dimension of
> the array (1, 2, 3, 4, or n), where z is not special, and with more general
> encoding possibilities than tif/png/jpeg provide.) This scheme is extremely
> general, literally a virtualized array-like abstraction on any storage,
> and with kerchunk you can transcend many legacy issues with actual formats.
>
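
The tile-server analogy above can be sketched as a mapping from array indices to chunk keys. This is an illustrative sketch only: the function names are hypothetical, and the `/` separator is chosen to match the analogy; real Zarr stores may use a different separator.

```python
import itertools
import math

def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))

def chunk_keys(shape, chunks, sep="/"):
    """All chunk keys for an array, one position per dimension."""
    grid = chunk_grid(shape, chunks)
    return [sep.join(map(str, idx))
            for idx in itertools.product(*(range(n) for n in grid))]

def key_for_element(index, chunks, sep="/"):
    """Key of the chunk that holds the array element at `index`."""
    return sep.join(str(i // c) for i, c in zip(index, chunks))

# A (time, y, x) array split into 1 x 50 x 100 blocks.
shape, chunks = (2, 100, 200), (1, 50, 100)
keys = chunk_keys(shape, chunks)
key = key_for_element((1, 75, 20), chunks)
```

Unlike a z/y/x tile scheme, no dimension is treated specially: adding a fourth dimension just adds another position to the key.
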
> Cheers, Mike
>
>
> --
> Michael Sumner
> Research Software Engineer
> Australian Antarctic Division
> Hobart, Australia
> e-mail: mdsumner at gmail.com
>


-- 
Michael Sumner
Research Software Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsumner at gmail.com