[gdal-dev] [EXTERNAL] [BULK] Re: GTiff bit shuffle compression feature request

Even Rouault even.rouault at spatialys.com
Fri Dec 8 15:10:39 PST 2023


You could put Zarr into a ZIP. But there's little point in using SOZip 
for that use case (SOZip was merged into master 6 months ago, by the 
way, in GDAL 3.7.0), since SOZip is for compressing large files. In a 
Zarr archive, you would have a lot of small/medium-sized files, one for 
each chunk/tile. And when you need to read one, you read it in its 
entirety (whereas SOZip's aim is to be able to efficiently read a subset 
of a compressed file). SOZip's main use case is more for vector datasets 
(GeoPackage, FlatGeobuf, potentially Esri File Geodatabase...)

For Zarr in ZIP, you should either use uncompressed Zarr with ZIP 
deflate compression, or compressed Zarr (blosc, whatever) with 
uncompressed ZIP ("store" method). If you have a Zarr dataset with lots 
of tiles, it might actually be relevant to use the zipindex 
(https://github.com/minio/zipindex) extension to locate each Zarr chunk 
more quickly, but GDAL won't make use of it.
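
To make the two options concrete, here is a minimal sketch using only the 
Python standard library. It is not a real Zarr store: the chunk keys 
("0.0", "0.1", ...) and the pack_chunks helper are hypothetical, no Zarr 
metadata is written, and zlib stands in for blosc as the chunk codec.

```python
import io
import zipfile
import zlib


def pack_chunks(chunks: dict[str, bytes], already_compressed: bool) -> bytes:
    """Write chunk payloads into an in-memory ZIP archive.

    Chunks that are already compressed are written with the "store"
    method; raw chunks are deflated by the ZIP container instead.
    """
    method = zipfile.ZIP_STORED if already_compressed else zipfile.ZIP_DEFLATED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, payload in chunks.items():
            zf.writestr(name, payload)
    return buf.getvalue()


# Hypothetical chunk keys and contents, for illustration only.
raw_chunks = {f"0.{i}": bytes(4096) for i in range(4)}
# zlib is an assumption standing in for whatever codec the Zarr array uses.
precompressed_chunks = {k: zlib.compress(v) for k, v in raw_chunks.items()}

deflated_zip = pack_chunks(raw_chunks, already_compressed=False)
stored_zip = pack_chunks(precompressed_chunks, already_compressed=True)
```

Either way each chunk is compressed exactly once, and a reader still 
fetches a whole chunk at a time.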

On 08/12/2023 at 21:23, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] via gdal-dev wrote:
>
> The underlying network file system is opaque to us and can change on 
> occasion.  But recently our team were asked to cull unused files due 
> to inode counts.
>
> We’re excited to explore SOZip on our vector data where random seek is 
> important to us, but we’re waiting for that branch to be merged into 
> master.  I don’t trust standard zip libraries to be performant for 
> this use case but I’m willing to be shown otherwise.
>
> Jesse
>
> *From: *gdal-dev <gdal-dev-bounces at lists.osgeo.org> on behalf of 
> Laurențiu Nicola via gdal-dev <gdal-dev at lists.osgeo.org>
> *Reply-To: *Laurențiu Nicola <lnicola at dend.ro>
> *Date: *Friday, December 8, 2023 at 3:01 PM
> *To: *gdallists <gdal-dev at lists.osgeo.org>
> *Subject: *[EXTERNAL] [BULK] Re: [gdal-dev] GTiff bit shuffle 
> compression feature request
>
>
> On Fri, Dec 8, 2023, at 21:32, Even Rouault wrote:
>
>     yes, poor wording of mine. I meant that if using PREDICTOR=3, one
>     should compare with FILTER=DELTA. But looking more closely, they
>     are not strictly equivalent. PREDICTOR=3 applies the delta as
>     b[0]-a[0], b[1]-a[1], b[2]-a[2], b[3]-a[3] where a[0...3] and
>     b[0...3] are seen as the 4 byte representation of the float32,
>     whereas FILTER=DELTA does the difference b_float - a_float as
>     floating point. This isn't the same...
>
> https://www.blosc.org/posts/bytedelta-enhance-compression-toolset/ 
> seems to be the equivalent.
>
> > inode allocation
>
> XFS or ZIP?
>
> > extra step to decompress Zarr out of ZIP
>
> Most libraries should be able to read Zarr directly from a ZIP archive.
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
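
On the PREDICTOR=3 versus FILTER=DELTA distinction quoted above, a small 
sketch may help. It only illustrates the difference as described in that 
message (per-byte subtraction of the float32 encodings versus subtraction 
of the values as floating point) and is not how either codec is actually 
implemented. The function names are hypothetical, and little-endian 
float32 is an assumption.

```python
import struct


def bytewise_delta(a: float, b: float) -> bytes:
    """Per-byte difference b[i] - a[i] (mod 256) of the float32 encodings."""
    a_bytes = struct.pack("<f", a)
    b_bytes = struct.pack("<f", b)
    return bytes((y - x) % 256 for x, y in zip(a_bytes, b_bytes))


def bytewise_undelta(a: float, delta: bytes) -> float:
    """Invert bytewise_delta: add the bytes back (mod 256) and decode."""
    a_bytes = struct.pack("<f", a)
    b_bytes = bytes((x + d) % 256 for x, d in zip(a_bytes, delta))
    return struct.unpack("<f", b_bytes)[0]


def float_delta(a: float, b: float) -> bytes:
    """Difference b - a computed as a number, then encoded as float32."""
    return struct.pack("<f", b - a)


a, b = 1.0, 1.5
print(bytewise_delta(a, b).hex())  # 00004000
print(float_delta(a, b).hex())     # 0000003f
```

The two residuals differ even for this simple pair, which is why the two 
settings cannot be compared as if they were the same filter.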

-- 
http://www.spatialys.com
My software is free, but my time generally not.