[gdal-dev] Announcing SOZip: Seek-Optimized profile for the .zip format
Even Rouault
even.rouault at spatialys.com
Thu Jan 26 13:27:21 PST 2023
Hi,
The implementation has just been merged into GDAL master.
You can for example test it with gdal-master and QGIS Conda packages:
conda create --name sozip_test
conda activate sozip_test
conda install -c gdal-master gdal
conda install -c conda-forge qgis
And, for example, with a large GeoPackage file (adapt the filenames, of course):
sozip -j output.gpkg.zip /path/to/input.gpkg
qgis output.gpkg.zip
Even
On 09/01/2023 at 15:19, Even Rouault wrote:
> Hi,
>
> It is my pleasure to announce (
> https://github.com/sozip/sozip-spec/blob/master/blog/01-announcement.md
> ) the initial release of the specification (
> https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md
> ) for the SOZip (Seek-Optimized Zip) profile to the ZIP file format,
> as well as its GDAL implementation.
>
> What is SOZip?
> --------------
>
> A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or
> several Deflate-compressed files that are organized and annotated such
> that a SOZip-aware reader can perform very fast random access (seek)
> within a compressed file.
>
> SOZip makes it possible to access large compressed files directly from
> a .zip file without prior decompression. It is not a new file format,
> but a profile of the existing ZIP format, done in a fully backward
> compatible way. ZIP readers that are non-SOZip aware can read a
> SOZip-enabled file normally and ignore the extended features that
> support efficient seek capability.
>
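The seek principle described above can be illustrated with a small Python sketch. This is not the SOZip on-disk layout (the specification defines how the index is actually stored inside the ZIP); it only shows, using the standard zlib module, how flushing the Deflate stream at regular chunk boundaries and recording the compressed offsets lets a reader start decompressing in the middle of the stream, while a regular reader can still decompress the stream as a whole. The function names and the chunk size are illustrative choices, not taken from the spec.

```python
import zlib

CHUNK_SIZE = 32768  # uncompressed bytes per chunk (illustrative value)


def compress_with_index(data, chunk_size=CHUNK_SIZE):
    """Deflate `data` chunk by chunk and record where each chunk starts."""
    # wbits=-15 selects a raw Deflate stream, the flavour stored in ZIP entries.
    comp = zlib.compressobj(level=6, wbits=-15)
    out = bytearray()
    offsets = []  # compressed offset of the start of each chunk
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + chunk_size])
        # Z_FULL_FLUSH byte-aligns the output and resets the compression
        # dictionary, so decompression can restart right after this point.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), offsets


def read_at(compressed, offsets, pos, size, chunk_size=CHUNK_SIZE):
    """Return `size` uncompressed bytes starting at uncompressed offset `pos`.

    Only the chunks overlapping the requested range are decompressed.
    `pos` is assumed to lie within the uncompressed data.
    """
    if size <= 0:
        return b""
    first = pos // chunk_size
    last = (pos + size - 1) // chunk_size
    begin = offsets[first]
    end = offsets[last + 1] if last + 1 < len(offsets) else len(compressed)
    decomp = zlib.decompressobj(wbits=-15)
    chunk_data = decomp.decompress(compressed[begin:end])
    skip = pos - first * chunk_size
    return chunk_data[skip:skip + size]
```

Calling zlib.decompress(compressed, wbits=-15) on the result of compress_with_index() still returns the full original data, which mirrors the backward-compatibility point: a reader that ignores the index just sees an ordinary Deflate stream. The cost is a few extra bytes per flush point and somewhat less effective compression across chunk boundaries.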
> Use cases
> ---------
>
> The SOZip specification is intended to be general-purpose rather than
> domain-specific. It was first developed to serve geospatial use cases,
> which commonly involve large compressed files inside ZIP archives. In
> particular, it lets users read large GIS files in the Shapefile,
> GeoPackage or FlatGeobuf formats (which have no native provision for
> compression) directly from .zip files, without prior decompression.
>
> Efficient random access and selective decompression are a requirement
> to provide acceptable performance in many usage scenarios: spatial
> index filtering, access to a feature by its identifier, etc.
>
> Performance
> -----------
>
> SOZip is efficient:
>
> * The overhead of using a file from a SOZip archive, compared to using
> it uncompressed, is on the order of 10% for common read operations.
> * Generation of a SOZip file can be much faster than regular ZIP
> generation when using multithreading.
> * SOZip files are typically only about 5% larger than regular ZIPs
> (depending on content and chunk size).
>
> Have a look at benchmarking results:
> https://github.com/sozip/sozip-spec/blob/master/README.md#benchmarking
>
> Other ZIP-related specification
> -------------------------------
>
> The SOZip GitHub organization also hosts the KeyValuePairs extra-field
> specification (
> https://github.com/sozip/keyvaluepairs-spec/blob/master/zip_keyvalue_extra_field_specification.md
> ), which makes it possible to encode arbitrary key-value pairs of
> metadata associated with a file within a ZIP, for example to store the
> Content-Type of a file.
>
> How does this relate to GDAL?
> -----------------------------
>
> Pull request https://github.com/OSGeo/gdal/pull/7042 has been
> submitted with the following enhancements:
>
> * The /vsizip/ virtual file system uses the SOZip index to perform fast
> random access within a compressed SOZip-enabled file.
>
> * The Shapefile and GPKG drivers can directly generate SOZip-enabled
> .shz/.shp.zip or .gpkg.zip files.
>
> * Addition of the CPLAddFileInZip() C function, which can compress a
> file and add it to a new or existing ZIP file, and enable the SOZip
> optimization when relevant.
>
> * The existing VSIGetFileMetadata() function can be called on a
> filename of the form
> /vsizip/path/to/the/file.zip/path/inside/the/zip/file with
> domain = "ZIP" to find out whether a SOZip index is available for
> that file.
>
> * The new sozip command line utility
> (https://github.com/rouault/gdal/blob/sozip/doc/source/programs/sozip.rst)
> can be used to create a seek-optimized ZIP file, append files to an
> existing ZIP file, list the contents of a ZIP file and display the
> SOZip optimization status, or validate a SOZip file.
>
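As a rough sketch of the VSIGetFileMetadata() usage mentioned above, the following goes through the GDAL Python bindings, where that function is exposed as gdal.GetFileMetadata(). The helper names are hypothetical, and the SOZIP_VALID key name is an assumption on my side that should be checked against the /vsizip/ documentation of your GDAL version; the path passed in must follow the /vsizip/ pattern given in the list.

```python
def has_valid_sozip_index(metadata):
    """Return True if a "ZIP"-domain metadata dict reports a valid SOZip index.

    The "SOZIP_VALID" key name is an assumption; verify it against the GDAL
    documentation before relying on it.
    """
    return (metadata or {}).get("SOZIP_VALID") == "YES"


def sozip_metadata(vsizip_path):
    """Fetch the "ZIP"-domain metadata of a file inside a ZIP archive.

    `vsizip_path` has the form /vsizip/path/to/the/file.zip/path/inside/the/zip/file
    """
    # Imported lazily: this needs a GDAL build that includes SOZip support.
    from osgeo import gdal

    return gdal.GetFileMetadata(vsizip_path, "ZIP") or {}


def report(vsizip_path):
    """Print all "ZIP"-domain metadata items, then the SOZip status."""
    md = sozip_metadata(vsizip_path)
    for key in sorted(md):
        print(f"{key}={md[key]}")
    print("SOZip index usable:", has_valid_sozip_index(md))
```

For instance, report("/vsizip/output.gpkg.zip/input.gpkg") would apply this to the archive created in the sozip example at the top of this message (again, filenames to adapt). Printing every returned item first is deliberate: it shows what a given GDAL version actually reports instead of relying on the assumed key name.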
> Best regards,
>
> Even
>
--
http://www.spatialys.com
My software is free, but my time generally not.