[QGIS-Developer] Fwd: [gdal-dev] Announcing SOZip: Seek-Optimized profile for the .zip format
Even Rouault
even.rouault at spatialys.com
Mon Jan 9 07:42:47 PST 2023
Sorry for cross-posting, but this is a very relevant topic for QGIS. In
short, provided the .zip files are compressed the SOZip way, it is possible
to read very large zipped GeoPackage files (or Shapefiles, for the
nostalgic) directly from QGIS without prior decompression.
Even
-------- Forwarded Message --------
Subject: [gdal-dev] Announcing SOZip: Seek-Optimized profile for the
.zip format
Date: Mon, 9 Jan 2023 15:19:07 +0100
From: Even Rouault <even.rouault at spatialys.com>
To: gdal-dev at lists.osgeo.org <gdal-dev at lists.osgeo.org>
Hi,
It is my pleasure to announce (
https://github.com/sozip/sozip-spec/blob/master/blog/01-announcement.md
) the initial release of the specification (
https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md )
for the SOZip (Seek-Optimized Zip) profile to the ZIP file format, as
well as its GDAL implementation.
What is SOZip?
----------------------
A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or
several Deflate-compressed files that are organized and annotated such
that a SOZip-aware reader can perform very fast random access (seek)
within a compressed file.
SOZip makes it possible to access large compressed files directly from a
.zip file without prior decompression. It is not a new file format, but
a profile of the existing ZIP format, done in a fully backward
compatible way. ZIP readers that are non-SOZip aware can read a
SOZip-enabled file normally and ignore the extended features that
support efficient seek capability.
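
To illustrate the idea, here is a minimal Python sketch using only the
standard zlib module. It is not the SOZip on-disk layout (the index format
is defined by the specification and not reproduced here); the names
compress_seekable, read_at and CHUNK_SIZE are made up for this example. It
assumes that the Deflate stream is flushed at regular chunk boundaries so
that decompression can restart at any of them, and that the offsets of
those boundaries are kept on the side.

```python
import zlib

CHUNK_SIZE = 32 * 1024  # uncompressed bytes per chunk (illustrative value)


def compress_seekable(data, chunk_size=CHUNK_SIZE):
    """Deflate `data` as one stream and record where each chunk starts."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw Deflate, as stored in ZIP
    out = bytearray()
    offsets = []  # compressed offset of the start of each chunk
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + chunk_size])
        # A full flush byte-aligns the output and forgets the history window,
        # so a decompressor can start fresh right after this point.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), offsets


def read_at(compressed, offsets, pos, size, chunk_size=CHUNK_SIZE):
    """Return `size` uncompressed bytes at `pos`, inflating only the chunks needed."""
    if size <= 0:
        return b""
    first = pos // chunk_size
    last = (pos + size - 1) // chunk_size
    end = offsets[last + 1] if last + 1 < len(offsets) else len(compressed)
    plain = zlib.decompressobj(-15).decompress(compressed[offsets[first]:end])
    skip = pos - first * chunk_size
    return plain[skip:skip + size]
```

A reader that knows nothing about the offsets can still inflate the whole
stream in one go (zlib.decompress(blob, -15) in this sketch), which mirrors
the backward compatibility described above. A reader that has the offsets
only inflates the chunks covering the requested byte range.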
Use cases
--------------
The SOZip specification is intended to be general-purpose and not
domain-specific. It was first developed to serve geospatial use cases, which
commonly involve large compressed files inside ZIP archives. In
particular, it makes it possible for users to read large GIS files in
the Shapefile, GeoPackage or FlatGeobuf formats (which have no native
provision for compression) from .zip files without prior
decompression.
Efficient random access and selective decompression are a requirement to
provide acceptable performance in many usage scenarios: spatial index
filtering, access to a feature by its identifier, etc.
Performance
------------------
SOZip is efficient:
* The overhead of using a file from a SOZip archive, compared to using
it uncompressed, is on the order of 10% for common read operations.
* Generation of a SOZip file can be much faster than regular ZIP
generation when using multithreading.
* SOZip files are typically only ~5% larger than regular ZIPs
(depending on content and chunk size).
Have a look at benchmarking results:
https://github.com/sozip/sozip-spec/blob/master/README.md#benchmarking
Other ZIP-related specification
------------------------------------------
The SOZip GitHub organization also hosts the KeyValuePairs extra-field
specification (
https://github.com/sozip/keyvaluepairs-spec/blob/master/zip_keyvalue_extra_field_specification.md
), to be able to encode arbitrary key-value pairs of metadata associated
with a file within a ZIP, for example to store the Content-Type of a file.
How does this relate to GDAL?
-------------------------------------------
Pull request https://github.com/OSGeo/gdal/pull/7042 has been submitted
with the following enhancements:
* The /vsizip/ virtual file system uses the SOZip index to perform fast
random access within a compressed SOZip-enabled file.
* The Shapefile and GPKG drivers can directly generate SOZip-enabled
.shz/.shp.zip or .gpkg.zip files.
* Addition of the CPLAddFileInZip() C function, which can compress a file
and add it to a new or existing ZIP file, and enable the SOZip
optimization when relevant.
* The existing VSIGetFileMetadata() function can be called on a filename of
the form /vsizip/path/to/the/file.zip/path/inside/the/zip/file with
domain = "ZIP" to find out whether a SOZip index is available for that
file.
* The new sozip command line utility
(https://github.com/rouault/gdal/blob/sozip/doc/source/programs/sozip.rst)
can be used to create a seek-optimized ZIP file, append files to an
existing ZIP file, list the contents of a ZIP file and display the SOZip
optimization status, or validate a SOZip file.
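
As a rough sketch of the metadata query described in the list above, the
following Python snippet builds a /vsizip/ filename and asks for the "ZIP"
metadata domain. It assumes the GDAL Python bindings expose
VSIGetFileMetadata() as gdal.GetFileMetadata(); the helper names
vsizip_path and sozip_metadata are hypothetical, and the exact keys
returned are not described in this announcement, so none are assumed here.

```python
def vsizip_path(zip_path, inner_path):
    """Build a /vsizip/ filename for a file stored inside a ZIP archive."""
    return "/vsizip/" + zip_path.rstrip("/") + "/" + inner_path.lstrip("/")


def sozip_metadata(zip_path, inner_path):
    """Return the "ZIP" domain metadata for a file inside a ZIP archive.

    Returns None when the GDAL Python bindings are not installed.
    """
    try:
        from osgeo import gdal  # assumption: a GDAL build with SOZip support
    except ImportError:
        return None
    return gdal.GetFileMetadata(vsizip_path(zip_path, inner_path), "ZIP")
```

For instance, sozip_metadata("my.gpkg.zip", "my.gpkg") would query a
GeoPackage stored in a hypothetical my.gpkg.zip archive.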
Best regards,
Even
--
http://www.spatialys.com
My software is free, but my time generally not.