[QGIS-Developer] Fwd: [gdal-dev] Announcing SOZip: Seek-Optimized profile for the .zip format

Richard Duivenvoorde rdmailings at duif.net
Mon Jan 9 07:54:55 PST 2023


Hi Even,

Cool!

Out of curiosity: I thought SQLite/GeoPackage files were already packed relatively 'skinny'? Am I wrong?

Does anybody have experience with zipping "laaaarge zipped GeoPackages"? ;-) Is that useful? I just tested an 885 MB MBTiles file (I know, not a GeoPackage... but still SQLite, isn't it?), and it ended up at 868 MB.

OR... are we talking about sets of GeoPackages?

Regards,

Richard Duivenvoorde


On 1/9/23 16:42, Even Rouault via QGIS-Developer wrote:
> Sorry for cross-posting, but this is a very relevant topic for QGIS. To make it short: provided the .zip files are compressed the SOZip way, it is possible to directly read laaaarge zipped GeoPackage files (or Shapefiles, for the nostalgic) from QGIS without prior decompression.
> 
> Even
> 
> -------- Forwarded Message --------
> Subject: 	[gdal-dev] Announcing SOZip: Seek-Optimized profile for the .zip format
> Date: 	Mon, 9 Jan 2023 15:19:07 +0100
> From: 	Even Rouault <even.rouault at spatialys.com>
> To: 	gdal-dev at lists.osgeo.org <gdal-dev at lists.osgeo.org>
> 
> 
> 
> Hi,
> 
> It is my pleasure to announce ( https://github.com/sozip/sozip-spec/blob/master/blog/01-announcement.md ) the initial release of the specification ( https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md ) for the SOZip (Seek-Optimized Zip) profile to the ZIP file format, as well as its GDAL implementation.
> 
> What is SOZip ?
> ----------------------
> 
> A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or several Deflate-compressed files that are organized and annotated such that a SOZip-aware reader can perform very fast random access (seek) within a compressed file.
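The core idea can be illustrated with Python's standard zlib module. This is a minimal sketch, not the SOZip on-disk format (the index layout, chunk size and file naming are defined in the specification linked above): it only assumes that the Deflate stream is flushed with Z_FULL_FLUSH at fixed uncompressed intervals, and that the compressed offset of each interval is recorded, so decoding can restart at any recorded offset. CHUNK_SIZE, compress_seekable() and read_at() are hypothetical names chosen for the example, and the 32 KiB chunk size is just an illustrative value.

```python
import zlib

# Illustrative chunk size chosen for this sketch.
CHUNK_SIZE = 32 * 1024


def compress_seekable(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress `data` into a single raw Deflate stream made of independently
    decodable chunks. Returns (stream, offsets), where offsets[i] is the
    compressed byte offset at which uncompressed chunk i starts."""
    comp = zlib.compressobj(level=6, wbits=-15)  # raw Deflate, as stored in ZIP
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + chunk_size])
        # Z_FULL_FLUSH byte-aligns the output and resets the dictionary, so a
        # fresh decompressor can start at the next recorded offset.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush(zlib.Z_FINISH)  # terminating final block
    return bytes(out), offsets


def read_at(stream: bytes, offsets: list, pos: int, size: int,
            chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return uncompressed bytes [pos, pos + size) by decoding from the chunk
    that contains `pos` (which must lie inside the data), instead of from the
    start of the stream."""
    chunk = pos // chunk_size
    skip = pos - chunk * chunk_size  # bytes to discard at the start of the chunk
    decomp = zlib.decompressobj(wbits=-15)
    # max_length stops decoding as soon as enough bytes have been produced.
    out = decomp.decompress(memoryview(stream)[offsets[chunk]:], skip + size)
    return out[skip:skip + size]
```

Reading 100 bytes at some offset deep inside the file then only decodes from the start of the enclosing chunk, rather than the whole prefix of the stream. In SOZip the offsets live in an index inside the archive; here they are simply kept in a Python list.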
> 
> SOZip makes it possible to access large compressed files directly from a .zip file without prior decompression. It is not a new file format, but a profile of the existing ZIP format, done in a fully backward-compatible way. ZIP readers that are not SOZip-aware can read a SOZip-enabled file normally and ignore the extended features that support efficient seek capability.
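Backward compatibility follows from the fact that a Deflate stream with full-flush points is still an ordinary Deflate stream. As a hedged demonstration, the sketch below hand-assembles a minimal one-entry ZIP around such a stream and reads it back with Python's zipfile module, which knows nothing about SOZip. It deliberately omits Zip64, extra fields and the SOZip index file that a real SOZip archive would also contain; chunked_deflate() and single_entry_zip() are hypothetical helpers, and the struct layouts follow the standard ZIP local header, central directory header and end-of-central-directory record.

```python
import io
import struct
import zipfile
import zlib


def chunked_deflate(data: bytes, chunk_size: int = 32 * 1024) -> bytes:
    """Raw Deflate with a full flush every `chunk_size` uncompressed bytes."""
    comp = zlib.compressobj(wbits=-15)
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(comp.compress(data[start:start + chunk_size]))
        parts.append(comp.flush(zlib.Z_FULL_FLUSH))
    parts.append(comp.flush(zlib.Z_FINISH))
    return b"".join(parts)


def single_entry_zip(name: str, data: bytes) -> bytes:
    """Build a minimal one-entry ZIP by hand (ASCII name, no Zip64)."""
    fname = name.encode("ascii")
    payload = chunked_deflate(data)
    crc = zlib.crc32(data)
    dos_time, dos_date = 0, (0 << 9) | (1 << 5) | 1  # 1980-01-01 00:00:00
    # Local file header: signature, version needed, flags, method 8 (Deflate),
    # time, date, CRC-32, compressed size, uncompressed size, name len, extra len.
    local = struct.pack("<I5H3I2H", 0x04034B50, 20, 0, 8, dos_time, dos_date,
                        crc, len(payload), len(data), len(fname), 0)
    # Central directory header: same fields plus comment length, disk number,
    # internal/external attributes and the offset of the local header (0).
    central = struct.pack("<I6H3I5H2I", 0x02014B50, 20, 20, 0, 8, dos_time,
                          dos_date, crc, len(payload), len(data), len(fname),
                          0, 0, 0, 0, 0, 0)
    body = local + fname + payload
    cd = central + fname
    # End of central directory: one entry, cd size, cd offset, no comment.
    eocd = struct.pack("<I4H2IH", 0x06054B50, 0, 0, 1, 1, len(cd), len(body), 0)
    return body + cd + eocd
```

Opening the result with zipfile.ZipFile(io.BytesIO(...)) and calling read() returns the original bytes, with the CRC check passing, even though the stream was written in independently decodable chunks.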
> 
> Use cases
> --------------
> 
> The SOZip specification is intended to be general purpose / not domain specific. It was first developed to serve geospatial use cases, which commonly have large compressed files inside of ZIP archives. In particular, it makes it possible for users to read large GIS files using the Shapefile, GeoPackage or FlatGeobuf formats (which have no native provision for compression) compressed in .zip files without prior decompression.
> 
> Efficient random access and selective decompression are a requirement to provide acceptable performance in many usage scenarios: spatial index filtering, access to a feature by its identifier, etc.
> 
> Performance
> ------------------
> 
> SOZip is efficient:
> 
> * The overhead of using a file from a SOZip archive, compared to using it uncompressed, is of the order of 10% for common read operations.
> * Generation of a SOZip file can be much faster than regular ZIP generation when using multithreading.
> * SOZip files are typically only ~5% larger than regular ZIPs (depending on content and chunk size).
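Both the size overhead and the multithreaded speedup stem from the same property: every chunk starts with an empty dictionary. The overhead is the compression lost at each reset; the speedup comes from chunks being compressible independently and then concatenated. A minimal Python sketch of that reasoning follows. It is not GDAL's implementation; the 32 KiB chunks, compression level 6 and four worker threads are assumptions, it relies on CPython's zlib module releasing the GIL while deflating, and parallel_seekable_deflate() and size_overhead() are hypothetical names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def _compress_chunk(job):
    chunk, is_last = job
    comp = zlib.compressobj(level=6, wbits=-15)  # fresh dictionary per chunk
    out = comp.compress(chunk)
    # Non-final chunks end with a full flush (byte-aligned, final-block bit
    # not set); only the last one is finished, so the concatenation of all
    # parts is one valid raw Deflate stream.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_FULL_FLUSH)
    return out


def parallel_seekable_deflate(data: bytes, chunk_size: int = 32 * 1024,
                              workers: int = 4) -> bytes:
    """Compress fixed-size chunks in parallel and concatenate them."""
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    jobs = [(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so parts are joined in chunk order.
        parts = list(pool.map(_compress_chunk, jobs))
    return b"".join(parts)


def size_overhead(data: bytes, chunk_size: int = 32 * 1024) -> float:
    """Relative size increase of the chunked stream vs. one plain stream."""
    baseline = zlib.compressobj(level=6, wbits=-15)
    plain = baseline.compress(data) + baseline.flush()
    chunked = parallel_seekable_deflate(data, chunk_size)
    return len(chunked) / len(plain) - 1.0
```

The value returned by size_overhead() depends entirely on the input data and chunk size, so it should not be read as a reproduction of the ~5% figure above; the linked benchmarks are the reference for that.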
> 
> Have a look at benchmarking results: https://github.com/sozip/sozip-spec/blob/master/README.md#benchmarking
> 
> Other ZIP-related specification
> ------------------------------------------
> 
> The SOZip GitHub organization also hosts the KeyValuePairs extra-field specification ( https://github.com/sozip/keyvaluepairs-spec/blob/master/zip_keyvalue_extra_field_specification.md ), which makes it possible to encode arbitrary key-value pairs of metadata associated with a file within a ZIP, for example to store the Content-Type of a file.
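What makes this kind of extension possible is the generic ZIP extra-field framing: a 2-byte header ID, a 2-byte data size, then the data, which readers skip when they do not recognize the ID. The sketch below stores key-value pairs in such a field with Python's zipfile module. Both the header ID and the payload layout are made up for illustration and are NOT those of the KeyValuePairs specification; encode_kv_extra(), decode_kv_extra() and roundtrip() are hypothetical helpers.

```python
import io
import struct
import zipfile

HYPOTHETICAL_KV_HEADER_ID = 0xCAFE  # made-up ID, NOT the one from the spec


def encode_kv_extra(pairs: dict) -> bytes:
    """Encode pairs as length-prefixed UTF-8 strings inside one extra field."""
    body = bytearray()
    for key, value in pairs.items():
        k, v = key.encode("utf-8"), value.encode("utf-8")
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    # Generic ZIP extra-field framing: header ID, data size, data.
    return struct.pack("<HH", HYPOTHETICAL_KV_HEADER_ID, len(body)) + bytes(body)


def decode_kv_extra(extra: bytes) -> dict:
    """Walk all extra fields, decoding only the hypothetical key-value one."""
    pairs = {}
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != HYPOTHETICAL_KV_HEADER_ID:
            continue  # skip extra fields we do not understand
        i = 0
        while i < len(data):
            (klen,) = struct.unpack_from("<H", data, i)
            key = data[i + 2:i + 2 + klen].decode("utf-8")
            i += 2 + klen
            (vlen,) = struct.unpack_from("<H", data, i)
            value = data[i + 2:i + 2 + vlen].decode("utf-8")
            i += 2 + vlen
            pairs[key] = value
    return pairs


def roundtrip(member: str, payload: bytes, pairs: dict) -> dict:
    """Write one member carrying the extra field, then read the pairs back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo(member)
        info.compress_type = zipfile.ZIP_DEFLATED
        info.extra = encode_kv_extra(pairs)
        zf.writestr(info, payload)
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        return decode_kv_extra(zf.getinfo(member).extra)
```

For instance, roundtrip("layer.gpkg", data, {"Content-Type": "application/geopackage+sqlite3"}) returns the same dictionary, read back from the central directory.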
> 
> How does this relate to GDAL ?
> -------------------------------------------
> 
> Pull request https://github.com/OSGeo/gdal/pull/7042 has been submitted with the following enhancements:
> 
> *  The /vsizip/ virtual file system uses the SOZip index to perform fast
>      random access within a compressed SOZip-enabled file.
> 
> * The Shapefile and GPKG drivers can directly generate SOZip-enabled .shz/.shp.zip or .gpkg.zip files.
> 
> *  Addition of the CPLAddFileInZip() C function that can compress a file and add
>      it to a new or existing ZIP file, and enable the SOZip optimization when relevant.
> 
> *  The existing VSIGetFileMetadata() method can be called on a filename of
>      the form /vsizip/path/to/the/file.zip/path/inside/the/zip/file and
>      with domain = "ZIP" to find out whether a SOZip index is available for that file.
> 
> *  The new sozip (https://github.com/rouault/gdal/blob/sozip/doc/source/programs/sozip.rst) command line utility
>      can be used to create a seek-optimized ZIP file, append files to an existing ZIP file, list the
>      contents of a ZIP file and display their SOZip optimization status, or validate a SOZip file.
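For a rough idea of what reporting the SOZip status of archive members involves, here is a Python sketch using only the standard zipfile module. It assumes, from our reading of the specification, that a member's index lives in a hidden file named .{basename}.sozip.idx in the same directory; please check the specification before relying on that. It only looks for such a file and neither parses nor validates it, so it is no substitute for the validation done by the sozip utility or by VSIGetFileMetadata() with domain "ZIP". expected_index_name(), list_with_sozip_hint() and demo_archive() are hypothetical names.

```python
import io
import posixpath
import zipfile


def expected_index_name(member: str) -> str:
    """Assumed name of the hidden SOZip index of `member`:
    '.{basename}.sozip.idx' in the same directory (verify against the spec)."""
    directory, base = posixpath.split(member)
    return posixpath.join(directory, f".{base}.sozip.idx")


def list_with_sozip_hint(zf: zipfile.ZipFile) -> list:
    """Return (name, uncompressed size, looks_sozip) for each regular member.

    looks_sozip is only a naming heuristic; the index is NOT validated."""
    names = set(zf.namelist())
    rows = []
    for info in zf.infolist():
        if info.filename.endswith(".sozip.idx"):
            continue  # hide the index files themselves from the listing
        looks_sozip = (info.compress_type == zipfile.ZIP_DEFLATED
                       and expected_index_name(info.filename) in names)
        rows.append((info.filename, info.file_size, looks_sozip))
    return rows


def demo_archive() -> bytes:
    """Hypothetical archive: one member with an index-named companion, one without."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data/my.gpkg", b"x" * 1000, compress_type=zipfile.ZIP_DEFLATED)
        # Placeholder bytes only: this is not a real SOZip index.
        zf.writestr("data/.my.gpkg.sozip.idx", b"\x00" * 32)
        zf.writestr("readme.txt", b"hello", compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

On demo_archive(), the listing reports data/my.gpkg as a SOZip candidate and readme.txt as a plain Deflate member.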
> 
> Best regards,
> 
> Even
> 
> -- 
> 
> http://www.spatialys.com
> My software is free, but my time generally not.
> 
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
> 


