<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Sorry for cross-posting, but this is a very relevant topic for QGIS.
In short, provided the .zip files are compressed the SOZip way, it
is possible to read laaaarge zipped GeoPackage files (or
Shapefiles, for the nostalgic) directly from QGIS without prior decompression.</p>
<p>Even<br>
</p>
<div class="moz-forward-container">-------- Forwarded Message
--------
<table class="moz-email-headers-table" cellspacing="0"
cellpadding="0" border="0">
<tbody>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">Subject:
</th>
<td>[gdal-dev] Announcing SOZip: Seek-Optimized profile for
the .zip format</td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">Date: </th>
<td>Mon, 9 Jan 2023 15:19:07 +0100</td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">From: </th>
<td>Even Rouault <a class="moz-txt-link-rfc2396E" href="mailto:even.rouault@spatialys.com">&lt;even.rouault@spatialys.com&gt;</a></td>
</tr>
<tr>
<th valign="BASELINE" nowrap="nowrap" align="RIGHT">To: </th>
<td><a class="moz-txt-link-abbreviated" href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>
<a class="moz-txt-link-rfc2396E" href="mailto:gdal-dev@lists.osgeo.org">&lt;gdal-dev@lists.osgeo.org&gt;</a></td>
</tr>
</tbody>
</table>
<br>
<br>
Hi,<br>
<br>
It is my pleasure to announce (
<a class="moz-txt-link-freetext" href="https://github.com/sozip/sozip-spec/blob/master/blog/01-announcement.md">https://github.com/sozip/sozip-spec/blob/master/blog/01-announcement.md</a>
) the initial release of the specification (
<a class="moz-txt-link-freetext" href="https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md">https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md</a>
) for the SOZip (Seek-Optimized Zip) profile to the ZIP file
format, as well as its GDAL implementation.<br>
<br>
What is SOZip?<br>
----------------------<br>
<br>
A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one
or several Deflate-compressed files that are organized and
annotated such that a SOZip-aware reader can perform very fast
random access (seek) within a compressed file.<br>
<br>
SOZip makes it possible to access large compressed files directly
from a .zip file without prior decompression. It is not a new file
format, but a profile of the existing ZIP format, done in a fully
backward compatible way. ZIP readers that are non-SOZip aware can
read a SOZip-enabled file normally and ignore the extended
features that support efficient seek capability.<br>
<br>
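As a rough illustration of the idea, and not the SOZip reference
implementation, the sketch below compresses a buffer in fixed-size
chunks, flushes the Deflate stream at each chunk boundary and records
where each chunk starts. The names compress_with_index, read_chunk and
read_at are hypothetical, the 32 KiB chunk size is an assumption of this
sketch, and the index is kept in a Python list rather than stored in
the archive as the specification describes. The flush mode is also
simplified compared to the specification.<br>
<pre>
```python
import zlib

CHUNK_SIZE = 32 * 1024  # uncompressed bytes per chunk (assumed value for this sketch)


def compress_with_index(data, chunk_size=CHUNK_SIZE):
    """Return (stream, offsets); offsets[i] is where chunk i starts in the stream."""
    # wbits=-15 produces a raw Deflate stream, the kind stored inside ZIP files.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    parts = []
    offsets = []
    position = 0
    for start in range(0, len(data), chunk_size):
        offsets.append(position)
        piece = comp.compress(data[start:start + chunk_size])
        # A full flush byte-aligns the output and resets the compression
        # history, so the next chunk can be decoded without the previous ones.
        piece += comp.flush(zlib.Z_FULL_FLUSH)
        parts.append(piece)
        position += len(piece)
    parts.append(comp.flush())  # terminate the stream so ordinary readers accept it
    return b"".join(parts), offsets


def read_chunk(stream, offsets, index):
    """Decompress a single chunk, starting at its recorded offset."""
    bounds = offsets + [len(stream)]
    piece = stream[bounds[index]:bounds[index + 1]]
    return zlib.decompressobj(-15).decompress(piece)


def read_at(stream, offsets, position, size, chunk_size=CHUNK_SIZE):
    """Read `size` uncompressed bytes at `position`, decoding only overlapping chunks."""
    first = position // chunk_size
    last = min((position + size - 1) // chunk_size, len(offsets) - 1)
    data = b"".join(read_chunk(stream, offsets, i) for i in range(first, last + 1))
    skip = position - first * chunk_size
    return data[skip:skip + size]


# Demo data: 50000 fixed-width records of 14 bytes each.
payload = b"".join(b"record %06d\n" % i for i in range(50000))
stream, offsets = compress_with_index(payload)

# Random access: only the chunk holding record 40000 is decompressed.
print(read_at(stream, offsets, 14 * 40000, 13))

# Backward compatibility: a plain Deflate reader decodes the whole stream.
print(zlib.decompress(stream, -15) == payload)
```
</pre>
The last check mirrors the backward compatibility point above: a reader
that knows nothing about the chunk offsets still decodes the whole
stream from the beginning.<br>
<br>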
Use cases<br>
--------------<br>
<br>
The SOZip specification is intended to be general-purpose, not
domain-specific. It was first developed to serve geospatial use
cases, which commonly have large compressed files inside of ZIP
archives. In particular, it makes it possible for users to read
large GIS files using the Shapefile, GeoPackage or FlatGeobuf
formats (which have no native provision for compression)
compressed in .zip files without prior decompression.<br>
<br>
Efficient random access and selective decompression are required
for acceptable performance in many usage scenarios, such as
spatial index filtering or access to a feature by its
identifier.<br>
<br>
Performance<br>
------------------<br>
<br>
SOZip is efficient:<br>
<br>
* The overhead of using a file from a SOZip archive, compared to
using it uncompressed, is of the order of 10% for common read
operations.<br>
* Generation of a SOZip file can be much faster than regular ZIP
generation when using multithreading.<br>
* SOZip files are typically only ~5% larger than regular ZIPs
(depending on content and chunk size).<br>
<br>
Have a look at benchmarking results:
<a class="moz-txt-link-freetext" href="https://github.com/sozip/sozip-spec/blob/master/README.md#benchmarking">https://github.com/sozip/sozip-spec/blob/master/README.md#benchmarking</a><br>
<br>
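To get a feel for how chunk size affects file size, here is a small
sketch that compares one ordinary Deflate stream with the same data
flushed at every chunk boundary. It is only an approximation under
stated assumptions: plain_size and chunked_size are hypothetical helper
names, the sample data is synthetic, the size of the index itself is
ignored, and the flush mode is simplified compared to the
specification. The figures it prints say nothing about real datasets;
the benchmarks linked above are the reference.<br>
<pre>
```python
import zlib


def plain_size(data):
    """Size of `data` compressed as one ordinary raw Deflate stream."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    return len(comp.compress(data)) + len(comp.flush())


def chunked_size(data, chunk_size):
    """Size of `data` when the stream is fully flushed after every chunk."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    total = 0
    for start in range(0, len(data), chunk_size):
        total += len(comp.compress(data[start:start + chunk_size]))
        # Each full flush costs a few marker bytes and discards the history
        # that later data could otherwise have referred back to.
        total += len(comp.flush(zlib.Z_FULL_FLUSH))
    total += len(comp.flush())
    return total


# Synthetic, attribute-like sample data (an assumption of this sketch).
sample = b"".join(
    b"id=%d;name=feature_%d;x=%d;y=%d\n" % (i, i, i * 37 % 1000, i * 91 % 1000)
    for i in range(20000)
)

baseline = plain_size(sample)
for chunk_size in (1024, 8192, 32768):
    ratio = chunked_size(sample, chunk_size) / baseline
    print("chunk size %6d: %.3f x the size of a plain Deflate stream" % (chunk_size, ratio))
```
</pre>
Smaller chunks mean finer-grained seeking but more flush points, which
is the trade-off behind the "content and chunk size" caveat above.<br>
<br>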
Other ZIP-related specification<br>
------------------------------------------<br>
<br>
The SOZip GitHub organization also hosts the KeyValuePairs
extra-field specification (
<a class="moz-txt-link-freetext" href="https://github.com/sozip/keyvaluepairs-spec/blob/master/zip_keyvalue_extra_field_specification.md">https://github.com/sozip/keyvaluepairs-spec/blob/master/zip_keyvalue_extra_field_specification.md</a>
), which makes it possible to encode arbitrary key-value pairs of
metadata associated with a file within a ZIP, for example to store
the Content-Type of a file.<br>
<br>
How does this relate to GDAL?<br>
-------------------------------------------<br>
<br>
Pull request <a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/pull/7042">https://github.com/OSGeo/gdal/pull/7042</a> has been
submitted with the following enhancements:<br>
<br>
* The /vsizip/ virtual file system uses the SOZip index to
perform fast random access within a compressed SOZip-enabled file.<br>
<br>
* The Shapefile and GPKG drivers can directly generate
SOZip-enabled .shz/.shp.zip or .gpkg.zip files.<br>
<br>
* Addition of the CPLAddFileInZip() C function, which can compress
a file and add it to a new or existing ZIP file, and enable the SOZip
optimization when relevant.<br>
<br>
* The existing VSIGetFileMetadata() function can be called on a
filename of the form
/vsizip/path/to/the/file.zip/path/inside/the/zip/file
with domain = "ZIP" to find out whether a SOZip index is
available for that file.<br>
<br>
* The new sozip command-line utility
(<a class="moz-txt-link-freetext" href="https://github.com/rouault/gdal/blob/sozip/doc/source/programs/sozip.rst">https://github.com/rouault/gdal/blob/sozip/doc/source/programs/sozip.rst</a>)
can be used to create a seek-optimized ZIP file, append
files to an existing ZIP file, list the
contents of a ZIP file and display the SOZip optimization
status, or validate a SOZip file.<br>
<br>
Best regards,<br>
<br>
Even<br>
<br>
<pre class="moz-signature">--
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.
</pre>
</div>
</body>
</html>