[gdal-dev] Errors while creating sozip zarr file
Erik Schnetter
schnetter at gmail.com
Thu Jul 24 07:20:39 PDT 2025
Thanks for the pointer. I opened an issue.
-erik
> On Jul 21, 2025, at 10:17, Even Rouault <even.rouault at spatialys.com> wrote:
>
> Erik,
>
> I don't think it is really worth sozip'ing a zipped Zarr, given that Zarr is made of many relatively small files, and sozip shines with big compressed files. Generally, even when creating a zipped (sozip or not) Zarr file, you need to make sure that your writing pattern matches chunk boundaries, to avoid chunk files being rewritten several times, which makes the zip bigger than needed. Please file an issue about the error not being transmitted up to the caller.
>
> Even
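
[Editor's sketch] To illustrate the point above about matching the writing pattern to chunk boundaries, here is a minimal, GDAL-independent sketch for a single array axis. The names is_chunk_aligned and chunk_windows are hypothetical helpers, not part of the GDAL API, and the chunk size is assumed to be whatever was chosen when the array was created. The idea is that each (start, count) pair produced here could serve as one axis of the arrayStart/count arguments of a Write call, so that every chunk file is written exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: true if the window [start, start + count) covers only
// whole chunks along one axis. The last chunk of an axis may be partial, so a
// window that ends exactly at dim_size also counts as aligned.
bool is_chunk_aligned(std::uint64_t start, std::uint64_t count,
                      std::uint64_t chunk_size, std::uint64_t dim_size) {
    if (chunk_size == 0 || count == 0) return false;
    const std::uint64_t end = start + count;
    if (end > dim_size) return false;
    const bool start_ok = (start % chunk_size) == 0;
    const bool end_ok = (end % chunk_size) == 0 || end == dim_size;
    return start_ok && end_ok;
}

// Hypothetical helper: split an axis of length dim_size into (start, count)
// windows, one window per chunk, in increasing order.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
chunk_windows(std::uint64_t dim_size, std::uint64_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::pair<std::uint64_t, std::uint64_t>> windows;
    for (std::uint64_t start = 0; start < dim_size; start += chunk_size) {
        // The final window is shortened if dim_size is not a multiple
        // of chunk_size.
        windows.emplace_back(start, std::min(chunk_size, dim_size - start));
    }
    return windows;
}
```

For a multidimensional array the same check would be applied to each axis in turn; a write is chunk-aligned only if it is aligned along every axis.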
>
> On 19/07/2025 at 17:44, Erik Schnetter via gdal-dev wrote:
>> I am using GDAL to create a multidimensional zarr file that is sozip compressed. I see this error when creating the file:
>>
>> ERROR 1: dish_positions.00000000.zarr/zarr.json already exists in ZIP file
>> ERROR 8: Open file /vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr/zarr.json to write failed
>>
>> Everything is working fine when I do not use sozip compression. I enable sozip compression by adding a "/vsizip" prefix to the file name. Although there is an error reported on screen, I do not see an error code reported by the function creating or closing the multidimensional dataset. The resulting file ("*.zarr.zip") is created fine and looks almost correct, but all attributes seem to be missing.
>>
>> I wonder – is it actually possible to create a zarr file that is sozip compressed, given that zarr probably writes to each of its files multiple times? If not, what is the preferred way to create a sozip-compressed zarr file efficiently?
>>
>> Some details:
>>
>> I create the dataset (i.e. the file) via
>>
>> const auto driver_manager = GetGDALDriverManager();
>> const auto driver = driver_manager->GetDriverByName("Zarr");
>> const auto dataset = std::unique_ptr<GDALDataset>(driver->CreateMultiDimensional(
>>     full_path.c_str(), root_group_options_c.data(), options_c.data()));
>>
>> where "full_path" is "/vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr".
>>
>> I then create multiple attributes ("CreateAttribute") and then
>>
>> const auto mdarray = group->CreateMDArray(meta->get_name(), dimensions, datatype,
>>                                           array_options_c.data());
>> const bool success = mdarray->Write(
>>     arrayStart.data(), count.data(), nullptr, bufferStride.data(), datatype,
>>     frame + datatypesize * meta->offset, frame, buffer->frame_size);
>>
>> and finish with
>>
>> const CPLErr err = dataset->Close();
>> assert(!err);
>>
>> The full code is available at <https://github.com/kotekan/kotekan/blob/eschnett/updates-2/lib/stages/gdalFileWrite.cpp>.
>>
>> -erik
>>
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>