[gdal-dev] Errors while creating sozip zarr file

Even Rouault even.rouault at spatialys.com
Mon Jul 21 07:17:56 PDT 2025


Erik,

I don't think it is really worth sozip'ing a zipped Zarr, given that 
Zarr is made of many relatively small files, and sozip shines with big 
compressed files. Generally, even when creating a zipped (sozip or not) 
Zarr file, you need to make sure that your writing pattern matches 
chunk boundaries, so that chunk files are not rewritten several times, 
which makes the zip bigger than needed. Please file an issue about the 
error not being transmitted up to the caller.
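
As a rough illustration of a chunk-aligned writing pattern, here is a 
minimal sketch in plain C++ with no GDAL dependency. It enumerates one 
write window per chunk so that each chunk file would be written exactly 
once. The names Window and chunk_aligned_windows are hypothetical 
helpers, and the chunk shape is assumed to be the one chosen when the 
array was created.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper type: one write window, given as a start index
// and an element count per dimension.
struct Window {
    std::vector<std::uint64_t> start;
    std::vector<std::size_t> count;
};

// Hypothetical helper: enumerate windows that cover an array of extent
// `dims`, each window spanning exactly one chunk of shape `chunks`.
// Chunks on the upper edge are clipped to the array extent.
std::vector<Window> chunk_aligned_windows(
    const std::vector<std::uint64_t> &dims,
    const std::vector<std::uint64_t> &chunks)
{
    assert(dims.size() == chunks.size());
    const std::size_t ndim = dims.size();
    std::vector<Window> windows;
    for (std::size_t d = 0; d < ndim; ++d) {
        assert(chunks[d] > 0);
        if (dims[d] == 0)
            return windows; // empty array: nothing to write
    }
    std::vector<std::uint64_t> start(ndim, 0);
    while (true) {
        Window w;
        w.start = start;
        w.count.resize(ndim);
        for (std::size_t d = 0; d < ndim; ++d) {
            w.count[d] = static_cast<std::size_t>(
                std::min(chunks[d], dims[d] - start[d]));
        }
        windows.push_back(std::move(w));
        if (ndim == 0)
            return windows; // scalar array: a single window
        // Advance to the next chunk, last dimension varying fastest.
        std::size_t d = ndim;
        while (d > 0) {
            --d;
            start[d] += chunks[d];
            if (start[d] < dims[d])
                break;       // still inside the array in this dimension
            start[d] = 0;    // wrap around and carry to the previous one
            if (d == 0)
                return windows; // carried out of the first dimension
        }
    }
}
```

Each window's start and count values could then be used as the 
arrayStart and count arguments of a single GDALMDArray::Write() call 
(converted to the GDAL integer types if needed), with the source buffer 
pointer and strides adjusted accordingly.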

Even

On 19/07/2025 at 17:44, Erik Schnetter via gdal-dev wrote:
> I am using GDAL to create a multidimensional zarr file that is sozip 
> compressed. I see this error when creating the file:
>
> ERROR 1: dish_positions.00000000.zarr/zarr.json already exists in ZIP file
> ERROR 8: Open file /vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr/zarr.json to write failed
>
> Everything is working fine when I do not use sozip compression. I 
> enable sozip compression by adding a "/vsizip" prefix to the file 
> name. Although there is an error reported on screen, I do not see an 
> error code reported by the function creating or closing the 
> multidimensional dataset. The resulting file ("*.zarr.zip") is created 
> fine and looks almost correct, but all attributes seem to be missing.
>
> I wonder – is it actually possible to create a zarr file that is sozip 
> compressed, given that zarr probably writes to each of its files 
> multiple times? If not, what is the preferred way to create a 
> sozip-compressed zarr file efficiently?
>
> Some details:
>
> I create the dataset (i.e. the file) via
>
>                 const auto driver_manager = GetGDALDriverManager();
>                 const auto driver = driver_manager->GetDriverByName("Zarr");
>                 const auto dataset = std::unique_ptr<GDALDataset>(
>                     driver->CreateMultiDimensional(
>                         full_path.c_str(), root_group_options_c.data(),
>                         options_c.data()));
>
> where "full_path" is 
> "/vsizip/data/fengine_init_pathfinder/cx66_dish_positions.00000000.zarr.zip/dish_positions.00000000.zarr".
>
> I then create multiple attributes ("CreateAttribute") and then
>
>                 const auto mdarray = group->CreateMDArray(
>                     meta->get_name(), dimensions, datatype,
>                     array_options_c.data());
>                 const bool success = mdarray->Write(
>                     arrayStart.data(), count.data(), nullptr,
>                     bufferStride.data(), datatype,
>                     frame + datatypesize * meta->offset, frame,
>                     buffer->frame_size);
>
> and finish with
>
>                 const CPLErr err = dataset->Close();
>                 assert(!err);
>
> The full code is available at 
> <https://github.com/kotekan/kotekan/blob/eschnett/updates-2/lib/stages/gdalFileWrite.cpp>.
>
> -erik
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
http://www.spatialys.com
My software is free, but my time generally not.


