[gdal-dev] gdal parquet and hive partitioning

Even Rouault even.rouault at spatialys.com
Sun Dec 28 07:48:45 PST 2025


Hi Mike,

The problem is likely twofold:

- "gdal vector partition" doesn't write the "_metadata" file that 
contains the schema and the path to the actual .parquet files

- but even if it did, I cannot manage to convince libarrow/libparquet 
not to probe all files. I'm not sure whether I'm missing something in 
the API or whether that's a fundamental limitation of the library. I've 
filed https://github.com/apache/arrow/issues/48671 about that. I've 
considered implementing a workaround on the GDAL side, but I couldn't 
come up with anything.

Your best workaround is to directly access 
"/vsis3/bucket/overture/20251217/overture-buildings/country=US"
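
As a rough sketch of that workaround (not tested against the actual 
bucket), the Python snippet below builds the partition sub-directory 
path and re-assembles the pipeline from the quoted message so that it 
reads only that directory. The helper names hive_partition_path and 
build_pipeline_args are hypothetical, the flags are copied as-is from 
the original command, and dropping --where rests on the assumption that 
the "country" field may not be exposed once the partition directory 
itself is opened as the dataset.

```python
import shlex


def hive_partition_path(root: str, key: str, value: str) -> str:
    """Return the sub-directory of one hive partition: <root>/<key>=<value>."""
    return f"{root.rstrip('/')}/{key}={value}"


def build_pipeline_args(src: str, bbox: str, dst: str) -> list[str]:
    """Assemble the argv for the pipeline, reusing the flags of the original command.

    No --where step is emitted: the partition is selected through the path
    (assumption: "country" may not be a field when reading the sub-directory).
    """
    return [
        "gdal", "vector", "pipeline",
        "!", "read", src,
        "!", "filter", "--bbox", bbox,
        "!", "write", "-f", "parquet", dst, "--progress", "--overwrite",
    ]


root = "/vsis3/bucket/overture/20251217/overture-buildings/"
src = hive_partition_path(root, "country", "US")
args = build_pipeline_args(
    src,
    "-117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481",
    "/tmp/test1.parquet",
)

# Print a shell-quoted command line for inspection; nothing is executed here.
print(shlex.join(args))
```

To actually run it, the list can be passed to subprocess.run(args, 
check=True), which avoids any shell quoting issues with the "!" 
separators.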

Even

On 28/12/2025 at 13:26, Michael Smith via gdal-dev wrote:
> I know that gdal can write parquet data with hive partitioning using gdal vector partition, but after doing so, can gdal do the partition elimination on reading when a where/attribute is specified on the partition key?
>
> I was trying to do a pipeline with:
> gdal vector pipeline ! read "/vsis3/bucket/overture/20251217/overture-buildings/" ! filter --bbox -117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481 --where "country='US'" ! write -f parquet /tmp/test1.parquet --progress --overwrite
>
> but in CPL_DEBUG I see it scanning all the partitions rather than just querying the country=US partition.
>
> S3: Downloading 0-1605631 (https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAI/data_0.parquet)...
> S3: Got response_code=206
> S3: Downloading 0-16383999 (https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet)...
> S3: Got response_code=206
> S3: Downloading 0-16383999 (https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet)...
> S3: Got response_code=206
> S3: Downloading 16384000-32767999 (https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet)...
> S3: Got response_code=206
> S3: Downloading 16384000-29741378 (https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet)...
> ....
>
>
>
-- 
http://www.spatialys.com
My software is free, but my time generally not.