[gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters

Even Rouault even.rouault at spatialys.com
Thu Jan 22 12:09:35 PST 2026


Hi Ari,

Looking at the code, I see that the driver reads all row groups, whereas 
it could potentially be improved to use row-group-level statistics to 
skip all of them but the matching one. That said, you can probably 
work around the issue by using SetAttributeFilter("fid = <the-fid>") 
instead, or by querying the ID column directly if that's your 
ultimate objective.
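
A minimal sketch of that workaround with the Python bindings, assuming 
the layer's FID column is named "fid" (as GetFIDColumn() reports in your 
case). The helper name get_feature_by_fid_filter is hypothetical, and 
whether this is actually faster than GetFeature() on your file is 
something to measure rather than assume:

```python
def get_feature_by_fid_filter(layer, fid, fid_column="fid"):
    """Fetch one feature through an attribute filter instead of GetFeature(fid).

    `layer` is expected to be an osgeo.ogr.Layer; `fid_column` is an
    assumption and should match what layer.GetFIDColumn() reports.
    """
    # SetAttributeFilter() returns 0 (OGRERR_NONE) on success when
    # exceptions are not enabled; with exceptions enabled it raises instead.
    if layer.SetAttributeFilter(f"{fid_column} = {int(fid)}") != 0:
        raise RuntimeError("attribute filter was rejected by the driver")
    try:
        layer.ResetReading()
        # Returns None when no feature matches the filter.
        return layer.GetNextFeature()
    finally:
        # Clear the filter so later reads on the layer are unaffected.
        layer.SetAttributeFilter(None)


# Usage (hypothetical file name):
#   from osgeo import ogr
#   ds = ogr.Open("data.parquet")
#   feature = get_feature_by_fid_filter(ds.GetLayer(0), 12345)
```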

More generally, Parquet shines at bulk loading and at requests for a 
significant amount of data, rather than at extracting a single feature, 
where you'll get better performance from a regular database with proper 
indices built.

Even

On 22/01/2026 at 12:49, Ari Jolma wrote:
> Thanks for the replies. I'm progressing but now I hit something I 
> don't understand.
>
> I have a large GPKG file which I converted into a Parquet file. If I 
> now do a simple layer.GetFeature(fid) on a random fid on the layer, 
> the feature is retrieved from GPKG really fast (also if the file is in 
> S3) but from Parquet it is slow (~ 20 secs) even on local filesystem.
>
> On both files layer.GetFIDColumn() reports "fid". There is a native 
> "ID" column on the GPKG but fid <> ID.
>
> I used ogr2ogr to create the Parquet file. I had -lco COMPRESSION=None
>
> Ari
>
> Michael Smith wrote on 18.1.2026 at 18:09:
>> I combine attribute and spatial filters a lot on large parquet files 
>> using a combination of SetSpatialFilter() and SetAttributeFilter() 
>> before querying. I've only had some issues with partition elimination 
>> which have now been fixed. Sometimes the ADBC connection can be 
>> faster to query but opening the file with gdal.OpenEx() is slower. 
>> And ADBC takes more memory. I find the gdal query method generally 
>> better.
>>
>> Having access to the sql functions of duckdb is the only reason I 
>> ever use ADBC.
>>
>> Mike
>>
>>
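
For reference, the combination of SetSpatialFilter() and 
SetAttributeFilter() that Mike describes could look roughly like the 
sketch below. The helper name iter_filtered, the bounding box and the 
WHERE clause are all hypothetical; only the layer methods are OGR calls:

```python
def iter_filtered(layer, bbox, where):
    """Yield features matching both a bounding box and an attribute filter.

    `layer` is expected to be an osgeo.ogr.Layer, `bbox` a
    (minx, miny, maxx, maxy) tuple in the layer's SRS, and `where`
    an OGR SQL WHERE clause.
    """
    minx, miny, maxx, maxy = bbox
    # Both filters are set before reading; the layer applies them together.
    layer.SetSpatialFilterRect(minx, miny, maxx, maxy)
    layer.SetAttributeFilter(where)
    layer.ResetReading()
    try:
        while True:
            feature = layer.GetNextFeature()
            if feature is None:
                break
            yield feature
    finally:
        # Clear both filters once iteration ends or is abandoned.
        layer.SetSpatialFilter(None)
        layer.SetAttributeFilter(None)


# Usage (hypothetical file, extent and column name):
#   from osgeo import ogr
#   ds = ogr.Open("data.parquet")
#   for f in iter_filtered(ds.GetLayer(0), (24.0, 60.0, 25.0, 61.0), "ID > 100"):
#       print(f.GetFID())
```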
-- 
http://www.spatialys.com
My software is free, but my time generally not.


