[gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters

Fri Jan 23 03:03:27 PST 2026

Hi Jukka,

Yes, GPKG works pretty well with vsis3 too, but Parquet will probably be
my client's choice.

Ari

Rahkonen Jukka wrote on 23.1.2026 at 10.15:
> Hi Ari,
>
> But don't you know the answer already? You wrote:
> "the feature is retrieved from GPKG really fast (also if the file is in S3)"
> GeoPackage is not listed as a cloud optimized format, but actually with GDAL it does work pretty well with vsicurl.
>
> -Jukka Rahkonen-
>
> ________________________________________
> From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> on behalf of Ari Jolma via gdal-dev <gdal-dev at lists.osgeo.org>
> Sent: Friday, 23 January 2026 9.32
> To: Even Rouault <even.rouault at spatialys.com>; Michael Smith <michael.smith.erdc at gmail.com>; gdal-dev at lists.osgeo.org <gdal-dev at lists.osgeo.org>
> Subject: Re: [gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters
>
>
> Thanks Even,
>
> Attribute filter fid = <fid> seems fast but ID = <ID> is not. Hm,
> the whole idea is to use files in S3 instead of data in AWS RDS, as the
> data is static and RDS costs are high (i.e., it's not a technical
> reason). Our use case is mostly about bbox searches and then extracting
> a single feature or doing a simple mixed spatial and attribute search.
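
A minimal sketch of the single-feature lookup described above, using an attribute filter instead of GetFeature(). The helper name `feature_by_attribute` is hypothetical, and the value is assumed to be an integer key such as the fid or ID discussed here. `layer` is assumed to be an OGR layer from the GDAL Python bindings.

```python
def feature_by_attribute(layer, column, value):
    """Return the first feature where column == value, or None.

    Uses an attribute filter rather than layer.GetFeature(); `value` is
    assumed to be an integer key.
    """
    layer.SetAttributeFilter(f"{column} = {int(value)}")
    try:
        layer.ResetReading()
        return layer.GetNextFeature()
    finally:
        # Clear the filter so later reads on the layer are unaffected.
        layer.SetAttributeFilter(None)
```

With the bindings installed, one would keep the dataset alive (`ds = gdal.OpenEx(path, gdal.OF_VECTOR)`, then `layer = ds.GetLayer(0)`) and call `feature_by_attribute(layer, layer.GetFIDColumn(), some_fid)`, where `path` and `some_fid` are placeholders.
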
>
> Ari
>
> Even Rouault wrote on 22.1.2026 at 22.09:
>> Hi Ari,
>>
>> Looking at the code, I see the driver reads all row groups, whereas
>> it could potentially be improved to use row-group-level statistics to
>> skip all of them but the matching one. That said, you can probably
>> work around the issue by using SetAttributeFilter("fid = <the-fid>")
>> instead, or by querying the ID column directly if that's your
>> ultimate objective.
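
To illustrate the idea of skipping row groups by their statistics, here is a minimal pure-Python sketch. It is not GDAL's code: `RowGroupStats` and `row_groups_to_read` are hypothetical names, and the min/max values stand in for the per-column statistics that Parquet can store for each row group. A reader looking for one key would only need to decode the row groups whose range can contain it.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    index: int      # position of the row group in the file
    min_value: int  # smallest key stored in the row group
    max_value: int  # largest key stored in the row group


def row_groups_to_read(stats, key):
    """Return indices of row groups whose [min, max] range may contain key."""
    return [s.index for s in stats if s.min_value <= key <= s.max_value]


stats = [
    RowGroupStats(0, 1, 1000),
    RowGroupStats(1, 1001, 2000),
    RowGroupStats(2, 2001, 3000),
]
print(row_groups_to_read(stats, 1500))  # prints [1]
```

This kind of pruning only pays off when the key is sorted or clustered in the file; if every row group spans the full key range, nothing can be skipped.
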
>>
>> More generally, Parquet shines more in scenarios that request a
>> significant amount of data or do bulk loading than in extracting a
>> single feature, where you'll get better performance from regular
>> databases with proper indices built.
>>
>> Even
>>
>> On 22/01/2026 at 12:49, Ari Jolma wrote:
>>> Thanks for the replies. I'm progressing but now I hit something I
>>> don't understand.
>>>
>>> I have a large GPKG file which I converted into a Parquet file. If I
>>> now do a simple layer.GetFeature(fid) on a random fid on the layer,
>>> the feature is retrieved from GPKG really fast (also if the file is
>>> in S3), but from Parquet it is slow (~20 secs) even on a local filesystem.
>>>
>>> On both files layer.GetFIDColumn() reports "fid". There is a native
>>> "ID" column on the GPKG but fid <> ID.
>>>
>>> I used ogr2ogr to create the Parquet file, with -lco COMPRESSION=None.
>>>
>>> Ari
>>>
>>> Michael Smith wrote on 18.1.2026 at 18.09:
>>>> I combine attribute and spatial filters a lot on large Parquet files
>>>> using a combination of SetSpatialFilter() and SetAttributeFilter()
>>>> before querying. I've only had some issues with partition
>>>> elimination, which have now been fixed. Sometimes the ADBC connection
>>>> can be faster to query but opening the file with gdal.OpenEx() is
>>>> slower. And ADBC takes more memory. I find the GDAL query method
>>>> generally better.
>>>>
>>>> Having access to the SQL functions of DuckDB is the only reason I
>>>> ever use ADBC.
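
A minimal sketch of combining a spatial and an attribute filter before reading, as described above. The helper name `query_features` is hypothetical, and `layer` is assumed to be an OGR layer from the GDAL Python bindings; only SetSpatialFilter(), SetSpatialFilterRect(), SetAttributeFilter(), ResetReading() and GetNextFeature() are used.

```python
def query_features(layer, bbox=None, where=None):
    """Collect features matching an optional bbox and attribute filter.

    bbox is (minx, miny, maxx, maxy) in the layer's CRS; where is an OGR SQL
    WHERE clause. Passing None for either one clears that filter.
    """
    if bbox is None:
        layer.SetSpatialFilter(None)
    else:
        layer.SetSpatialFilterRect(*bbox)
    layer.SetAttributeFilter(where)

    # Both filters are now active; read the matching features in order.
    layer.ResetReading()
    features = []
    feature = layer.GetNextFeature()
    while feature is not None:
        features.append(feature)
        feature = layer.GetNextFeature()
    return features
```

A call might look like `query_features(layer, bbox=(24.0, 60.0, 25.0, 61.0), where="ID = 42")`, where the coordinates and the ID value are placeholders.
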
>>>>
>>>> Mike
>>>>
>>>>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
>