[gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters

Ari Jolma ari.jolma at gmail.com
Sun Jan 18 08:01:44 PST 2026


Even Rouault wrote on 18.1.2026 at 16:50:

> Ari,
>
>> I need to read from a large Parquet file (10-20 GB, in S3) features 
>> using a set of user defined constraints that I can parse into 
>> non-spatial SQL and polygon masks. My tests so far show good 
>> performance with a single non-spatial constraint and (separately) 
>> with a bbox. 
>
> Do you mean you get bad performance when setting both 
> SetAttributeFilter() and SetSpatialFilter[Rect]() ? I cannot explain 
> that. Combining them should not be less performant.


No, I'm just looking for how best to mix spatial and non-spatial 
filters/constraints when retrieving features from a Parquet file using GDAL.
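
For reference, a minimal sketch of setting both filters on one layer before 
reading. The function name read_filtered, the path and the column name in the 
usage comment are hypothetical; only SetAttributeFilter(), 
SetSpatialFilterRect(), SetSpatialFilter() and ResetReading() are GDAL calls.

```python
def read_filtered(layer, attr_filter, bbox):
    """Yield features that match both an attribute filter and a bounding box.

    layer       -- an ogr.Layer, e.g. from a dataset opened with the Parquet driver
    attr_filter -- a WHERE-clause style string, or None for no attribute filter
    bbox        -- (minx, miny, maxx, maxy) in the layer's coordinate system
    """
    layer.SetAttributeFilter(attr_filter)
    layer.SetSpatialFilterRect(*bbox)
    layer.ResetReading()
    try:
        for feature in layer:
            yield feature
    finally:
        # Clear both filters so that later reads of the layer are unaffected.
        layer.SetAttributeFilter(None)
        layer.SetSpatialFilter(None)


# Usage (hypothetical path and column name):
#   from osgeo import gdal
#   gdal.UseExceptions()
#   ds = gdal.OpenEx("/vsis3/some-bucket/data.parquet", gdal.OF_VECTOR)
#   for f in read_filtered(ds.GetLayer(0), "population > 1000", (24.0, 60.0, 25.5, 60.5)):
#       print(f.GetFID())
```

Both filters are active at the same time here, so the driver sees the 
combined request rather than two separate passes.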


>
> You don't mention if your geoparquet files have a covering bounding 
> box column. For the default WKB encoding, this is essential to avoid 
> full scan of the file.


I don't know about that - will check - but the basic 
SetSpatialFilterRect on a GDAL Python layer works fine.
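
A possible way to check for the covering is to look at the "geo" key in the 
Parquet file metadata, where GeoParquet 1.1 records a "covering" entry per 
geometry column. The sketch below only parses that JSON; the helper name 
bbox_covering_columns and the file name in the usage comment are hypothetical, 
and reading the metadata with pyarrow is an assumption about the tooling at hand.

```python
import json


def bbox_covering_columns(geo_metadata_json):
    """Map each geometry column to the name of its bbox covering column.

    geo_metadata_json -- the value of the "geo" file metadata key (str or bytes).
    Columns without a declared bbox covering are left out of the result.
    """
    meta = json.loads(geo_metadata_json)
    result = {}
    for name, column in meta.get("columns", {}).items():
        bbox = column.get("covering", {}).get("bbox")
        if bbox:
            # Each entry is a path such as ["bbox", "xmin"]; its first element
            # names the struct column that holds the per-feature bounding box.
            result[name] = bbox["xmin"][0]
    return result


# Usage, assuming pyarrow is installed and the file is local:
#   import pyarrow.parquet as pq
#   raw = pq.read_schema("data.parquet").metadata[b"geo"]
#   print(bbox_covering_columns(raw))
```

An empty result would suggest that the file has no bounding box column for 
the reader to prune on.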


>
>
>> However, I am not sure how to go forward with mixing non-spatial 
>> constraints and perhaps multiple arbitrary polygons (which may be 
>> non-adjacent).
> If you have something like attr_filter && (Intersects(geom, poly1) || 
> Intersects(geom, poly2))  , then you should do separately  attr_filter 
> && Intersects(geom, poly1)   and then attr_filter && Intersects(geom, 
> poly2)


Ok, so the attr_filter is not expensive even though it is applied twice.
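
The one-request-per-polygon approach could be sketched roughly as follows. 
The function name read_with_polygons is hypothetical. The sketch assumes that 
feature ids stay the same from one read to the next, so they can be used to 
drop features that intersect more than one polygon, and it repeats the 
intersection test per feature in case the driver only compares bounding boxes.

```python
def read_with_polygons(layer, attr_filter, polygons):
    """Return {fid: feature} for features matching attr_filter that
    intersect at least one of the given polygons.

    layer    -- an ogr.Layer
    polygons -- iterable of ogr.Geometry objects in the layer's coordinate system
    """
    found = {}
    layer.SetAttributeFilter(attr_filter)
    try:
        for poly in polygons:
            # The spatial filter lets the driver skip data outside the polygon.
            layer.SetSpatialFilter(poly)
            layer.ResetReading()
            for feature in layer:
                fid = feature.GetFID()
                if fid in found:
                    continue  # already matched through an earlier polygon
                geom = feature.GetGeometryRef()
                # Exact test, in case the driver only compared bounding boxes.
                if geom is not None and geom.Intersects(poly):
                    found[fid] = feature
    finally:
        # Clear both filters so that later reads of the layer are unaffected.
        layer.SetAttributeFilter(None)
        layer.SetSpatialFilter(None)
    return found
```

All matches are kept in memory here, which may not suit very large result 
sets; processing each feature inside the loop would avoid that.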


>
>>
>>  GDAL SQL docs tell me that with Spatialite built-in I could use 
>> ST_Intersects but does that help with Parquet files? 
> No, because that wouldn't translate as a SetSpatialFilter[Rect]() 
> request, and thus you would get full scan of the file


Ok, I assumed that too.


>
>> How about constructing the non-spatial SQL query first, use that on 
>> dataset, and then use SetSpatialFilterRect on the resulting layer 
>> object possibly multiple times plus ogr.Geometry.Intersects on each 
>> feature coming from the obtained layer? My intuition would tell me to 
>> first do the spatial filtering as that (may) narrow down the search 
>> considerably. But then I cannot use the non-spatial SQL as that 
>> requires a dataset to be executed on.
>
> You could store the result of the spatial request in a temporary 
> dataset (possibly in memory) and then apply the attribute filter. But 
> as said above, I'm a bit surprised that combining the attribute filter 
> and a (single geometry) spatial filter isn't efficient.


Maybe I was not clear: at this point I'm only wondering how best to 
combine the attribute filter and the spatial filter.
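
For completeness, the temporary-dataset idea could look roughly like the 
sketch below. The function name spatial_then_sql and the layer name "subset" 
are hypothetical. It assumes that CopyLayer() copies only the features that 
pass the source layer's current spatial filter, and that the caller supplies 
an in-memory dataset.

```python
def spatial_then_sql(src_layer, mem_ds, bbox, sql, name="subset"):
    """Copy the features inside bbox into mem_ds, then run sql on that copy.

    src_layer -- an ogr.Layer from the Parquet dataset
    mem_ds    -- a writable dataset, typically in memory
    bbox      -- (minx, miny, maxx, maxy) in the layer's coordinate system
    sql       -- a statement that refers to the copied layer by `name`
    """
    src_layer.SetSpatialFilterRect(*bbox)
    try:
        # Assumption: only features passing the spatial filter are copied.
        mem_ds.CopyLayer(src_layer, name)
    finally:
        src_layer.SetSpatialFilter(None)
    # The result layer belongs to mem_ds, so the caller has to keep mem_ds
    # alive and call mem_ds.ReleaseResultSet(result) when done.
    return mem_ds.ExecuteSQL(sql)


# Usage (hypothetical column name; on GDAL before 3.11 the in-memory vector
# driver is named "Memory" rather than "MEM"):
#   from osgeo import gdal
#   mem_ds = gdal.GetDriverByName("MEM").Create("", 0, 0, 0, gdal.GDT_Unknown)
#   result = spatial_then_sql(layer, mem_ds, (24.0, 60.0, 25.5, 60.5),
#                             "SELECT * FROM subset WHERE population > 1000")
```

This gives the non-spatial SQL a dataset to run on after the spatial 
narrowing, at the cost of one extra copy of the selected features.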


>
> Instead of the Parquet driver, you may also try with duckdb and the 
> ADBC driver. The duckdb SQL engine generally outperforms 
> libarrow/libparquet.


Hm, the Parquet files are a given at this point - I'm doing 
consultancy/development for a client and Parquet is their choice, so I 
guess I have a developer role now. :)


>
> Even
>

Thanks,

Ari



