[gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters

Even Rouault even.rouault at spatialys.com
Sun Jan 18 06:50:04 PST 2026


Ari,

> I need to read from a large Parquet file (10-20 GB, in S3) features 
> using a set of user defined constraints that I can parse into 
> non-spatial SQL and polygon masks. My tests so far show good 
> performance with a single non-spatial constraint and (separately) with 
> a bbox. 

Do you mean you get bad performance when setting both 
SetAttributeFilter() and SetSpatialFilter[Rect]() ? I cannot explain 
that. Combining them should not be less performant.

You don't mention whether your GeoParquet files have a covering bounding
box column. For the default WKB encoding, this is essential to avoid a
full scan of the file.
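
As a rough sketch of how one could check for that column: GeoParquet 1.1 declares it under a "covering" entry for the geometry column in the file-level "geo" metadata. The helper name below is made up for illustration, and the commented pyarrow call is only an assumption about how you would fetch the metadata; this is not how GDAL itself performs the check.

```python
import json


def has_bbox_covering(geo_metadata_json, column=None):
    """Return True if the GeoParquet 'geo' metadata declares a bbox covering.

    geo_metadata_json is the raw JSON (str or bytes) stored under the "geo"
    key. column defaults to the primary geometry column named in it.
    """
    geo = json.loads(geo_metadata_json)
    column = column or geo.get("primary_column")
    col_meta = geo.get("columns", {}).get(column, {})
    return "bbox" in col_meta.get("covering", {})


# Assumed way to obtain the metadata with pyarrow (not executed here,
# "file.parquet" is a placeholder):
#   import pyarrow.parquet as pq
#   raw = pq.read_schema("file.parquet").metadata[b"geo"]
#   print(has_bbox_covering(raw))
```

If this returns False for a WKB-encoded file, rewriting the file with a covering bbox column is probably worth it before tuning anything else.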

> However, I am not sure how to go forward with mixing non-spatial 
> constraints and perhaps multiple arbitrary polygons (which may be 
> non-adjacent).

If you have something like
attr_filter && (Intersects(geom, poly1) || Intersects(geom, poly2)),
then you should issue two separate requests: first attr_filter &&
Intersects(geom, poly1), and then attr_filter && Intersects(geom, poly2).

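
A minimal sketch of that "one request per polygon" idea with the OGR Python API. The function name is hypothetical. It assumes feature IDs are stable between reads (so they can be used to skip features matching several polygons), and it re-checks Intersects() on each feature because the layer-level spatial filter is only guaranteed to compare envelopes.

```python
def iter_matches(layer, attr_filter, polygons):
    """Yield features matching attr_filter AND intersecting any polygon.

    One request is issued per polygon, so each one can be served as a
    spatial filter by the driver. Duplicates are skipped by FID.
    """
    layer.SetAttributeFilter(attr_filter)
    seen = set()
    try:
        for poly in polygons:
            layer.SetSpatialFilter(poly)
            for feat in layer:
                fid = feat.GetFID()
                if fid in seen:
                    continue
                geom = feat.GetGeometryRef()
                # Exact test: the layer filter may only compare envelopes.
                if geom is not None and geom.Intersects(poly):
                    seen.add(fid)
                    yield feat
    finally:
        # Leave the layer unfiltered for later users.
        layer.SetSpatialFilter(None)
        layer.SetAttributeFilter(None)


# Assumed usage (path, attribute name and polygons are placeholders):
#   from osgeo import ogr
#   ds = ogr.Open("/vsis3/my-bucket/my-file.parquet")
#   lyr = ds.GetLayer(0)
#   for feat in iter_matches(lyr, "height > 10", [poly1, poly2]):
#       ...
```
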
> GDAL SQL docs tell me that with Spatialite built-in I could use 
> ST_Intersects but does that help with Parquet files? 

No, because that would not translate into a SetSpatialFilter[Rect]()
request, and thus you would get a full scan of the file.

> How about constructing the non-spatial SQL query first, use that on 
> dataset, and then use SetSpatialFilterRect on the resulting layer 
> object possibly multiple times plus ogr.Geometry.Intersects on each 
> feature coming from the obtained layer? My intuition would tell me to 
> first do the spatial filtering as that (may) narrow down the search 
> considerably. But then I cannot use the non-spatial SQL as that 
> requires a dataset to be executed on.

You could store the result of the spatial request in a temporary dataset 
(possibly in memory) and then apply the attribute filter. But as said 
above, I'm a bit surprised that combining the attribute filter and a 
(single geometry) spatial filter isn't efficient.
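
A sketch of the temporary dataset approach, under a few assumptions: the function name and the "subset" layer name are made up, mem_ds is any writable dataset you created beforehand, and CopyLayer() is relied upon to copy only the features that pass the filters currently set on the source layer (which is my understanding of how it reads the source).

```python
def spatial_subset_then_filter(src_layer, mem_ds, bbox, attr_filter,
                               name="subset"):
    """Copy the features within bbox into mem_ds, then filter the copy.

    bbox is (minx, miny, maxx, maxy). Returns the filtered temporary layer,
    which stays valid only as long as mem_ds is kept alive.
    """
    src_layer.SetSpatialFilterRect(*bbox)
    try:
        # Only features passing the spatial filter are read and copied.
        subset = mem_ds.CopyLayer(src_layer, name)
    finally:
        # Clear the spatial filter on the source layer.
        src_layer.SetSpatialFilter(None)
    subset.SetAttributeFilter(attr_filter)
    return subset


# Assumed usage. The in-memory vector driver is named "MEM" in recent GDAL
# versions and "Memory" in older ones, so adjust to your build:
#   from osgeo import ogr
#   drv = ogr.GetDriverByName("MEM") or ogr.GetDriverByName("Memory")
#   mem_ds = drv.CreateDataSource("tmp")
#   lyr = spatial_subset_then_filter(src_lyr, mem_ds, (0, 0, 10, 10), "a = 1")
```
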

Instead of the Parquet driver, you may also try with duckdb and the ADBC 
driver. The duckdb SQL engine generally outperforms libarrow/libparquet.

Even

-- 
http://www.spatialys.com
My software is free, but my time generally not.


