[gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters

Ari Jolma ari.jolma at gmail.com
Sun Jan 18 01:11:23 PST 2026


Hi all,

I need to read features from a large Parquet file (10-20 GB, in S3) 
using a set of user-defined constraints that I can parse into 
non-spatial SQL and polygon masks. My tests so far show good performance 
with a single non-spatial constraint and (separately) with a bbox. 
However, I am not sure how to go forward with mixing non-spatial 
constraints and perhaps multiple arbitrary polygons (which may be 
non-adjacent).

The GDAL SQL docs tell me that with SpatiaLite built in I could use 
ST_Intersects, but does that help with Parquet files? How about 
constructing the non-spatial SQL query first, running it on the dataset, 
and then calling SetSpatialFilterRect on the resulting layer object, 
possibly multiple times, plus ogr.Geometry.Intersects on each feature 
coming from the obtained layer? My intuition tells me to do the spatial 
filtering first, as that may narrow down the search considerably. But then 
I cannot use the non-spatial SQL, as that requires a dataset to be 
executed on.
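
One way the combination could look, as a minimal sketch only: OGR layers 
accept an attribute filter and a spatial filter at the same time, so the 
non-spatial part can be passed as a bare WHERE clause to 
SetAttributeFilter instead of going through a full SQL statement. The 
function name select_features and the parameters where, masks and limit 
are hypothetical. The sketch assumes that feature IDs are stable between 
reads, and it does not show whether either filter is pushed down to the 
Parquet file for a given GDAL version.

```python
def select_features(layer, where, masks, limit=300):
    """Collect features matching `where` that intersect any mask polygon.

    layer: an ogr.Layer, where: an attribute filter string (WHERE clause
    only), masks: a list of ogr.Geometry polygons. All names here are
    illustrative, not part of GDAL.
    """
    # The attribute filter stays active while the spatial filter changes.
    layer.SetAttributeFilter(where)
    seen = set()
    selected = []
    for mask in masks:
        # OGR envelopes are ordered (minx, maxx, miny, maxy).
        minx, maxx, miny, maxy = mask.GetEnvelope()
        # Cheap prefilter by the bounding box of this mask.
        layer.SetSpatialFilterRect(minx, miny, maxx, maxy)
        layer.ResetReading()
        for feature in layer:
            fid = feature.GetFID()
            if fid in seen:
                # Already picked up through an earlier mask.
                continue
            geom = feature.GetGeometryRef()
            # Exact test against the polygon itself, not just its bbox.
            if geom is not None and geom.Intersects(mask):
                seen.add(fid)
                selected.append(feature)
                if len(selected) >= limit:
                    return selected
    return selected
```

With the Python bindings this might be called as 
select_features(ds.GetLayer(0), "height > 10", [mask_a, mask_b]), where ds 
comes from ogr.Open on a /vsis3/ path; the attribute name and the mask 
variables are placeholders.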

The user is actually warned against leaving out the spatial filter, as 
the Parquet files contain millions of features and the selection is 
anyway truncated to at most a few hundred features.

Any ideas?

Ari



