[gdal-dev] Reading from (geo)parquet using mixed spatial and non-spatial filters
Michael Smith
michael.smith.erdc at gmail.com
Fri Jan 23 06:14:12 PST 2026
When you access by ID, are you also specifying the spatial/attribute filters you used previously? Keeping those in, especially the spatial filter, will make it faster.
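
A minimal sketch of that suggestion with the GDAL Python bindings, for illustration only. The file path, the column name, the bounding box and the helper names build_id_filter / fetch_by_id are all assumptions (they are not from this thread), and integer IDs are assumed. It keeps the spatial filter in place and adds the ID lookup on top of it:

```python
def build_id_filter(column, value):
    """Build an OGR attribute filter such as 'ID = 42' (integer IDs assumed)."""
    return f"{column} = {int(value)}"


def fetch_by_id(path, column, value, bbox):
    """Return (attributes, WKT) of the first match inside bbox, or None."""
    from osgeo import ogr  # lazy import: needs GDAL built with the Parquet driver

    ds = ogr.Open(path)
    if ds is None:
        raise RuntimeError(f"could not open {path}")
    layer = ds.GetLayer(0)

    # Keep the spatial filter from the earlier query in place...
    minx, miny, maxx, maxy = bbox
    layer.SetSpatialFilterRect(minx, miny, maxx, maxy)
    # ...and add the ID lookup as an attribute filter on top of it.
    layer.SetAttributeFilter(build_id_filter(column, value))

    feature = layer.GetNextFeature()
    if feature is None:
        return None
    geom = feature.GetGeometryRef()
    # Copy the values out so nothing depends on the dataset staying open.
    return feature.items(), (geom.ExportToWkt() if geom is not None else None)
```

It would be called as fetch_by_id("your.parquet", "ID", 42, (minx, miny, maxx, maxy)), with the same extent as the earlier spatial query.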
Mike
--
Michael Smith
RSGIS Center – ERDC CRREL NH
US Army Corps
On 1/23/26, 9:09 AM, "Even Rouault" <even.rouault at spatialys.com> wrote:
On 23/01/2026 at 08:32, Ari Jolma wrote:
> Thanks Even,
>
> Attribute filter fid = <fid> seems fast but ID = <ID> is not fast.
Probably because your features appear in random ID order, so the
target ID you're looking for falls within the range of ID values of most
row groups, which leads to loading everything (whereas the OGR "fid" is
sequential, so it is easy to load only the row group containing
it). You could try using the "PARQUET:your.parquet" connection string,
which goes through the Arrow dataset API, whose filtering logic
possibly uses page statistics (which the "your.parquet" file syntax
doesn't do), but this might not help a lot. There are no indexes in Parquet.
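
The effect of ID ordering on row-group pruning can be illustrated with a small pure-Python simulation. This is only a sketch of min/max statistics pruning in general, not GDAL's or Arrow's actual code; the row-group size, the ID range and the function names are made up for the example:

```python
import random


def row_group_stats(ids, group_size):
    """Split ids into consecutive row groups and return (min, max) per group."""
    groups = [ids[i:i + group_size] for i in range(0, len(ids), group_size)]
    return [(min(g), max(g)) for g in groups]


def groups_to_read(stats, target):
    """Indices of row groups whose [min, max] range may contain target."""
    return [i for i, (lo, hi) in enumerate(stats) if lo <= target <= hi]


# 1000 features in 10 row groups of 100 (hypothetical sizes).
sequential = list(range(1000))   # like the OGR fid: stored in order
shuffled = sequential[:]         # like an ID column in random order
random.Random(0).shuffle(shuffled)

seq_hits = groups_to_read(row_group_stats(sequential, 100), 437)
rand_hits = groups_to_read(row_group_stats(shuffled, 100), 437)

print("sequential IDs: row groups to read =", seq_hits)  # [4]
print("shuffled IDs:   row groups to read =", len(rand_hits), "of 10")
```

With sequential values only one row group can contain the target, whereas with shuffled values the min/max range of nearly every row group spans it, so statistics rule out few or none of them.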
--
http://www.spatialys.com
My software is free, but my time generally not.