[gdal-dev] Getting summary data from parquet with sql
Michael Smith
michael.smith.erdc at gmail.com
Tue Feb 17 02:48:14 PST 2026
I wanted to get the sum of a column's values using a spatial filter on a Parquet file. I can easily do this with DuckDB, but I was trying to do it via GDAL.
I was able to do it by fetching features, but not with ExecuteSQL alone: the spatial filter couldn’t find the geometry column unless it was part of the query.
This worked:
from osgeo import gdal, ogr

gf = gdal.OpenEx(f'PARQUET:{parquet_file}')
lay = gf.GetLayer()
# Restrict iteration to features that intersect the area of interest
lay.SetSpatialFilter(ogr.CreateGeometryFromWkb(aoi.wkb))
totsize_bytes += sum(feat.GetFieldAsInteger64('file_size') for feat in lay)
This didn’t:
res = gf.ExecuteSQL('select sum(file_size) from "parquet-file"', ogr.CreateGeometryFromWkb(aoi.wkb))
RuntimeError: Cannot set spatial filter: no geometry field present in layer.
Is this just a limitation of OGR SQL?
Via duckdb:
wkb_bytes = aoi.wkb.tobytes()
sql = f"select sum(file_size) from read_parquet('{str(parquet_file)}') where ST_Intersects_Extent(geometry, ST_GeomFromWKB(?))"
params = [wkb_bytes]
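The DuckDB lines above build the query and its parameters but stop before running it. For completeness, here is a rough sketch of how it might be executed. The helper names `build_sum_query` and `run_sum_query` are made up for illustration, and the sketch assumes the `duckdb` Python package is installed and that the spatial extension can be installed and loaded on the connection.

```python
def build_sum_query(parquet_file, wkb_bytes):
    """Return the SQL text and parameter list for the spatially filtered sum."""
    sql = (
        f"select sum(file_size) from read_parquet('{parquet_file}') "
        "where ST_Intersects_Extent(geometry, ST_GeomFromWKB(?))"
    )
    return sql, [wkb_bytes]


def run_sum_query(parquet_file, wkb_bytes):
    """Execute the query; returns None when no rows match the filter."""
    import duckdb  # imported here so build_sum_query works without duckdb

    con = duckdb.connect()
    # Assumption: the spatial extension is available to install and load.
    con.install_extension("spatial")
    con.load_extension("spatial")
    sql, params = build_sum_query(parquet_file, wkb_bytes)
    return con.execute(sql, params).fetchone()[0]
```

Calling `run_sum_query(parquet_file, aoi.wkb.tobytes())` would then return the summed `file_size`.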
Performance difference:
gdal: size: 1139758617, time: 0:01:37.471977
duck: size: 1139758617, time: 0:00:15.171584
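To put the two timings in proportion, this small sketch parses the elapsed-time strings reported above and computes their ratio. The helper name `to_seconds` is made up for illustration; the two strings are copied from the timings above.

```python
from datetime import timedelta


def to_seconds(elapsed):
    """Convert an 'H:MM:SS.ffffff' string (str(timedelta)) to seconds."""
    hours, minutes, seconds = elapsed.split(":")
    delta = timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))
    return delta.total_seconds()


gdal_seconds = to_seconds("0:01:37.471977")
duck_seconds = to_seconds("0:00:15.171584")
speedup = gdal_seconds / duck_seconds
print(f"gdal: {gdal_seconds:.1f}s, duck: {duck_seconds:.1f}s, ratio: {speedup:.1f}x")
```

That works out to the feature-iteration approach taking about 6.4 times as long as DuckDB for the same result.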
--
Michael Smith
RSGIS Center – ERDC CRREL NH
US Army Corps