[gdal-dev] Geometry box columns and ADBC vs PARQUET
Even Rouault
even.rouault at spatialys.com
Mon Jul 28 06:01:35 PDT 2025
Michael,
The spatial filter passed to the PARQUET driver is propagated to
libarrow (cf.
https://github.com/OSGeo/gdal/blob/f01883b9c84e402dedc756e8b85c613c56e5904b/ogr/ogrsf_frmts/parquet/ogrparquetdatasetlayer.cpp#L557),
so I assume this comes down to a difference in efficiency between
libarrow and libduckdb.
Even
On 27/07/2025 at 23:29, Michael Smith via gdal-dev wrote:
> Is there a reason that the geometry bboxes are not exposed via the PARQUET driver but are via ADBC?
>
> ADBC:
> Geometry Column = geometry
> geometry_bbox.xmin: Real(Float32) (0.0)
> geometry_bbox.ymin: Real(Float32) (0.0)
> geometry_bbox.xmax: Real(Float32) (0.0)
> geometry_bbox.ymax: Real(Float32) (0.0)
>
> PARQUET:
> Geometry Column = geometry
>
> Adding bbox attribute filtering in addition to spatial filtering makes queries much faster:
>
> gf = gdal.OpenEx("PARQUET:/vsis3/bucket/stac/mds/rasters/")
> layer = gf.GetLayer()
> layer.SetSpatialFilter(ogr.CreateGeometryFromWkb(aoi.clip_geometry.wkb))
> %time feats = [feat for feat in layer]
> CPU times: user 1.74 s, sys: 3.44 s, total: 5.18 s
> Wall time: 5.35 s
>
>
> gf = gdal.OpenEx("ADBC:", open_options=['ADBC_DRIVER=libduckdb', 'PRELUDE_STATEMENTS=LOAD SPATIAL', 'PRELUDE_STATEMENTS=load httpfs', 'PRELUDE_STATEMENTS=load aws', 'PRELUDE_STATEMENTS=CREATE SECRET (TYPE S3,PROVIDER CREDENTIAL_CHAIN)'])
> %time layer = gf.ExecuteSQL(f"select * from read_parquet('s3://grid-dev-publiclidar/stac/mds/rasters/*') where st_intersects(geometry, ST_GeomFromText('{aoi.clip_geometry.wkt}')) and bbox.xmin between -69 and -64 and bbox.ymin between 17 and 19")
> CPU times: user 898 ms, sys: 154 ms, total: 1.05 s
> Wall time: 730 ms
> %time feats = [feat for feat in layer]
> CPU times: user 192 ms, sys: 43.4 ms, total: 235 ms
> Wall time: 237 ms
>
>
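
For reference, the combination described in the quoted message (a cheap bbox attribute predicate alongside the exact spatial test) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helper names (BBox, bbox_overlaps, bbox_where_clause, build_query) are hypothetical, the struct column is assumed to be called "bbox" as in the quoted query, and the predicate is the generic box-overlap test rather than the exact "between" ranges used above. The WKT is interpolated into the SQL string as in the quoted query, which is only reasonable for trusted input.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_overlaps(a: BBox, b: BBox) -> bool:
    """Return True when the two boxes share at least one point."""
    return (a.xmin <= b.xmax and a.xmax >= b.xmin
            and a.ymin <= b.ymax and a.ymax >= b.ymin)


def bbox_where_clause(aoi: BBox, column: str = "bbox") -> str:
    """SQL predicate keeping rows whose bbox struct overlaps the AOI envelope.

    Assumes the Parquet file carries a struct column with xmin/ymin/xmax/ymax
    fields, named "bbox" by default.
    """
    return (
        f"{column}.xmin <= {aoi.xmax} AND {column}.xmax >= {aoi.xmin} "
        f"AND {column}.ymin <= {aoi.ymax} AND {column}.ymax >= {aoi.ymin}"
    )


def build_query(parquet_glob: str, aoi_wkt: str, aoi: BBox) -> str:
    """Put the cheap bbox predicate in front of the exact ST_Intersects test."""
    return (
        f"SELECT * FROM read_parquet('{parquet_glob}') "
        f"WHERE {bbox_where_clause(aoi)} "
        f"AND ST_Intersects(geometry, ST_GeomFromText('{aoi_wkt}'))"
    )
```

The resulting string could then be passed to ExecuteSQL() on the ADBC dataset in the same way as the quoted query. The bbox predicate only compares plain numeric fields, so it can discard most rows before any geometry has to be decoded; ST_Intersects then makes the exact decision on the remaining candidates.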
--
http://www.spatialys.com
My software is free, but my time generally not.