[gdal-dev] Using ADBC to read geometries

Michael Smith michael.smith.erdc at gmail.com
Wed Dec 25 05:11:04 PST 2024


As a follow-up, I was able to get this working using the gdal-master build from conda (which is quite cool):

ogrinfo -ro -oo PRELUDE_STATEMENTS="LOAD SPATIAL" -oo PRELUDE_STATEMENTS="LOAD PARQUET" ADBC:'overture-places.parquet' -sql "select st_astext(geometry), * from \"overture-places\" where st_dwithin_spheroid(geometry, ST_POINT( -72.1440, 43.6406 ), 500)=true and bbox.xmin BETWEEN -73 AND -72 AND bbox.ymin BETWEEN 43 AND 44"

This uses DuckDB SQL to query the Parquet file.
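
A minimal sketch, in Python, of how the repeated -oo PRELUDE_STATEMENTS options in the command above could be assembled from a list. The helper name build_ogrinfo_args and the shortened query are assumptions made for illustration; the code only builds and prints the argument list and does not run ogrinfo.

```python
import shlex


def build_ogrinfo_args(dataset, sql, prelude_statements):
    """Return an argv list for an ogrinfo call against the ADBC driver.

    Hypothetical helper: it mirrors the command line shown above and
    does not execute anything.
    """
    args = ["ogrinfo", "-ro"]
    # One -oo option per prelude statement, as in the command above.
    for statement in prelude_statements:
        args += ["-oo", f"PRELUDE_STATEMENTS={statement}"]
    args += [f"ADBC:{dataset}", "-sql", sql]
    return args


args = build_ogrinfo_args(
    "overture-places.parquet",
    # Shortened query, for illustration only.
    'select st_astext(geometry), * from "overture-places" limit 1',
    ["LOAD SPATIAL", "LOAD PARQUET"],
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

Passing the list to subprocess.run(args) would avoid shell quoting altogether, which helps when the SQL contains both single and double quotes.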

I find that I have to have a dummy local parquet file and then I can query remote datasets just fine:

ogrinfo -ro -sql "select *, st_astext(geometry) geom from read_parquet(\"s3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/*\", filename=true, hive_partitioning=1)  where st_dwithin_spheroid(geometry,ST_POINT( -72.1440, 43.6406 ), 500)=true and bbox.xmin BETWEEN -73 AND -72 AND bbox.ymin BETWEEN 43 AND 44" -oo PRELUDE_STATEMENTS="load httpfs"  -oo PRELUDE_STATEMENTS="load spatial"  -oo PRELUDE_STATEMENTS="load parquet"  ADBC:~/dummy.parquet
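
Both queries above pair the exact st_dwithin_spheroid test with a coarse one-degree window on bbox.xmin and bbox.ymin. As a rough sketch, a tighter window could be derived from the search point and radius. The function names are hypothetical, the 111,320 m per degree figure is a spherical approximation, and the code assumes point data (where bbox.xmin and bbox.ymin approximately equal the point coordinates) away from the poles and the antimeridian.

```python
import math

# Approximate length of one degree of latitude in meters (assumption:
# spherical Earth; good enough for a prefilter, not for the exact test).
METERS_PER_DEGREE_LAT = 111_320.0


def bbox_window(lon, lat, radius_m):
    """Return (xmin, ymin, xmax, ymax) in degrees around a lon/lat point."""
    dlat = radius_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def bbox_predicate(lon, lat, radius_m):
    """Build a SQL prefilter on the bbox struct columns used above."""
    xmin, ymin, xmax, ymax = bbox_window(lon, lat, radius_m)
    return (
        f"bbox.xmin BETWEEN {xmin:.6f} AND {xmax:.6f} "
        f"AND bbox.ymin BETWEEN {ymin:.6f} AND {ymax:.6f}"
    )


# Same point and 500 m radius as in the queries above.
print(bbox_predicate(-72.1440, 43.6406, 500))
```

The exact st_dwithin_spheroid condition would still be kept in the WHERE clause; the window only narrows what has to be scanned first.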

Mike


-- 

Michael Smith 
Remote Sensing/GIS Center 
US Army Corps of Engineers 





On 12/22/24, 10:05 AM, "Even Rouault" <even.rouault at spatialys.com> wrote:


Hi Michael,


I've also noticed that the ADBC / Arrow interface of libduckdb seems to 
be less efficient than their native API. I've no idea whether this is 
for a fundamental cause or if it is "just" an implementation issue that 
could be improved (on their side).


In particular I had the impression that getting an arrow stream for 
"SELECT * FROM 'the_filename'", as used internally by the driver, seemed 
to trigger the whole file to be ingested. Or maybe just the first row 
group, but that might already be too much.


Note also that the driver itself asks for Arrow streams a couple of 
times when geometries are detected, because it rewrites the SQL to use 
ST_AsWKB() on the geometry columns. Otherwise, when the spatial 
extension is loaded, DuckDB returns geometries in its own geometry 
encoding, and I didn't bother writing a parser for this custom encoding 
(ADBC support in GDAL is an unsponsored effort).
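
To illustrate the kind of rewrite described above, here is a minimal Python sketch that wraps geometry columns in ST_AsWKB() when building a SELECT. This is not the code used by the GDAL ADBC driver; the function name, its arguments, and the example column list are assumptions for illustration only.

```python
def quote_identifier(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def wrap_geometry_columns(table, columns, geometry_columns):
    """Build a SELECT that returns the listed geometry columns as WKB.

    Illustration only: a hypothetical helper, not the driver's logic.
    """
    select_items = []
    for name in columns:
        quoted = quote_identifier(name)
        if name in geometry_columns:
            # Ask the spatial extension for WKB instead of its own encoding.
            select_items.append(f"ST_AsWKB({quoted}) AS {quoted}")
        else:
            select_items.append(quoted)
    return f"SELECT {', '.join(select_items)} FROM {quote_identifier(table)}"


rewritten = wrap_geometry_columns(
    "overture-places", ["id", "geometry"], {"geometry"}
)
print(rewritten)
# prints: SELECT "id", ST_AsWKB("geometry") AS "geometry" FROM "overture-places"
```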


So perhaps, to get the most out of duckdb, a dedicated driver should be written.


Regarding the lack of geometry for your use case, I'm not sure what the 
cause is. I believe that duckdb_spatial is a bit stricter than the OGR 
GeoParquet driver when recognizing GeoParquet. At least older versions 
of OvertureMaps were only loosely compliant with GeoParquet.


With https://github.com/OSGeo/gdal/pull/11536 applied, the following 
works (although much more slowly than we would like):


$ ogrinfo ADBC: -oo SQL="SELECT * FROM 
's3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd.parquet' 
LIMIT 1" -al


INFO: Open of `ADBC:'
using driver `ADBC' successful.


Layer name: part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd
Geometry: Point
Feature Count: 1
Extent: (-179.999992, -84.996332) - (-0.001674, 44.999998)
Layer SRS WKT:
GEOGCRS["WGS 84",
[ ... snip ... ]
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Geometry Column = geometry
id: String (0.0)
[ ... snip ... ]
type: String (0.0)
OGRFeature(part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd):0
id (String) = 08ff39bac830c5900361ff7fe23acab8
version (Integer) = 0
sources (String(JSON)) = 
[{"property":"","dataset":"meta","record_id":"1150855701606590","update_time":"2024-09-10T00:00:00.000Z","confidence":null}]
names.primary (String) = KK Beauty Shop 2
categories.primary (String) = shopping
categories.alternate (StringList) = (1:cosmetic_and_beauty_supplies)
confidence (Real) = 0.265179677819083
websites (StringList) = (null)
socials (StringList) = (1:https://www.facebook.com/1150855701606590)
emails (StringList) = (null)
phones (StringList) = (1:+959765858258)
brand.wikidata (String) = (null)
brand.names.primary (String) = (null)
addresses (String(JSON)) = 
[{"freeform":"အမှတ်(၂၁),ပွဲစားလမ်း(အောက်လမ်း)၊ 
ကြည့်မြင်တိုင်","locality":"Yangon","postcode":"11101","region":null,"country":"MM"}]
theme (String) = places
type (String) = place
POINT (-179.13203 -84.5792175)


Even


On 21/12/2024 at 21:39, Michael Smith via gdal-dev wrote:
> Using gdal-master conda packages, trying to use the new ADBC driver for libduckdb integration, I’m able to connect to a parquet dataset (only if it has the parquet extension) but the geometry is not being recognized.
> Seems to take a long time to load compared with duckdb. So, I must be doing something wrong.
> Note private s3 bucket.
>
>
> CPL_DEBUG=on ogrinfo ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib using RegisterOGRADBC.
> GDAL: GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x13a70a000) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
> using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL: GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x13a70a000)
> GDAL: In GDALDestroy - unloading GDAL shared library.
>
>
> time CPL_DEBUG=on ogrinfo ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL spatial" -oo PRELUDE_STATEMENTS="load spatial" -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib using RegisterOGRADBC.
> GDAL: GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x129e15350) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
> using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL: GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x129e15350)
> GDAL: In GDALDestroy - unloading GDAL shared library.
> CPL_DEBUG=on ogrinfo -oo ADBC_DRIVER=libduckdb -oo -oo -oo -oo -oo -oo
> 90.25s user 22.43s system 41% cpu 4:29.75 total
>
>
-- 
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.
Mood of the day: "Of course, one can jump up and down on one's chair like a young goat, saying: standards! standards! standards! But that leads nowhere and means nothing." ~ said De Gaulle







