[gdal-dev] Using ADBC to read geometries

Even Rouault even.rouault at spatialys.com
Sun Dec 22 07:05:34 PST 2024


Hi Michael,

I've also noticed that the ADBC / Arrow interface of libduckdb seems to 
be less efficient than their native API. I've no idea whether this is 
due to a fundamental cause or "just" an implementation issue that 
could be improved (on their side).

In particular, I had the impression that getting an Arrow stream for 
"SELECT * FROM 'the_filename'", as used internally by the driver, seemed 
to cause the whole file to be ingested. Or maybe just the first row 
group, but that might already be too much.

Note also that the driver itself asks for Arrow streams a couple 
of times when geometries are detected, because it rewrites the SQL to 
use ST_AsWKB() on the geometry columns. Otherwise, when the spatial 
extension is loaded, DuckDB returns geometries in its own custom 
geometry encoding, and I didn't bother writing a parser for that 
encoding (ADBC support in GDAL is an unsponsored effort).
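
To illustrate the kind of SQL rewriting mentioned above, here is a minimal 
Python sketch. It is not the driver's actual code: the function name 
wrap_geometry_columns and its parameters are hypothetical, and it assumes 
the column names and the set of geometry columns are already known from a 
first look at the schema.

```python
def wrap_geometry_columns(sql, columns, geometry_columns):
    """Wrap `sql` in an outer SELECT that converts geometry columns to WKB.

    Hypothetical helper for illustration only; the real driver logic may differ.
    """

    def quote(name):
        # Double-quote identifiers, escaping embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    select_items = []
    for name in columns:
        if name in geometry_columns:
            # ST_AsWKB() is provided by the DuckDB spatial extension.
            select_items.append(f"ST_AsWKB({quote(name)}) AS {quote(name)}")
        else:
            select_items.append(quote(name))
    return f"SELECT {', '.join(select_items)} FROM ({sql})"


rewritten = wrap_geometry_columns(
    "SELECT * FROM 'the_filename'",
    ["id", "geometry"],
    {"geometry"},
)
print(rewritten)
```

Listing the columns explicitly rather than using "SELECT *" is what lets 
only the geometry columns be converted, at the cost of needing the schema 
before the final query can be issued.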

So perhaps, to get the most out of DuckDB, a dedicated driver should be written.

Regarding the lack of geometry for your use case, I'm not sure what the 
cause is. I believe that duckdb_spatial is a bit stricter than the OGR 
GeoParquet driver when it comes to recognizing GeoParquet. At least older 
versions of OvertureMaps were only loosely compliant with GeoParquet.

With https://github.com/OSGeo/gdal/pull/11536 applied, the following 
works (although much slower than we'd like it to run):

$ ogrinfo ADBC: -oo SQL="SELECT * FROM 
's3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd.parquet' 
LIMIT 1" -al

INFO: Open of `ADBC:'
       using driver `ADBC' successful.

Layer name: part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd
Geometry: Point
Feature Count: 1
Extent: (-179.999992, -84.996332) - (-0.001674, 44.999998)
Layer SRS WKT:
GEOGCRS["WGS 84",
[ ... snip ... ]
     ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Geometry Column = geometry
id: String (0.0)
[ ... snip ... ]
type: String (0.0)
OGRFeature(part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd):0
   id (String) = 08ff39bac830c5900361ff7fe23acab8
   version (Integer) = 0
   sources (String(JSON)) = 
[{"property":"","dataset":"meta","record_id":"1150855701606590","update_time":"2024-09-10T00:00:00.000Z","confidence":null}]
   names.primary (String) = KK Beauty Shop 2
   categories.primary (String) = shopping
   categories.alternate (StringList) = (1:cosmetic_and_beauty_supplies)
   confidence (Real) = 0.265179677819083
   websites (StringList) = (null)
   socials (StringList) = (1:https://www.facebook.com/1150855701606590)
   emails (StringList) = (null)
   phones (StringList) = (1:+959765858258)
   brand.wikidata (String) = (null)
   brand.names.primary (String) = (null)
   addresses (String(JSON)) = 
[{"freeform":"အမှတ်(၂၁),ပွဲစားလမ်း(အောက်လမ်း)၊ 
ကြည့်မြင်တိုင်","locality":"Yangon","postcode":"11101","region":null,"country":"MM"}]
   theme (String) = places
   type (String) = place
   POINT (-179.13203 -84.5792175)
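
Since invocations like these repeat "-oo" many times, a small Python 
helper can assemble the argument list. This is only a sketch: the helper 
name build_ogrinfo_command and the file name 'example.parquet' are 
hypothetical, and combining ADBC_DRIVER, PRELUDE_STATEMENTS and SQL in a 
single call is an assumption based on the options used in this thread.

```python
import shlex


def build_ogrinfo_command(dataset, open_options, extra_args=()):
    """Build an ogrinfo argument list from (key, value) open-option pairs.

    Hypothetical helper; it only formats arguments and does not run ogrinfo.
    """
    argv = ["ogrinfo", dataset]
    for key, value in open_options:
        # Each open option is passed as its own "-oo KEY=VALUE" pair.
        argv += ["-oo", f"{key}={value}"]
    argv += list(extra_args)
    return argv


prelude = ["INSTALL spatial", "LOAD spatial"]
options = (
    [("ADBC_DRIVER", "libduckdb")]
    + [("PRELUDE_STATEMENTS", statement) for statement in prelude]
    + [("SQL", "SELECT * FROM 'example.parquet' LIMIT 1")]
)
argv = build_ogrinfo_command("ADBC:", options, ["-al"])

# shlex.join() quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) would avoid shell quoting 
altogether, which helps when the SQL itself contains quotes.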

Even

On 21/12/2024 at 21:39, Michael Smith via gdal-dev wrote:
> Using gdal-master conda packages, trying to use the new ADBC driver for libduckdb integration, I’m able to connect to a parquet dataset (only if it has the parquet extension) but the geometry is not being recognized.
> Seems to take a long time to load compared with duckdb. So, I must be doing something wrong.
> Note private s3 bucket.
>
>
> CPL_DEBUG=on ogrinfo ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib using RegisterOGRADBC.
> GDAL: GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x13a70a000) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
>        using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL: GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x13a70a000)
> GDAL: In GDALDestroy - unloading GDAL shared library.
>
>
> time CPL_DEBUG=on ogrinfo ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo ADBC_DRIVER=libduckdb  -oo PRELUDE_STATEMENTS="INSTALL spatial" -oo PRELUDE_STATEMENTS="load spatial" -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib using RegisterOGRADBC.
> GDAL: GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x129e15350) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
>        using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL: GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x129e15350)
> GDAL: In GDALDestroy - unloading GDAL shared library.
> CPL_DEBUG=on ogrinfo  -oo ADBC_DRIVER=libduckdb -oo  -oo  -oo  -oo  -oo  -oo
> 90.25s user 22.43s system 41% cpu 4:29.75 total
>
>
-- 
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.
Mood of the day: "Of course, one can jump up and down on one's chair like a young goat, saying: standards! standards! standards! But that leads nowhere and means nothing." ~ as De Gaulle put it


