[gdal-dev] Using ADBC to read geometries
Michael Smith
michael.smith.erdc at gmail.com
Wed Dec 25 05:11:04 PST 2024
As a follow-up, I was able to get this working using a gdal-master build from conda (which is quite cool):
ogrinfo -ro -oo PRELUDE_STATEMENTS="LOAD SPATIAL" -oo PRELUDE_STATEMENTS="LOAD PARQUET" ADBC:'overture-places.parquet' -sql "select st_astext(geometry), * from \"overture-places\" where st_dwithin_spheroid(geometry, ST_POINT( -72.1440, 43.6406 ), 500)=true and bbox.xmin BETWEEN -73 AND -72 AND bbox.ymin BETWEEN 43 AND 44"
This uses duckdb SQL to query the parquet file.
I find that I have to have a dummy local parquet file and then I can query remote datasets just fine:
ogrinfo -ro -sql "select *, st_astext(geometry) geom from read_parquet(\"s3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/*\", filename=true, hive_partitioning=1) where st_dwithin_spheroid(geometry,ST_POINT( -72.1440, 43.6406 ), 500)=true and bbox.xmin BETWEEN -73 AND -72 AND bbox.ymin BETWEEN 43 AND 44" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="load spatial" -oo PRELUDE_STATEMENTS="load parquet" ADBC:~/dummy.parquet
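The repeated -oo PRELUDE_STATEMENTS options get long quickly, so here is a minimal Python sketch that assembles the same kind of ogrinfo argument list. The helper name build_ogrinfo_args and the shortened count(*) query are assumptions made for illustration; only the option names and the ADBC: prefix come from the commands above. The sketch only prints the command. Running it would need a GDAL build with the ADBC driver and libduckdb available.

```python
import shlex


def build_ogrinfo_args(dataset, sql, prelude_statements, read_only=True):
    """Assemble an ogrinfo argument list for the ADBC driver.

    Hypothetical helper: it only builds the list, it does not run ogrinfo.
    """
    args = ["ogrinfo"]
    if read_only:
        args.append("-ro")
    # One -oo PRELUDE_STATEMENTS=... option per statement, as in the commands above.
    for statement in prelude_statements:
        args += ["-oo", f"PRELUDE_STATEMENTS={statement}"]
    # The dummy local file is passed with the ADBC: prefix; the SQL does the real work.
    args += [f"ADBC:{dataset}", "-sql", sql]
    return args


# Illustrative query (assumption): a shortened form of the remote query above.
args = build_ogrinfo_args(
    "dummy.parquet",
    "select count(*) from read_parquet("
    "'s3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/*')",
    ["load httpfs", "load spatial", "load parquet"],
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(args))
```

Passing the list to subprocess.run(args, check=True) avoids the shell quoting issues around the embedded SQL.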
Mike
--
Michael Smith
Remote Sensing/GIS Center
US Army Corps of Engineers
On 12/22/24, 10:05 AM, "Even Rouault" <even.rouault at spatialys.com> wrote:
Hi Michael,
I've also noticed that the ADBC / Arrow interface of libduckdb seems to
be less efficient than their native API. I've no idea whether this is
for a fundamental cause or if it is "just" an implementation issue that
could be improved (on their side).
In particular I had the impression that getting an arrow stream for
"SELECT * FROM 'the_filename'", as used internally by the driver, seemed
to trigger the whole file to be ingested. Or maybe just the first row
group, but that might already be too much.
To be noted too that the driver itself asks for Arrow streams a couple
of times when geometries are detected, because it rewrites the SQL to
use ST_AsWKB() on the geometry columns, otherwise when the spatial
extension is loaded, it returns geometries encoded with their own
geometry encoding, and I didn't bother writing a parser for this custom
encoding (ADBC support in GDAL is an unsponsored effort).
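To illustrate the kind of rewrite described here, a minimal Python sketch follows. It is not the driver's code (the driver is written in C++ and its exact SQL is not shown in this thread); the function name wrap_geometry_columns and the two-column layout are assumptions made for illustration. It builds a SELECT in which each geometry column is wrapped in ST_AsWKB() and the other columns pass through unchanged.

```python
def wrap_geometry_columns(table, columns, geometry_columns):
    """Build a SELECT that returns geometry columns as WKB via ST_AsWKB().

    Hypothetical helper, for illustration only.
    """

    def quote(name):
        # Double-quote identifiers, doubling any embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    select_list = []
    for column in columns:
        if column in geometry_columns:
            # Keep the original column name so the result schema is unchanged.
            select_list.append(f"ST_AsWKB({quote(column)}) AS {quote(column)}")
        else:
            select_list.append(quote(column))
    return f"SELECT {', '.join(select_list)} FROM {quote(table)}"


sql = wrap_geometry_columns("overture-places", ["id", "geometry"], {"geometry"})
print(sql)
# SELECT "id", ST_AsWKB("geometry") AS "geometry" FROM "overture-places"
```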
So perhaps to get the most out of duckdb, a dedicated driver should be written.
Regarding the lack of geometry for your use case, I'm not sure what the
cause is. I believe that duckdb_spatial is a bit stricter / less lax
than the OGR GeoParquet driver to recognize GeoParquet. At least older
versions of OvertureMaps were loosely compliant with GeoParquet.
With https://github.com/OSGeo/gdal/pull/11536 applied, the following
works (although much slower than we'd indeed like it to run):
$ ogrinfo ADBC: -oo SQL="SELECT * FROM
's3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd.parquet'
LIMIT 1" -al
INFO: Open of `ADBC:'
using driver `ADBC' successful.
Layer name: part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd
Geometry: Point
Feature Count: 1
Extent: (-179.999992, -84.996332) - (-0.001674, 44.999998)
Layer SRS WKT:
GEOGCRS["WGS 84",
[ ... snip ... ]
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Geometry Column = geometry
id: String (0.0)
[ ... snip ... ]
type: String (0.0)
OGRFeature(part-00000-9b3cb01a-46a1-4378-9e77-baca19283b5a-c000.zstd):0
id (String) = 08ff39bac830c5900361ff7fe23acab8
version (Integer) = 0
sources (String(JSON)) =
[{"property":"","dataset":"meta","record_id":"1150855701606590","update_time":"2024-09-10T00:00:00.000Z","confidence":null}]
names.primary (String) = KK Beauty Shop 2
categories.primary (String) = shopping
categories.alternate (StringList) = (1:cosmetic_and_beauty_supplies)
confidence (Real) = 0.265179677819083
websites (StringList) = (null)
socials (StringList) = (1:https://www.facebook.com/1150855701606590)
emails (StringList) = (null)
phones (StringList) = (1:+959765858258)
brand.wikidata (String) = (null)
brand.names.primary (String) = (null)
addresses (String(JSON)) =
[{"freeform":"အမှတ်(၂၁),ပွဲစားလမ်း(အောက်လမ်း)၊
ကြည့်မြင်တိုင်","locality":"Yangon","postcode":"11101","region":null,"country":"MM"}]
theme (String) = places
type (String) = place
POINT (-179.13203 -84.5792175)
Even
On 21/12/2024 at 21:39, Michael Smith via gdal-dev wrote:
> Using gdal-master conda packages, trying to use the new ADBC driver for libduckdb integration, I’m able to connect to a parquet dataset (only if it has the parquet extension) but the geometry is not being recognized.
> Seems to take a long time to load compared with duckdb. So, I must be doing something wrong.
> Note private s3 bucket.
>
>
> CPL_DEBUG=on ogrinfo ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib using RegisterOGRADBC.
> GDAL: GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x13a70a000) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
> using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL: GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x13a70a000)
> GDAL: In GDALDestroy - unloading GDAL shared library.
>
>
> time CPL_DEBUG=on ogrinfo ADBC:"s3://private-bucket/overture-base/overture-places.parquet" -oo ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="INSTALL spatial" -oo PRELUDE_STATEMENTS="load spatial" -oo PRELUDE_STATEMENTS="INSTALL httpfs" -oo PRELUDE_STATEMENTS="load httpfs" -oo PRELUDE_STATEMENTS="INSTALL parquet" -oo PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="install aws" -oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="CREATE SECRET ( TYPE S3,PROVIDER CREDENTIAL_CHAIN)"
> GDAL: On-demand registering /Users/rdcrlmds/mambaforge/envs/gdalmaster/lib/gdalplugins/ogr_ADBC.dylib using RegisterOGRADBC.
> GDAL: GDALOpen(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x129e15350) succeeds as ADBC.
> INFO: Open of `ADBC:s3://private-bucket/overture-base/overture-places.parquet'
> using driver `ADBC' successful.
> OGR: GetLayerCount() = 1
>
> 1: overture-places (None)
> GDAL: GDALClose(ADBC:s3://private-bucket/overture-base/overture-places.parquet, this=0x129e15350)
> GDAL: In GDALDestroy - unloading GDAL shared library.
> CPL_DEBUG=on ogrinfo -oo ADBC_DRIVER=libduckdb -oo -oo -oo -oo -oo -oo
> 90.25s user 22.43s system 41% cpu 4:29.75 total
>
>
--
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.
Mood of the day: "Of course, one can jump up and down on one's chair like a young goat, saying: standards! standards! standards! But that leads nowhere and means nothing." ~ as De Gaulle put it