[gdal-dev] DuckDB queries in GDAL/OGR master branch
Michael Smith
michael.smith.erdc at gmail.com
Fri Dec 27 10:54:51 PST 2024
I've used it with geoparquet files from s3, local geopackage files etc. Duckdb works well with geoparquet, for the other formats its using GDAL to read. You certainly can use flatgeobuf.
Yep, for just bbox queries, no need for alternative sql methods. This is more for doing more data analytics etc.
Mike
--
Michael Smith
Remote Sensing/GIS Center
US Army Corps of Engineers
On 12/27/24, 1:45 PM, "gdal-dev on behalf of Scott via gdal-dev" <gdal-dev-bounces at lists.osgeo.org <mailto:gdal-dev-bounces at lists.osgeo.org> on behalf of gdal-dev at lists.osgeo.org <mailto:gdal-dev at lists.osgeo.org>> wrote:
Thanks for the example Michael!
Will this work with geoparquet files on an out of the box apache web
server?
Will I need to replace my simple apache server(s) with
httpfs/hdfs/hadoop or forced to use public, proprietary cloud storage?
If I *only* need to filter by bbox, FlatGeoBuf (FGB) format is a far,
far more simple cloud native solution. No changes to infrastructure
required. Store your FGB on any http(s) server.
For more complex queries, maybe an API might still be the more simple
solution.
Scott
On 12/27/24 08:05, Michael Smith via gdal-dev wrote:
> With some recent commits to GDAL master branch, you can now do some
> amazing DuckDB queries in GDAL!
>
> For example, h3 level 6 indexing for POIs in NH from Overture Maps
>
> ogr2ogr -f parquet POI_NH_H3.parquet -nlt POINT -a_srs epsg:4326 -oo
> ADBC_DRIVER=libduckdb -oo PRELUDE_STATEMENTS="LOAD SPATIAL" -oo
> PRELUDE_STATEMENTS="load parquet" -oo PRELUDE_STATEMENTS="load httpfs" -
> oo PRELUDE_STATEMENTS="load aws" -oo PRELUDE_STATEMENTS="install h3 from
> community" -oo PRELUDE_STATEMENTS="load h3" -oo SQL="select h3_id,
> ST_GeomFromText(h3_cell_to_boundary_wkt(h3_id)) geometry, count(*)
> POI_count from (select id, h3_latlng_to_cell_string(st_y(geometry),
> st_x(geometry), 6) as h3_id from read_parquet('s3://overturemaps-us-
> west-2/release/2024-12-18.0/theme=places/type=place/*') where
> addresses[1].region = 'NH' and addresses[1].country='US' and bbox.xmin
> between -73 and -70 and bbox.ymin between 42 and 46) group by h3_id" ADBC:
>
> 15.60s user 2.36s system
>
> 48% cpu
>
> 36.935 total
>
> This is using gdal-master from conda on my mac arm64.
>
> https://imgur.com/a/zGp5Mwg <https://imgur.com/a/zGp5Mwg> <https://imgur.com/a/zGp5Mwg> <https://imgur.com/a/zGp5Mwg>>
>
> https://imgur.com/a/1H5JwwZ <https://imgur.com/a/1H5JwwZ> <https://imgur.com/a/1H5JwwZ> <https://imgur.com/a/1H5JwwZ>>
>
> --
>
> Michael Smith
>
> Remote Sensing/GIS Center
>
> US Army Corps of Engineers
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org <mailto:gdal-dev at lists.osgeo.org>
> https://lists.osgeo.org/mailman/listinfo/gdal-dev <https://lists.osgeo.org/mailman/listinfo/gdal-dev>
_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org <mailto:gdal-dev at lists.osgeo.org>
https://lists.osgeo.org/mailman/listinfo/gdal-dev <https://lists.osgeo.org/mailman/listinfo/gdal-dev>
More information about the gdal-dev
mailing list