[gdal-dev] ogr2ogr for downloading extracts from overturemaps
Varun Sharma
vsharma.next at gmail.com
Thu Oct 24 13:26:36 PDT 2024
Thanks Even for your prompt reply!
1. Just to clarify: with GDAL v3.10.0, the command

   ogr2ogr -f GPKG ogr_water.gpkg -spat 7.5 46.5 7.7 46.7 \
     /vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=base/type=water/

   is fine, and I should see a significant speed-up, yes?
2. The Apache Arrow libraries themselves have many knobs to tweak, such as
thread pools, I/O threads, memory pools, etc. Are these exposed as GDAL
configuration options?
3. GDAL 3.11 ADBC with libduckdb would be amazing. In my C++ app, I was
thinking of using libduckdb and duckdb-spatial directly, but I don't know
how to use DuckDB from C++ apart from passing SQL queries as strings :). Your
linked PR thread and https://github.com/OSGeo/gdal/issues/10887 are very
interesting reads!
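
A minimal sketch of scripting the bounding-box extract from point 1, in Python. The helper name build_extract_command is hypothetical (it is not part of GDAL); it only assembles the same ogr2ogr arguments shown above. Actually running the command assumes ogr2ogr is on PATH and that the bucket is reachable from the machine.

```python
import shlex


def build_extract_command(dst, bbox, src):
    """Return the ogr2ogr argument list for a bounding-box extract.

    Hypothetical helper: it mirrors the command line from point 1 and
    does not call any GDAL API itself.
    """
    xmin, ymin, xmax, ymax = bbox
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bbox must be (xmin, ymin, xmax, ymax)")
    return [
        "ogr2ogr",
        "-f", "GPKG", dst,                                    # output format and file
        "-spat", str(xmin), str(ymin), str(xmax), str(ymax),  # spatial filter
        src,                                                  # source dataset
    ]


args = build_extract_command(
    "ogr_water.gpkg",
    (7.5, 46.5, 7.7, 46.7),
    "/vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=base/type=water/",
)

# Print a copy-pasteable shell command.
print(shlex.join(args))

# To run it for real (assumes GDAL is installed and has network access):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and the same list can be timed under GDAL 3.9.2 and 3.10.0 to compare the two.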
Best,
Varun
On Thu, Oct 24, 2024 at 10:01 PM Even Rouault <even.rouault at spatialys.com>
wrote:
> Hi,
>
> This has been much improved in the upcoming GDAL 3.10.0; see in particular
> https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538
> Now only the header and trailers of part-00000 are read.
>
> That said, duckdb will likely still outperform the OGR GeoParquet driver
> (GDAL 3.11 with https://github.com/OSGeo/gdal/pull/11003 will allow the
> use of libduckdb).
>
> Even
> On 24/10/2024 at 21:41, Varun Sharma via gdal-dev wrote:
>
> Hello GDAL'ers,
>
> I have made a few attempts at using ogr2ogr to get bounding-box-based
> extracts from overturemaps datasets.
>
> I am unfortunately not able to do so - something that takes duckdb or
> overturemaps-py <https://github.com/OvertureMaps/overturemaps-py> 30s or
> less takes forever when using ogr2ogr. overturemaps-py is essentially a
> wrapper over pyarrow with the arrow filter constructed from bbox.
>
> I suspect I am doing something wrong. The less likely explanation is that
> ogr2ogr is not the right tool for this.
>
> Attempt 1: Command at the top of the link
> ---------------------------------------------
> https://pastebin.com/bh05Kcww
>
> Attempt 2:
> ----------------------------------------------
>
> https://pastebin.com/BG3WmQ9Y
>
> From what I can tell, all row groups from each of the parquet files are
> being loaded and checked. This is clearly not correct.
>
> Below are my libs and versions on Ubuntu 20.04. All attempts are within a
> conda environment.
>
> gdal 3.9.2
> gcc_linux-64 12.4.0
> libarrow 17.0.0
> libarrow-dataset 17.0.0
> libparquet 17.0.0
> zstd 1.5.6
> libgdal-core 3.9.2
> libgdal-arrow-parquet 3.9.2
> libcurl/8.9.1
> OpenSSL/3.3.2
>
> I typically use the command line tools to test GDAL/OGR's functionality
> and performance before I embed that functionality in my own C++ app.
> Thus, while there are other tools, I would love to understand how to do
> this in GDAL/OGR.
>
> Please advise!
>
> cheers,
> Varun
>
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> -- http://www.spatialys.com
> My software is free, but my time generally not.
> Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.
>
>