[gdal-dev] ogr2ogr for downloading extracts from overturemaps

Varun Sharma vsharma.next at gmail.com
Thu Oct 24 12:41:34 PDT 2024


Hello GDAL'ers,

I have made a few attempts at using ogr2ogr to get bounding-box-based
extracts from overturemaps datasets.

Unfortunately I have not been able to: an extract that takes duckdb or
overturemaps-py <https://github.com/OvertureMaps/overturemaps-py> 30s or
less takes forever with ogr2ogr. overturemaps-py is essentially a
wrapper over pyarrow, with the arrow filter constructed from the bbox.

I suspect I am doing something wrong. The less likely explanation is that
ogr2ogr is not the right tool for this.

Attempt 1: Command at the top of the link
---------------------------------------------
https://pastebin.com/bh05Kcww

Attempt 2:
---------------------------------------------
https://pastebin.com/BG3WmQ9Y

From what I can tell, all row groups from each of the parquet files are
being loaded and checked. This is clearly not correct.
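
To make the expected behavior concrete, here is a minimal sketch of
row-group pruning by bounding box. It is plain Python, not GDAL, Arrow or
DuckDB code: RowGroupStats and row_groups_to_read are hypothetical names,
and the sketch assumes each row group exposes min/max statistics for the
bbox columns (xmin, ymin, xmax, ymax). Only row groups whose statistics
envelope intersects the query bbox would need to be read.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Query bounding box as (xmin, ymin, xmax, ymax).
BBox = Tuple[float, float, float, float]


@dataclass
class RowGroupStats:
    """Hypothetical envelope of one row group, derived from column statistics."""
    index: int
    xmin: float  # minimum of bbox.xmin over the row group
    ymin: float  # minimum of bbox.ymin over the row group
    xmax: float  # maximum of bbox.xmax over the row group
    ymax: float  # maximum of bbox.ymax over the row group


def intersects(stats: RowGroupStats, query: BBox) -> bool:
    """Return True if the row group's envelope overlaps the query bbox."""
    qxmin, qymin, qxmax, qymax = query
    return (
        stats.xmin <= qxmax
        and stats.xmax >= qxmin
        and stats.ymin <= qymax
        and stats.ymax >= qymin
    )


def row_groups_to_read(all_stats: List[RowGroupStats], query: BBox) -> List[int]:
    """Keep only the indices of row groups that can contain matching rows."""
    return [s.index for s in all_stats if intersects(s, query)]


# Made-up statistics for three row groups, for illustration only.
groups = [
    RowGroupStats(0, -10.0, -10.0, 0.0, 0.0),
    RowGroupStats(1, 5.0, 5.0, 15.0, 15.0),
    RowGroupStats(2, 100.0, 40.0, 110.0, 50.0),
]
selected = row_groups_to_read(groups, (6.0, 6.0, 8.0, 8.0))
```

With these made-up statistics only the second row group overlaps the
query, so the other two would never have to be fetched or decoded.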

Below are my libs and versions on Ubuntu 20.04. All attempts were made
within a conda environment.

gdal                      3.9.2
gcc_linux-64              12.4.0
libarrow                  17.0.0
libarrow-dataset          17.0.0
libparquet                17.0.0
zstd                      1.5.6
libgdal-core              3.9.2
libgdal-arrow-parquet     3.9.2
libcurl/8.9.1
OpenSSL/3.3.2

I typically use the command line tools to test GDAL/OGR's functionality and
performance before embedding that functionality in my own C++ app. Thus,
while there are other tools, I would love to understand how to do this in
GDAL/OGR.

Please advise!

cheers,
Varun