<div dir="ltr"><div>Thanks Even for your prompt reply!<br><br>1. Just to clarify, with GDAL v3.10.0, the command <br><br><span style="color:rgb(0,0,0);font-family:Consolas,Menlo,Monaco,'Lucida Console','Liberation Mono','DejaVu Sans Mono','Bitstream Vera Sans Mono',monospace,serif;font-size:12px">ogr2ogr -f GPKG ogr_water.gpkg -spat 7.5 46.5 7.7 46.7 /vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=base/type=water/
</span><br>is fine and I should see a significant speed-up, yes?<br><br>2. The Apache Arrow libraries themselves have many knobs to tweak, such as thread pools, I/O threads and memory pools. Are these exposed as GDAL configuration options?<br><br>3. GDAL 3.11 ADBC with libduckdb would be amazing. In my C++ app, I was thinking of using libduckdb and duckdb-spatial directly, but I don't know how to use DuckDB in C++ apart from passing SQL queries as strings :). Your linked PR thread and <a href="https://github.com/OSGeo/gdal/issues/10887">https://github.com/OSGeo/gdal/issues/10887</a> are very interesting reads!<br><br><br>Best,<br>Varun</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 24, 2024 at 10:01 PM Even Rouault &lt;<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div>
<p>Hi,</p>
<p>This has been much improved in the upcoming GDAL 3.10.0: see in
particular
<a href="https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538" target="_blank">https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538</a>.
Now only the header and trailers of part-00000 are read.<br>
</p>
<p>That said, DuckDB will likely still outperform the OGR GeoParquet
driver (GDAL 3.11, with <a href="https://github.com/OSGeo/gdal/pull/11003" target="_blank">https://github.com/OSGeo/gdal/pull/11003</a>,
will allow the use of libduckdb).<br>
</p>
<p>Even<br>
</p>
<div>On 24/10/2024 at 21:41, Varun Sharma via
gdal-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hello GDAL'ers , <br>
<br>
I have made a few attempts at using ogr2ogr to get bounding-box
based extracts from Overture Maps datasets. <br>
<br>
Unfortunately, I have not been able to do so: something that takes
DuckDB or <a href="https://github.com/OvertureMaps/overturemaps-py" target="_blank">overturemaps-py</a> 30 s or less takes
forever when using ogr2ogr. overturemaps-py is essentially a
wrapper around pyarrow, with the Arrow filter constructed from
the bbox. <br>
<br>
I suspect I am doing something wrong. The less likely explanation is
that ogr2ogr is not the right tool for this. <br>
<br>
Attempt 1: Command at the top of the link <br>
---------------------------------------------<br>
<a href="https://pastebin.com/bh05Kcww" target="_blank">https://pastebin.com/bh05Kcww</a><br>
<br>
Attempt 2:
<div>----------------------------------------------<br>
<br>
<a href="https://pastebin.com/BG3WmQ9Y" target="_blank">https://pastebin.com/BG3WmQ9Y</a><br>
<br>
From what I can tell, all row groups from each of the Parquet
files are being loaded and checked. This is clearly not
correct. <br>
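<br>
To make concrete what skipping row groups would mean, here is a minimal sketch in plain Python. It is not GDAL's or pyarrow's actual code: the <code>BBox</code> class, the <code>row_groups_to_read</code> helper and the sample extents are all hypothetical, and it assumes each row group exposes a min/max extent (for example, from column statistics on a bbox column). Only row groups whose extent overlaps the query window would need to be read. <br>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.xmax < other.xmin
            or self.xmin > other.xmax
            or self.ymax < other.ymin
            or self.ymin > other.ymax
        )


def row_groups_to_read(row_group_extents, query):
    """Return indices of row groups whose extent overlaps the query box."""
    return [
        i for i, extent in enumerate(row_group_extents)
        if extent.intersects(query)
    ]


# Made-up per-row-group extents, standing in for min/max statistics.
extents = [
    BBox(-10.0, 35.0, 0.0, 45.0),
    BBox(5.0, 45.0, 10.0, 48.0),
    BBox(100.0, -10.0, 120.0, 5.0),
]

# Example query window, given as xmin, ymin, xmax, ymax.
query = BBox(7.5, 46.5, 7.7, 46.7)

selected = row_groups_to_read(extents, query)
print(selected)
```

With these made-up extents, only the second row group overlaps the window, so the other two would never need to be fetched. <br>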
<br>
Below are my libraries and versions on Ubuntu 20.04. All attempts
were made within a conda environment. <br>
<br>
gdal 3.9.2<br>
gcc_linux-64 12.4.0<br>
libarrow 17.0.0<br>
libarrow-dataset 17.0.0<br>
libparquet 17.0.0<br>
zstd 1.5.6<br>
libgdal-core 3.9.2<br>
libgdal-arrow-parquet 3.9.2<br>
libcurl/8.9.1 <br>
OpenSSL/3.3.2<br>
<br>
I typically use the command-line tools to test GDAL/OGR's
functionality and performance before embedding that
functionality in my own C++ app. Thus, while there are other
tools, I would love to understand how to do this in GDAL/OGR. <br>
<br>
Please advise! <br>
<br>
cheers,<br>
Varun<br>
<br>
<br>
</div>
</div>
<br>
</blockquote>
<pre cols="72">--
<a href="http://www.spatialys.com" target="_blank">http://www.spatialys.com</a>
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.</pre>
</div>
</blockquote></div></div>