<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi,</p>
<p>This has been much improved in the upcoming GDAL 3.10.0; see in
particular
<a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538">https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538</a>.
Now only the header and trailers of part-00000 are read.<br>
</p>
<p>That said, DuckDB will likely still outperform the OGR GeoParquet
driver (GDAL 3.11, with <a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/pull/11003">https://github.com/OSGeo/gdal/pull/11003</a>,
will allow the use of libduckdb).<br>
</p>
<p>Even<br>
</p>
<div class="moz-cite-prefix">On 24/10/2024 at 21:41, Varun Sharma via
gdal-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMS_tAsY_=E_=FQEy-UjyjcuMBeAcuWpGc7ehD8JoXE7V2Czdw@mail.gmail.com">
<div dir="ltr">Hello GDAL'ers, <br>
<br>
I have made a few attempts at using ogr2ogr to get
bounding-box-based extracts from Overture Maps datasets. <br>
<br>
Unfortunately, I have not been able to do so: something that takes
duckdb or <a
href="https://github.com/OvertureMaps/overturemaps-py"
moz-do-not-send="true">overturemaps-py</a> 30s or less takes
forever with ogr2ogr. overturemaps-py is essentially a
wrapper over pyarrow, with the Arrow filter constructed from
the bbox. <br>
<br>
I suspect I am doing something wrong. The less likely explanation is
that ogr2ogr is not the right tool for this. <br>
<br>
Attempt 1: Command at the top of the link <br>
---------------------------------------------<br>
<a href="https://pastebin.com/bh05Kcww" moz-do-not-send="true"
class="moz-txt-link-freetext">https://pastebin.com/bh05Kcww</a><br>
<br>
Attempt 2:
<div>----------------------------------------------<br>
<br>
<a href="https://pastebin.com/BG3WmQ9Y" moz-do-not-send="true"
class="moz-txt-link-freetext">https://pastebin.com/BG3WmQ9Y</a><br>
<br>
From what I can tell, all row groups from each of the Parquet
files are being loaded and checked. This is clearly not
correct. <br>
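<p>To illustrate the row-group skipping one would expect here, below is a
minimal sketch in plain Python. It assumes each row group exposes min/max
statistics for the bbox columns (xmin, xmax, ymin, ymax); the
<code>RowGroupStats</code> class, the <code>row_groups_to_read</code>
helper and the sample values are hypothetical and are not part of GDAL,
pyarrow or overturemaps-py.</p>

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics of the bbox columns for one row group."""
    index: int
    xmin_min: float  # smallest bbox.xmin among the features of the row group
    xmax_max: float  # largest bbox.xmax among the features of the row group
    ymin_min: float  # smallest bbox.ymin among the features of the row group
    ymax_max: float  # largest bbox.ymax among the features of the row group


def row_groups_to_read(stats: List[RowGroupStats],
                       query: Tuple[float, float, float, float]) -> List[int]:
    """Return the indices of row groups whose envelope may intersect the query.

    query is (xmin, ymin, xmax, ymax). A row group is skipped only when its
    statistics prove that none of its features can intersect the query box.
    """
    qxmin, qymin, qxmax, qymax = query
    selected = []
    for rg in stats:
        # Every feature starts to the right of the query, or ends to its left.
        if rg.xmin_min > qxmax or qxmin > rg.xmax_max:
            continue
        # Every feature starts above the query, or ends below it.
        if rg.ymin_min > qymax or qymin > rg.ymax_max:
            continue
        selected.append(rg.index)
    return selected


# Made-up statistics for three row groups (illustration only).
STATS = [
    RowGroupStats(0, -123.0, -122.0, 37.0, 38.0),
    RowGroupStats(1, 2.0, 3.0, 48.0, 49.0),
    RowGroupStats(2, -122.4, -121.0, 38.5, 39.0),
]
QUERY = (-122.5, 37.7, -122.3, 37.9)

print(row_groups_to_read(STATS, QUERY))
```

<p>With these made-up values only row group 0 would need to be read; the
other two can be ruled out from their statistics alone, without loading
their data.</p>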
<br>
Below are my libs and versions on Ubuntu 20.04. All attempts
were made within a conda environment. <br>
<br>
gdal 3.9.2<br>
gcc_linux-64 12.4.0<br>
libarrow 17.0.0<br>
libarrow-dataset 17.0.0<br>
libparquet 17.0.0<br>
zstd 1.5.6<br>
libgdal-core 3.9.2<br>
libgdal-arrow-parquet 3.9.2<br>
libcurl/8.9.1 <br>
OpenSSL/3.3.2<br>
<br>
I typically use the command line tools to test GDAL/OGR's
functionality and performance before embedding that
functionality in my own C++ app. Thus, while there are other
tools, I would love to understand how to do this in GDAL/OGR. <br>
<br>
Please advise! <br>
<br>
cheers,<br>
Varun<br>
<br>
<br>
</div>
</div>
<br>
</blockquote>
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.</pre>
</body>
</html>