<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi,</p>
    <p>This has been much improved in the upcoming GDAL 3.10.0; see in
      particular
<a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538">https://github.com/OSGeo/gdal/blob/15589fea354e69f606af2a856828ecd506cb87b7/NEWS.md?plain=1#L538</a>.
      Now only the header and trailers of part-00000 are read.<br>
    </p>
    <p>That said, duckdb will likely still outperform the OGR GeoParquet
      driver (GDAL 3.11, with <a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/pull/11003">https://github.com/OSGeo/gdal/pull/11003</a>,
      will allow using libduckdb).<br>
    </p>
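    <p>For readers wondering what skipping row groups would look like
      for a bounding-box query, here is a minimal sketch in plain
      Python. It is an illustration only, not how GDAL, pyarrow or
      duckdb implement it: <code>RowGroupStats</code>,
      <code>intersects</code> and <code>row_groups_to_read</code> are
      hypothetical names, and the extents are made-up values standing in
      for the per-row-group min/max statistics of the bbox columns stored
      in a Parquet file's metadata.</p>

```python
from dataclasses import dataclass
from typing import List, Tuple

# (xmin, ymin, xmax, ymax)
BBox = Tuple[float, float, float, float]


@dataclass
class RowGroupStats:
    """Hypothetical summary of one row group: the union of its rows' bboxes."""
    index: int
    extent: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """Return True when the two boxes overlap or touch."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def row_groups_to_read(stats: List[RowGroupStats], query: BBox) -> List[int]:
    """Keep only the row groups whose extent can contain matching rows."""
    return [s.index for s in stats if intersects(s.extent, query)]


# Made-up extents for three row groups of a single file.
stats = [
    RowGroupStats(0, (-10.0, 35.0, 5.0, 45.0)),
    RowGroupStats(1, (70.0, 5.0, 90.0, 30.0)),
    RowGroupStats(2, (-125.0, 25.0, -65.0, 50.0)),
]

query = (72.0, 18.0, 73.5, 19.5)
selected = row_groups_to_read(stats, query)
print(selected)
```

    <p>Only the row groups returned by <code>row_groups_to_read</code>
      would need to be fetched and decoded; the others can be skipped
      based on metadata alone, which is what makes a bbox extract cheap
      when the data is spatially sorted.</p>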
    <p>Even<br>
    </p>
    <div class="moz-cite-prefix">On 24/10/2024 at 21:41, Varun Sharma via
      gdal-dev wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAMS_tAsY_=E_=FQEy-UjyjcuMBeAcuWpGc7ehD8JoXE7V2Czdw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">Hello GDAL'ers,<br>
        <br>
        I have made a few attempts at using ogr2ogr to get
        bounding-box-based extracts from Overture Maps datasets. <br>
        <br>
        Unfortunately, I am not able to do so: something that takes
        duckdb or <a
          href="https://github.com/OvertureMaps/overturemaps-py"
          moz-do-not-send="true">overturemaps-py</a> 30s or less takes
        forever when using ogr2ogr. overturemaps-py is essentially a
        wrapper over pyarrow, with the arrow filter constructed from
        the bbox. <br>
        <br>
        I suspect I am doing something wrong. The less likely
        explanation is that ogr2ogr is not the right tool for this. <br>
        <br>
        Attempt 1: Command at the top of the link <br>
        ---------------------------------------------<br>
        <a href="https://pastebin.com/bh05Kcww" moz-do-not-send="true"
          class="moz-txt-link-freetext">https://pastebin.com/bh05Kcww</a><br>
        <br>
        Attempt 2: 
        <div>----------------------------------------------<br>
          <br>
          <a href="https://pastebin.com/BG3WmQ9Y" moz-do-not-send="true"
            class="moz-txt-link-freetext">https://pastebin.com/BG3WmQ9Y</a><br>
          <br>
          From what I can tell, all row groups from each of the Parquet
          files are being loaded and checked. This is clearly not
          correct.  <br>
          <br>
          Below are my libraries and versions on Ubuntu 20.04. All
          attempts are within a conda environment. <br>
          <br>
          gdal                      3.9.2<br>
          gcc_linux-64              12.4.0<br>
          libarrow                  17.0.0<br>
          libarrow-dataset          17.0.0<br>
          libparquet                17.0.0<br>
          zstd                      1.5.6<br>
          libgdal-core              3.9.2<br>
          libgdal-arrow-parquet     3.9.2<br>
          libcurl/8.9.1 <br>
          OpenSSL/3.3.2<br>
          <br>
          I typically use the command-line tools to test GDAL/OGR's
          functionality and performance before embedding that
          functionality in my own C++ app. Thus, while there are other
          tools, I would love to understand how to do this in GDAL/OGR. <br>
          <br>
          Please advise! <br>
          <br>
          cheers,<br>
          Varun<br>
          <br>
          <br>
        </div>
      </div>
      <br>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.</pre>
  </body>
</html>