<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Both issues below should now be fixed by
      <a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/pull/13606">https://github.com/OSGeo/gdal/pull/13606</a>. It turns out that what caused
      GDAL to probe all files even when _metadata is present is perhaps
      completely different from the cause behind the Python reproducer in
      the apache/arrow issue below.</p>
    <div class="moz-cite-prefix">On 28/12/2025 at 16:48, Even Rouault via
      gdal-dev wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:8e153920-1587-4c34-98e3-96d1e7ed5862@spatialys.com">
      <p>Hi Mike,</p>
      <p>The problem is likely twofold:</p>
      <p>- "gdal vector partition" doesn't write the "_metadata" file
        that contains the schema and the path to the actual .parquet
        files</p>
      <p>- but even if it did, I cannot convince
        libarrow/libparquet not to probe all files. I'm not sure whether I'm
        missing something in the API or whether that's a fundamental
        limitation of the library. I've filed <a
          class="moz-txt-link-freetext"
          href="https://github.com/apache/arrow/issues/48671"
          moz-do-not-send="true">https://github.com/apache/arrow/issues/48671</a>
        about that. I've considered implementing a workaround on the GDAL
        side, but I couldn't come up with anything.</p>
      <p>Your best workaround is to directly access <span
        style="white-space: pre-wrap"> "/vsis3/bucket/overture/20251217/overture-buildings/country=US" </span><br>
      </p>
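      <p>A minimal Python sketch of that workaround, under stated
        assumptions: the helper names are hypothetical, the script only
        builds and prints the command rather than running it, and the
        --where clause is dropped on the assumption that the partition key
        lives in the directory name and may not be exposed as a column
        when a single partition directory is opened. The remaining
        arguments are copied from the pipeline in the question below.</p>
      <pre>
```python
import shlex


def hive_partition_path(dataset_root, key, value):
    """Return the directory of one hive partition, e.g. ROOT/country=US."""
    # Hypothetical helper: hive partitioning encodes key=value in the path.
    return f"{dataset_root.rstrip('/')}/{key}={value}"


def build_pipeline_args(source, bbox, output):
    """Build the argument list for the pipeline from the original question."""
    # Assumption: no --where step, because the partition is already selected
    # by opening its directory directly.
    return [
        "gdal", "vector", "pipeline",
        "!", "read", source,
        "!", "filter", "--bbox", bbox,
        "!", "write", "-f", "parquet", output, "--progress", "--overwrite",
    ]


root = "/vsis3/bucket/overture/20251217/overture-buildings/"
source = hive_partition_path(root, "country", "US")
args = build_pipeline_args(
    source,
    "-117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481",
    "/tmp/test1.parquet",
)

# Print a shell-quoted command line for inspection; nothing is executed here.
print(shlex.join(args))
```
</pre>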
      <p>Even</p>
      <div class="moz-cite-prefix">On 28/12/2025 at 13:26, Michael Smith
        via gdal-dev wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:78AB845E-CBE8-4E82-97AD-7BF8166C5F48@gmail.com">
        <pre wrap="" class="moz-quote-pre">I know that gdal can write parquet data with hive partitioning using gdal vector partition, but after doing so, can gdal do the partition elimination on reading when a where/attribute is specified on the partition key?

I was trying to do a pipeline with:
gdal vector pipeline ! read "/vsis3/bucket/overture/20251217/overture-buildings/" ! filter --bbox -117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481 --where "country='US'" ! write -f parquet /tmp/test1.parquet --progress --overwrite

but in CPL_DEBUG I see it scanning all the partitions rather than just querying the country=US partition. 

S3: Downloading 0-1605631 (<a class="moz-txt-link-freetext"
href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAI/data_0.parquet"
        moz-do-not-send="true">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAI/data_0.parquet</a>)...
S3: Got response_code=206
S3: Downloading 0-16383999 (<a class="moz-txt-link-freetext"
href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet"
        moz-do-not-send="true">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet</a>)...
S3: Got response_code=206
S3: Downloading 0-16383999 (<a class="moz-txt-link-freetext"
href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet"
        moz-do-not-send="true">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet</a>)...
S3: Got response_code=206
S3: Downloading 16384000-32767999 (<a class="moz-txt-link-freetext"
href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet"
        moz-do-not-send="true">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet</a>)...
S3: Got response_code=206
S3: Downloading 16384000-29741378 (<a class="moz-txt-link-freetext"
href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet"
        moz-do-not-send="true">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet</a>)...
....



</pre>
      </blockquote>
      <pre class="moz-signature" cols="72">-- 
<a class="moz-txt-link-freetext" href="http://www.spatialys.com"
      moz-do-not-send="true">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre wrap="" class="moz-quote-pre">_______________________________________________
gdal-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a>
</pre>
    </blockquote>
  </body>
</html>