<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style></head><body lang=EN-US link=blue vlink=purple style='word-wrap:break-word'><div class=WordSection1><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'>Wow, very cool. Yeah, the source was written with DuckDB, so I believe the metadata is present. <o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'>I’ll check out master once this is merged.<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'>Thanks so much!<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'>Mike<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'><o:p> </o:p></span></p><div><div><p class=MsoNormal>-- <o:p></o:p></p></div><div><p class=MsoNormal>Michael Smith<o:p></o:p></p><p class=MsoNormal>RSGIS Center – ERDC CRREL NH<o:p></o:p></p><p class=MsoNormal>US Army Corps<span style='font-family:"Calibri",sans-serif'><o:p></o:p></span></p></div></div><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Calibri",sans-serif'><o:p> </o:p></span></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-family:"Calibri",sans-serif;color:black'>From: </span></b><span style='font-family:"Calibri",sans-serif;color:black'>gdal-dev &lt;gdal-dev-bounces@lists.osgeo.org&gt; on behalf of Even Rouault via gdal-dev &lt;gdal-dev@lists.osgeo.org&gt;<br><b>Reply-To: </b>Even Rouault &lt;even.rouault@spatialys.com&gt;<br><b>Date: </b>Sunday, December 28, 2025 at 6:56 PM<br><b>To: </b>&lt;gdal-dev@lists.osgeo.org&gt;<br><b>Subject: </b>Re: [gdal-dev] gdal parquet and hive 
partitioning<o:p></o:p></span></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><p>Both issues below should now be fixed by <a href="https://github.com/OSGeo/gdal/pull/13606">https://github.com/OSGeo/gdal/pull/13606</a>. It turns out that what caused GDAL to probe all files even when _metadata is present is perhaps completely different from the cause behind the Python reproducer in the apache/arrow issue below.<o:p></o:p></p><div><p class=MsoNormal>On 28/12/2025 at 16:48, Even Rouault via gdal-dev wrote:<o:p></o:p></p></div><blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'><p>Hi Mike,<o:p></o:p></p><p>the problem is likely twofold:<o:p></o:p></p><p>- "gdal vector partition" doesn't write the "_metadata" file that contains the schema and the path to the actual .parquet files<o:p></o:p></p><p>- but even if it did, I cannot manage to convince libarrow/libparquet to not probe all files. Not sure if I'm missing something in the API or if that's a fundamental limitation of the library. I've filed <a href="https://github.com/apache/arrow/issues/48671">https://github.com/apache/arrow/issues/48671</a> about that. I've considered implementing a workaround on the GDAL side but I couldn't come up with anything.<o:p></o:p></p><p>Your best workaround is to directly access "/vsis3/bucket/overture/20251217/overture-buildings/country=US" <o:p></o:p></p><p>Even<o:p></o:p></p><div><p class=MsoNormal>On 28/12/2025 at 13:26, Michael Smith via gdal-dev wrote:<o:p></o:p></p></div><blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'><pre>I know that gdal can write parquet data with hive partitioning using gdal vector partition, but after doing so, can gdal do partition elimination on reading when a where/attribute filter is specified on the partition key?<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>I was trying to do a pipeline with:<o:p></o:p></pre><pre>gdal vector pipeline ! read "/vsis3/bucket/overture/20251217/overture-buildings/" ! 
filter --bbox -117.486117584442,33.9156194185775,-117.333055544584,33.9745995301481 --where "country='US'" ! write -f parquet /tmp/test1.parquet --progress --overwrite <o:p></o:p></pre><pre><o:p> </o:p></pre><pre>but in CPL_DEBUG I see it scanning all the partitions rather than just querying the country=US partition. <o:p></o:p></pre><pre><o:p> </o:p></pre><pre>S3: Downloading 0-1605631 (<a href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAI/data_0.parquet">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAI/data_0.parquet</a>)...<o:p></o:p></pre><pre>S3: Got response_code=206<o:p></o:p></pre><pre>S3: Downloading 0-16383999 (<a href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet</a>)...<o:p></o:p></pre><pre>S3: Got response_code=206<o:p></o:p></pre><pre>S3: Downloading 0-16383999 (<a href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet</a>)...<o:p></o:p></pre><pre>S3: Got response_code=206<o:p></o:p></pre><pre>S3: Downloading 16384000-32767999 (<a href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_2.parquet</a>)...<o:p></o:p></pre><pre>S3: Got response_code=206<o:p></o:p></pre><pre>S3: Downloading 16384000-29741378 (<a 
href="https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet">https://bucket.s3.us-east-1.amazonaws.com/overture/20251217/overture-buildings/country%3DAL/data_3.parquet</a>)...<o:p></o:p></pre><pre>....<o:p></o:p></pre><pre><o:p> </o:p></pre><pre><o:p> </o:p></pre><pre><o:p> </o:p></pre></blockquote><pre>-- <o:p></o:p></pre><pre><a href="http://www.spatialys.com">http://www.spatialys.com</a><o:p></o:p></pre><pre>My software is free, but my time generally not.<o:p></o:p></pre><p class=MsoNormal><br><br><o:p></o:p></p><pre>_______________________________________________<o:p></o:p></pre><pre>gdal-dev mailing list<o:p></o:p></pre><pre><a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a><o:p></o:p></pre><pre><a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><o:p></o:p></pre></blockquote></div></body></html>