<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi Craig,</p>

    <p>Your option (1) sounds good to me. However, there is no
      requirement that all file entries of a directory be consecutive,
      so the raw entry list could potentially be</p>

    <p>dirA/file1<br>

      dirB/file1<br>

      dirA/file2<br>

      dirB/file2<br>

      <br>

      depending on the order in which files were added to the ZIP. So you
      will likely need to sort the entries before building your index.</p>
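    <p>A minimal sketch of the sort-then-index idea, in Python only to illustrate the data structure (GDAL's archive code is C++). It assumes entries are plain "/"-separated file paths with no explicit directory entries; <code>build_dir_index</code> and <code>list_dir</code> are hypothetical names. Sorting by parent directory makes each directory's files contiguous, so one pass can map every directory to a (start, end) range in the sorted list.</p>
<pre>
```python
import posixpath


def build_dir_index(entries):
    """Sort entries so each directory's direct children are contiguous,
    then map each directory path to its (start, end) slice."""
    # Sort by (parent directory, full path). Sorting on the full path alone
    # is not enough: "dirA/sub/x" would land between "dirA/file1" and
    # "dirA/zfile", splitting dirA's direct children.
    ordered = sorted(entries, key=lambda p: (posixpath.dirname(p), p))
    index = {}
    for i, path in enumerate(ordered):
        parent = posixpath.dirname(path)
        start, _ = index.get(parent, (i, i))
        index[parent] = (start, i + 1)
    return ordered, index


def list_dir(ordered, index, directory):
    """Return the direct children of `directory` without a linear scan."""
    start, end = index.get(directory, (0, 0))
    return ordered[start:end]


# Interleaved entries, as in the example above.
entries = ["dirA/file1", "dirB/file1", "dirA/file2", "dirB/file2"]
ordered, index = build_dir_index(entries)
print(list_dir(ordered, index, "dirA"))  # ['dirA/file1', 'dirA/file2']
```
</pre>
    <p>The sort costs O(n log n) once; each directory listing is then a dictionary lookup plus a slice.</p>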

    <p>Even</p>

    <div class="moz-cite-prefix">On 19/08/2025 at 07:26, Craig de Stigter
      via gdal-dev wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAF1M8pcYuUEavzLgNtF_6opJsyVtEiNOxs5=EcQjt=SF+iJprw@mail.gmail.com">


      <div dir="ltr">

        <div>Hi folks<br>

          <br>

          I've stumbled across VSIReadDirRecursive being really slow

          when I give it a ridiculously large ZIP file (containing 5

          million files across ~1500 subdirectories).<br>

          <br>

          I spent a while poking round the source code. It looks like

          VSIArchiveFilesystemHandler::ReadDirEx() performs repeated

          linear scans through the flat VSIArchiveContent::entries array

          during recursive directory traversal. For each directory

          level, it scans all entries from the beginning, resulting in

          O(n²) time complexity.<br>

          <br>
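          <p>A simplified Python model of the behaviour described above (not GDAL's actual code): a recursive listing that rescans the whole flat entry list for every directory it visits, counting how many entries it examines. It only considers directories that directly contain files; <code>naive_recursive_listing</code> is a hypothetical name. The work is (number of directories) × (number of entries), which grows quadratically when the directory count grows with the file count.</p>
<pre>
```python
import posixpath


def naive_recursive_listing(entries):
    """List every directory by scanning the full flat list each time.

    Returns (listing, examined), where `examined` counts entries inspected.
    """
    directories = sorted({posixpath.dirname(p) for p in entries})
    listing = {}
    examined = 0
    for directory in directories:
        children = []
        for path in entries:  # full scan from the beginning, every time
            examined += 1
            if posixpath.dirname(path) == directory:
                children.append(path)
        listing[directory] = children
    return listing, examined


# Doubling the archive (10 files per directory) quadruples the work.
for n_dirs in (100, 200):
    entries = [f"dir{d}/file{f}" for d in range(n_dirs) for f in range(10)]
    _, examined = naive_recursive_listing(entries)
    print(len(entries), examined)
# 1000 100000
# 2000 400000
```
</pre>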

          Performance degrades from ~1.3 s for the first 5,000 files to
          ~6.7 s per 5,000 files once I am 100K files into a
          5-million-file ZIP archive, and it keeps getting worse from
          there. I haven't managed to list the whole 5M-file archive
          yet...<br>

          <br>

          A couple of possible solutions:<br>

          <br>

          1. Add a directory index to VSIArchiveContent (a map from
          directory path strings to an index in the entries array) so we
          can jumpstart the ReadDirEx implementation at the right place.<br>
          2. Make a VSIDIRArchive class (subclass of VSIDIR) and
          override OpenDir/NextDirEntry, so that it doesn't call
          ReadDirEx repeatedly but instead just returns entries from the
          VSIArchiveContent::entries array.<br>

          <br>
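          <p>Option 2 can be modelled with a small Python sketch (illustration only; the real change would live in GDAL's C++ VSIDIR code): a recursive iterator walks the flat entry list once and yields every entry under the requested prefix, so the cost is linear in the number of entries. <code>iter_recursive</code> is a hypothetical name, and the sketch does not synthesize entries for implicit directories, which a real implementation would also need to report.</p>
<pre>
```python
def iter_recursive(entries, prefix=""):
    """Yield paths under `prefix`, relative to it, in a single pass.

    Models option 2: one walk over the flat entry list, no per-directory rescans.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # so "dirB" does not also match "dirBB/..."
    for path in entries:
        if path.startswith(prefix):
            yield path[len(prefix):]


entries = ["dirA/file1", "dirB/file1", "dirA/file2", "dirB/sub/file3"]
print(list(iter_recursive(entries, "dirB")))  # ['file1', 'sub/file3']
```
</pre>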

          I'm leaning towards (1) because it would presumably also
          improve random lookups by file path (not just
          VSIReadDirRecursive). Is this something that would be
          accepted as a PR?<br>

          <br>

          Thanks<br>

          <br>

        </div>

        <br>

        <span class="gmail_signature_prefix">-- </span><br>

        <div dir="ltr" class="gmail_signature"

          data-smartmail="gmail_signature">

          <div dir="ltr">

            <div

style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px">Regards,</div>

            <div

style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px">Craig</div>

            <div

style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px"><br>

            </div>

            <div

style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px">Platform

              Engineer<br>

            </div>

            <div

style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px">Koordinates</div>

            <div

style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px"><a

                href="http://koordinates.com/"

                style="color:rgb(17,85,204)" target="_blank"

                moz-do-not-send="true">koordinates.com</a> / <a

                href="https://twitter.com/koordinates"

                style="color:rgb(17,85,204)" target="_blank"

                moz-do-not-send="true">@koordinates</a></div>

          </div>

        </div>

      </div>

      <br>

      <fieldset class="moz-mime-attachment-header"></fieldset>

      <pre class="moz-quote-pre" wrap="">_______________________________________________
gdal-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a>
</pre>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>

  </body>

</html>