<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Frank,<br>

    </p>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif">I'm actually surprised

            that it is considered impractical to make some/most drivers

            thread safe (read-only) natively and I thought key ones

            already were mostly there in the past though I'm not so

            confident in my memory.  </div>

        </div>

      </div>

    </blockquote>

    Things have got more sophisticated over time with more deferred

    loading strategies, but I don't think any driver has been close to

    be thread-safe even in read-only scenarios. Except maybe the

    RasterIO part of the RawDataset, which has pretty much no mutable

    state (if you exclude file operations, and anything involving the

    block cache)<br>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif"> Looking through the

            GTiff code I find myself not able to work out where locking

            is done and how we are managing the directory

            loading/switching as block reads are done. <br>

          </div>

        </div>

      </div>

    </blockquote>

    No locking is done. To be noted that the logic of TIFF directory

    loading/switching changed "a few" years ago. In the previous

    situation, the main dataset and its overview datasets were sharing

    the same TIFF* handle. But that strategy of switching from TIFF

    directory when switching between overview datasets had also a major

    performance drawback of invalidating TIFF bytecount[] and offset[]

    arrays, which was particularly annoying when computing overviews,

    where that back&forth is done consistently. And that

    inconvenience increased with the size of datasets and those arrays.

    But even if there are now several TIFF* handles, they do share the

    same VSILFILE* underneath, and there's a layer between the driver

    and libtiff to ensure that the file pointer position is properly set

    when activating one TIFF* handle at the position it expected, that

    flushes are done propertly so that one TIFF* handle does see a

    consistent state on disk (for read/write scenarios), etc.<br>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif"><br>

          </div>

          <div class="gmail_default"

            style="font-family:arial,sans-serif">I would certainly like

            to work towards read-only thread safety for some drivers

            with:</div>

          <div class="gmail_default"

            style="font-family:arial,sans-serif"> - a shared block cache

            for all threads reading from a dataset</div>

        </div>

      </div>

    </blockquote>

    That easiest way of doing that would be to put a per-dataset mutex

    in classes GDALHashSetBandBlockCache and GDALArrayBandBlockCache,

    but that would hurt scalability even when several threads try to

    access different blocks at the same time (unless smart things are

    done to drop the mutex as soon as possible)<br>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif"> - no duplication of

            file handles per-thread</div>

        </div>

      </div>

    </blockquote>

    <p>I just don't see how to do that. Even a mutex around file

      operations would not be sufficient since we'd want both

      Seek()+Read() to be atomic per thread. PRead() would solve that,

      but very few drivers or underlying libraries use PRead() (and

      there's no implementation of it for all file systems, like Windows

      file API which lacks it). Or one would need a VSIThreadSafeHandle

      that would maintain a per-thread position, but that would still

      involves locking around access to the underneath handle.<br>

    </p>

    <p>libtiff itself has a lot of state variables, in its core or in

      its codecs (e.g. the deflate codec has to maintain a zlib handle

      as a state variable if doing partial reads of a tile/strip).  Even

      getting the location of a TIFF tile is not thread-safe when

      defered byteount[]/offset[] array loading is enabled (which is

      necessary when dealing with extremely large remote COGs) since it

      involves file accesses. </p>

    <p>You could raise the point that the GTiff driver does have

      multi-threaded read optimizations, but it takes a lot of

      precautions to make that happening safely, and it has the chance

      of fully controlling the conditions where that happens. So

      basically it starts by fetching the location and size of the

      needed blocks, then each thread will read the needed blocks

      (either using PRead() if available, or taking a mutex around file

      accesses), create per-thread TIFF* handle, replicating the

      strategic TIFF tags from the main TIFF handle so that the codecs

      are initialized with the appropriate state and using a new

      TIFFReadFromUserBuffer() call that was added in libtiff to decode

      an in-memory encoded buffer (without doing file accesses). And

      even with that, I remember having fixed last year or so a case

      that I totally missed initially, where the block cache

      accidentally triggered in one of those worker thread. So yes,

      multi-threading is hard to get right.<br>

    </p>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif"> - a way of checking

            that a dataset is considered to be thread safe. <br>

          </div>

        </div>

      </div>

    </blockquote>

    Not totally sure to understand what you mean here. The RFC

    introduces a IsThreadSafe() virtual method that a driver can

    potentially implement (only set to true currently by instances of

    GDALThreadSafeDataset). But finding if a driver is thread-safe seems

    to be a non-solvable/provable problem to me, apart from staring at

    the code for a sufficient long time to convince oneself it is

    actually + adding some torture tests and hope they catch most

    issues. But things can get very complicated if you need to test all

    combinations of virtual methods that can be called...<br>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif"><br>

          </div>

          <div class="gmail_default"

            style="font-family:arial,sans-serif">I'm curious if you can

            talk a bit more about the challenges of accomplishing the

            above.   I had been hoping to take advantage of my

            (assumption) that this already mostly existed in GDAL for

            our new high performance environment at Planet so it is of

            more than academic interest to me.</div>

        </div>

      </div>

    </blockquote>

    <p>Probably that implementing a restricted read-only thread-safe

      TIFF driver, only addressing the common cases of TIFF without all

      the edge cases GDAL and libtiff try to address, for a controlled

      set of codecs, assuming pread() is always available, assuming you

      always read full tiles & strips, re-implementing a very

      simplified libtiff, and doing it "at hand" would be workable.

      Perhaps also excluding raster queries not at a resolution that

      exactly matches one of the overview level to completely by-pass

      the block cache. But that would be a separate driver, with a more

      restricted set of functionality than our current GDAL GeoTIFF

      driver that handles all the possible cases.</p>

    <blockquote type="cite"

cite="mid:CA+YzLBfUPuFEjs3Bjck6fgPQn3uGvn9-KFh_ihOk8Giz8CzkaQ@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div class="gmail_default"

            style="font-family:arial,sans-serif">  Also, doesn't

            gdalwarp's multithreading support already depend on

            read-only thread safety for source files?</div>

        </div>

      </div>

    </blockquote>

    <p>Definitely not. The initial -multi mode works with 2 threads: one

      for I/O, one for compute. The later was further extended to also

      multi-thread the compute part. But I/O is still done by a single

      thread, hence no requirement on the thread-safety of drivers.<br>

    </p>

    Even<br>

    <pre class="moz-signature" cols="72">-- 

<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>

My software is free, but my time generally not.</pre>

  </body>

</html>