<div dir="ltr"><div><div><div><div><div>To add my 2 cents.<br></div><br>With MapWindow GIS we use the TauDEM binaries to perform watershed delineations.<br><a href="http://hydrology.usu.edu/taudem/taudem5.0/index.html">http://hydrology.usu.edu/taudem/taudem5.0/index.html</a><br>

</div><br>These TauDEM binaries are optimized to use MPI, but also work if you don't have MPI installed.<br></div>I don't know in detail how it works, but in general you set how many parallel processes you want and TauDEM clips your input GeoTIFF into that many pieces, with some overlap.<br>

</div>Next it processes each clipped TIFF in parallel and combines the results afterwards.<br></div>This works very well and is very fast. <br>Perhaps something like this could be introduced for GDAL. The source code for the TauDEM binaries is available, so you can have a look at how it's done.<br>
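<div>A minimal sketch of the split / process / combine idea described above. It is not TauDEM's code: a plain list of numbers stands in for the raster rows, threads stand in for MPI processes, and the names <code>split_rows</code>, <code>smooth</code> and <code>process_in_strips</code> are made up for illustration. Each strip is read with one extra row of overlap on each side so that a 3-row moving average gives the same answer as processing the whole input at once.</div>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_parts, overlap):
    """Return (core_start, core_stop, read_start, read_stop) for each strip."""
    bounds = [(i * n_rows) // n_parts for i in range(n_parts + 1)]
    strips = []
    for core_start, core_stop in zip(bounds[:-1], bounds[1:]):
        # Read a little beyond the core rows, clamped to the raster extent.
        read_start = max(0, core_start - overlap)
        read_stop = min(n_rows, core_stop + overlap)
        strips.append((core_start, core_stop, read_start, read_stop))
    return strips


def smooth(rows):
    """Toy per-strip operation: 3-row moving average (needs 1 row of overlap)."""
    out = []
    for r in range(len(rows)):
        window = rows[max(0, r - 1):min(len(rows), r + 2)]
        out.append(sum(window) / len(window))
    return out


def process_in_strips(values, n_parts, overlap=1):
    """Process each strip in its own worker, then stitch the core rows together."""
    strips = split_rows(len(values), n_parts, overlap)

    def work(strip):
        core_start, core_stop, read_start, read_stop = strip
        result = smooth(values[read_start:read_stop])
        # Drop the overlap rows so the strips join without duplicates.
        return result[core_start - read_start:core_stop - read_start]

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        pieces = list(pool.map(work, strips))
    return [v for piece in pieces for v in piece]
```
</pre>
<div>The overlap has to be at least as wide as the neighbourhood the algorithm looks at; with too little overlap the strips no longer agree at their edges, which is where stitching errors come from.</div>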

<br>--<br><div class="gmail_extra"><div>Paul</div>

<br><br><div class="gmail_quote">2013/1/12 Jan Hartmann <span dir="ltr">&lt;<a href="mailto:j.l.h.hartmann@uva.nl" target="_blank">j.l.h.hartmann@uva.nl</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF">

    <font face="Times New Roman, Times, serif">You probably know this,

      but there is an option to let gdalwarp use more cores: -wo

      NUM_THREADS=ALL_CPUS. It gives some improvement, but not really

      staggering. Splitting up operations over individual tiles would

      really speed things up. Even if I use only one VM, I can define

      32 cores, and it would certainly be interesting to experiment with

      programs like MPI to integrate multiple VMs into one computing

      cluster.<br>

      <br>

      Jan<br>

      <br>

    </font>
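    <div>For reference, a small sketch of how the option mentioned above can be passed on the command line. The helper name <code>gdalwarp_command</code> and the file names are hypothetical; <code>-multi</code> and <code>-wo NUM_THREADS=ALL_CPUS</code> are existing gdalwarp options. The function only builds the argument list, so it can be inspected without GDAL installed.</div>
<pre>
```python
import subprocess


def gdalwarp_command(src, dst, threads="ALL_CPUS"):
    """Build a gdalwarp call that asks the warper to use several threads."""
    return [
        "gdalwarp",
        "-multi",                    # overlap I/O with computation
        "-wo",
        f"NUM_THREADS={threads}",    # threads used for the warp computation
        src,
        dst,
    ]


def run_gdalwarp(src, dst):
    """Run the command; requires gdalwarp on PATH and an existing input file."""
    subprocess.run(gdalwarp_command(src, dst), check=True)
```
</pre>
    <div>For example, <code>gdalwarp_command("input.tif", "output.tif")</code> yields the arguments for <code>gdalwarp -multi -wo NUM_THREADS=ALL_CPUS input.tif output.tif</code>.</div>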

    <div>On 01/12/2013 02:38 AM, Kennedy, Paul

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div>Hi,</div>

      <div>Yes, we are pretty sure we will see a significant benefit.

         The processing algorithms are CPU-bound, not I/O-bound. Our

        digital terrain model interpolations often run for many hours

        (we do them overnight) but the underlying file is only a few

        gigabytes. If we split them into multiple files of tiles and run

        each on a dedicated process the whole thing is quicker, but this

        is messy and results in a stitching error. </div>

      <div><br>

      </div>

      <div>Another example is gdalwarp. It takes quite some time with a

        large data set and would be a good candidate for

        parallelisation, as would gdaladdo. </div>

      <div><br>

      </div>

      <div>I believe PCs with more, but slower, cores are the

        future. My PC has 8, but they are rarely used to their full

        potential. <br>

        <br>

        I am certain there are some challenges here, that's why it is

        interesting;)</div>

      <div><br>

        Regards

        <div>pk</div>

      </div>

      <div><br>

        On 11/01/2013, at 6:54 PM, "Even Rouault" &lt;<a href="mailto:even.rouault@mines-paris.org" target="_blank">even.rouault@mines-paris.org</a>&gt;

        wrote:<br>

        <br>

      </div>

      <blockquote type="cite">

        <div>

          <p><font>Hi,<br>

              <br>

              This is an interesting topic, with many "intersecting"

              issues to deal with at<br>

              different levels.<br>

              <br>

              First, are you confident that in the use cases you imagine

              that I/O access won't<br>

              be the limiting factor, in which case serialization of I/O

              could be acceptable<br>

              and this would just require an API with a dataset level

              mutex?<br>

              <br>

              There are several places where parallel write should be

              addressed:<br>

              - The GDAL core mechanisms that deal with the block cache<br>

              - Each GDAL driver where parallel write would be

              supported. I guess that GDAL<br>

              drivers should advertise a specific capability<br>

              - The low-level library used by the driver. In the case of

              the GeoTIFF driver, libtiff<br>

              <br>

              And finally, as Frank underlined, there are intrinsic

              limitations due to the<br>

              format itself. For a compressed TIFF, at some point, you

              have to serialize the<br>

              writing of the tile, because you cannot know in advance

              the size of the<br>

              compressed data, or at least have some coordination of the

              writers so that a<br>

              "next offset available" is properly synchronized between

              them. The compression<br>

              itself could be parallelized.<br>

              <br>

              I'm not sure, however, whether what Jan mentioned, different

              processes writing the same<br>

              dataset, is doable.<br>

              <br>

            </font>

          </p>

        </div>
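        <div>A minimal sketch of the coordination described above for compressed tiles: workers compress in parallel, and only the step that picks the next free offset and writes the bytes is serialized behind a lock. This is not GDAL or libtiff code; <code>TileWriter</code> and <code>write_tiles</code> are hypothetical names, an in-memory buffer stands in for the output file, and zlib stands in for whatever compression the format uses.</div>
<pre>
```python
import io
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class TileWriter:
    """Appends compressed tiles to one shared file, one writer at a time."""

    def __init__(self):
        self._file = io.BytesIO()        # stand-in for the output file
        self._lock = threading.Lock()
        self.index = {}                  # tile_id: (offset, byte_count)

    def append(self, tile_id, packed):
        # The "next offset available" is only known and updated under the lock.
        with self._lock:
            offset = self._file.seek(0, io.SEEK_END)
            self._file.write(packed)
            self.index[tile_id] = (offset, len(packed))

    def read(self, tile_id):
        offset, size = self.index[tile_id]
        with self._lock:
            self._file.seek(offset)
            return self._file.read(size)


def write_tiles(tiles, workers=4):
    """Compress tiles concurrently; serialize only the writes."""
    writer = TileWriter()

    def work(item):
        tile_id, raw = item
        packed = zlib.compress(raw)      # size is unknown until this finishes
        writer.append(tile_id, packed)   # serialized section

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, enumerate(tiles)))
    return writer
```
</pre>
        <div>Tiles end up in the file in whatever order the workers finish, which is fine as long as the index records each tile's offset and size. This sketch covers threads inside one process only; several processes writing the same file would need a lock or offset counter shared between processes.</div>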

      </blockquote>

      <br>


      <br>

      <pre>_______________________________________________

gdal-dev mailing list

<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a>

<a href="http://lists.osgeo.org/mailman/listinfo/gdal-dev" target="_blank">http://lists.osgeo.org/mailman/listinfo/gdal-dev</a></pre>

    </blockquote>

    <br>

  </div>

</blockquote></div><br></div></div>