<div dir="ltr"><div><div><div><div><div>To add my 2cnts.<br></div><br>With MapWindow GIS we use TauDEM binaries to perform watershed delineations.<br><a href="http://hydrology.usu.edu/taudem/taudem5.0/index.html">http://hydrology.usu.edu/taudem/taudem5.0/index.html</a><br>
</div><br>These TauDEM binaries are optimized to use MPI, but also work if you don't have MPI installed.<br></div>I don't know in detail how it works but in general you set how much parallel processes you want and TauDEM clips your input geotiff in that amount of pieces, with some overlap.<br>
</div>Next it processes each clipped tiff in parallel and combines the results afterwards.<br></div>This works very well and very fast. <br>Perhaps something like this can be introduced for GDAL. The source code for the TauDEM binaries are available so you can have a look how its done.<br>
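
For illustration, here is a minimal Python sketch of that split/process/merge pattern, driving the GDAL command-line tools (the file names, tile size and per-tile step are placeholders; this is not TauDEM's actual implementation):

# Minimal sketch (not TauDEM's code): split a GeoTIFF into overlapping
# tiles, process each tile in its own process, then mosaic the results.
import subprocess
from multiprocessing import Pool
from osgeo import gdal

SRC = "input.tif"   # placeholder input file
TILE = 2048         # tile size in pixels (arbitrary)
OVERLAP = 64        # overlap so edge effects can be trimmed or blended later

def make_jobs():
    ds = gdal.Open(SRC)
    jobs = []
    for xoff in range(0, ds.RasterXSize, TILE):
        for yoff in range(0, ds.RasterYSize, TILE):
            x0, y0 = max(xoff - OVERLAP, 0), max(yoff - OVERLAP, 0)
            w = min(TILE + 2 * OVERLAP, ds.RasterXSize - x0)
            h = min(TILE + 2 * OVERLAP, ds.RasterYSize - y0)
            jobs.append(("tile_%d_%d.tif" % (x0, y0), x0, y0, w, h))
    return jobs

def process(job):
    out, x0, y0, w, h = job
    # clip one overlapping window out of the source raster ...
    subprocess.check_call(["gdal_translate", "-q", "-srcwin",
                           str(x0), str(y0), str(w), str(h), SRC, out])
    # ... and run the actual per-tile analysis on 'out' here (placeholder)
    return out

if __name__ == "__main__":
    tiles = Pool(processes=8).map(process, make_jobs())
    # mosaic the processed tiles back into a single raster
    subprocess.check_call(["gdalwarp", "-q"] + tiles + ["merged.tif"])

The overlap between tiles is what gives the merge step room to blend away seam artefacts.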

--
Paul
<br><br><div class="gmail_quote">2013/1/12 Jan Hartmann <span dir="ltr"><<a href="mailto:j.l.h.hartmann@uva.nl" target="_blank">j.l.h.hartmann@uva.nl</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

You probably know this, but there is an option to let gdalwarp use more cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but nothing really staggering. Splitting operations up over individual tiles would really speed things up. Even if I use only one VM, I can define 32 cores, and it would certainly be interesting to experiment with frameworks like MPI to integrate multiple VMs into one computing cluster.
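
For reference, a typical invocation with that option (input/output names and the target SRS are just placeholders) looks like:

gdalwarp -multi -wo NUM_THREADS=ALL_CPUS -t_srs EPSG:28992 input.tif warped.tif

-multi overlaps I/O and warping in separate threads, while the NUM_THREADS warp option parallelises the warping computation itself over the available CPUs.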

Jan

On 01/12/2013 02:38 AM, Kennedy, Paul wrote:

Hi,

Yes, we are pretty sure we will see a significant benefit. The processing algorithms are CPU-bound, not I/O-bound. Our digital terrain model interpolations often run for many hours (we do them overnight), but the underlying file is only a few gigabytes. If we split the data into multiple tile files and run each on a dedicated process, the whole thing is quicker, but this is messy and results in stitching errors.

Another example is gdalwarp. It takes quite some time with a large data set and would be a good candidate for parallelisation, as would gdaladdo.

I believe slower cores, but more of them, are the future for PCs. My PC has 8, but they rarely get used to their full potential.

I am certain there are some challenges here; that's why it is interesting ;)

Regards,
pk

On 11/01/2013, at 6:54 PM, "Even Rouault" <even.rouault@mines-paris.org> wrote:

Hi,

This is an interesting topic, with many "intersecting" issues to deal with at different levels.

First, are you confident that, in the use cases you imagine, I/O access won't be the limiting factor? If I/O did dominate, serializing it could be acceptable, and that would just require an API with a dataset-level mutex.
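
To illustrate the pattern only (this is not an existing GDAL API; the file name and sizes are made up): do the computation in parallel, but take one per-dataset lock around every write.

# Illustration only (not an existing GDAL API): heavy computation runs in
# parallel threads, but every write to the dataset goes through one lock.
import threading
import numpy as np
from osgeo import gdal

dst = gdal.GetDriverByName("GTiff").Create("out.tif", 4096, 4096, 1, gdal.GDT_Float32)
band = dst.GetRasterBand(1)
write_lock = threading.Lock()              # the "dataset-level mutex"

def work(yoff, rows):
    data = np.random.rand(rows, 4096).astype("float32")   # stand-in for the real computation
    with write_lock:                       # only the I/O is serialized
        band.WriteArray(data, 0, yoff)

threads = [threading.Thread(target=work, args=(y, 512)) for y in range(0, 4096, 512)]
for t in threads:
    t.start()
for t in threads:
    t.join()
dst.FlushCache()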

There are several places where parallel write should be addressed:
- the GDAL core mechanisms that deal with the block cache;
- each GDAL driver in which parallel write would be supported (I guess GDAL drivers should advertise a specific capability for it);
- the low-level library used by the driver (for the GeoTIFF driver, that is libtiff).

And finally, as Frank underlined, there are intrinsic limitations due to the format itself. For a compressed TIFF, at some point you have to serialize the writing of the tiles, because you cannot know the size of the compressed data in advance; or, at the least, you need some coordination between the writers so that the "next available offset" is properly synchronized between them. The compression itself, however, could still be done in parallel.
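
A toy sketch of that coordination, outside libtiff/GDAL entirely (zlib on random tiles, purely illustrative): the compression runs in parallel worker processes, while a single writer appends the compressed blobs and keeps the offset/byte-count bookkeeping, much like TIFF's TileOffsets/TileByteCounts arrays.

# Toy sketch of the coordination problem (not libtiff's API): compress tiles
# in parallel, but let a single writer append them and track the next offset.
import zlib
from multiprocessing import Pool
import numpy as np

def compress_tile(idx):
    tile = (np.random.rand(256, 256) * 255).astype("uint8")   # stand-in for raster data
    return idx, zlib.compress(tile.tobytes())                  # CPU-heavy part, runs in parallel

if __name__ == "__main__":
    offsets, counts = {}, {}
    next_offset = 0                                # the "next available offset"
    with open("tiles.bin", "wb") as f, Pool(4) as pool:
        for idx, blob in pool.imap(compress_tile, range(16)):
            offsets[idx], counts[idx] = next_offset, len(blob)
            f.write(blob)                          # the writes themselves stay strictly serialized
            next_offset += len(blob)
    print(offsets, counts)                         # analogous to TIFF's TileOffsets/TileByteCounts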

I'm not sure, however, whether what Jan mentioned (different processes writing to the same dataset) is doable.

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev