<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="Times New Roman, Times, serif">This is about the same
acceleration (2-3 times) as I got on jobs running for a few days.
My impression is that distributed tile processing would give much
more dramatic results.<br>
<br>
</font><br>
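<font face="Times New Roman, Times, serif">For what it's worth, below is a
rough sketch of the kind of per-tile parallelism I have in mind: one
gdalwarp process per tile window, driven from Python, with the results
stitched back together by gdalbuildvrt. The extent, tile grid and file
names are illustrative only, and the target grid is assumed to be known in
advance.<br>
</font>
<pre wrap="">import subprocess
from multiprocessing import Pool

SRC = "world_4326.tif"
# full Web Mercator extent; a real job would use the dataset's own bounds
XMIN, YMIN, XMAX, YMAX = -20037508.34, -20037508.34, 20037508.34, 20037508.34
N = 4  # 4x4 grid of tiles

def warp_tile(ij):
    i, j = ij
    dx = (XMAX - XMIN) / N
    dy = (YMAX - YMIN) / N
    out = "tile_%d_%d.tif" % (i, j)
    # one independent gdalwarp process per tile; -te clips to the tile window
    subprocess.check_call(["gdalwarp", "-overwrite", "-t_srs", "EPSG:3857",
                           "-te", str(XMIN + i * dx), str(YMIN + j * dy),
                           str(XMIN + (i + 1) * dx), str(YMIN + (j + 1) * dy),
                           SRC, out])
    return out

if __name__ == "__main__":
    with Pool(4) as pool:
        tiles = pool.map(warp_tile, [(i, j) for i in range(N) for j in range(N)])
    # stitch the tiles back together into a single mosaic
    subprocess.check_call(["gdalbuildvrt", "mosaic.vrt"] + tiles)
</pre>
<font face="Times New Roman, Times, serif">The same per-tile commands could
just as well be spread over several VMs (for instance with GNU parallel's
--sshlogin) instead of a local process pool.<br>
</font>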
<div class="moz-cite-prefix">On 01/12/2013 03:57 PM, Even Rouault
wrote:<br>
</div>
<blockquote
cite="mid:201301121557.27335.even.rouault@mines-paris.org"
type="cite">
<pre wrap="">Le samedi 12 janvier 2013 15:08:55, Jan Hartmann a écrit :
</pre>
<blockquote type="cite">
<pre wrap="">You probably know this, but there is an option to let gdalwarp use more
cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but nothing
really staggering.
</pre>
</blockquote>
<pre wrap="">
Are you using PROJ 4.8.0? If not, that might explain why you don't see a
significant improvement. The performance gain is also much more significant
with complex resampling kernels: with nearest resampling, most of the time
is spent in I/O. Increasing the warping memory buffer (-wm) might also help
you benefit from the parallelization.
For example (debug, non-optimized build):
- 1 thread, nearest :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m6.390s
user 0m5.940s
sys 0m0.440s
- 4 threads, nearest :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m3.482s
user 0m6.330s
sys 0m0.700s
- 1 thread, bilinear :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -rb
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m18.387s
user 0m17.840s
sys 0m0.510s
- 4 threads, bilinear :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -rb
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m8.052s
user 0m20.000s
sys 0m0.550s
- 1 thread, cubic :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -rc
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m35.724s
user 0m35.010s
sys 0m0.620s
- 4 threads, cubic :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -rc
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m13.274s
user 0m39.530s
sys 0m0.560s
- 1 thread, lanczos :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -r lanczos
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 2m21.269s
user 2m20.460s
sys 0m0.400s
- 4 threads, lanczos :
$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -r lanczos
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m51.852s
user 2m36.520s
sys 0m0.750s
</pre>
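<pre wrap="">(Summarizing the "real" timings above, 1 thread vs 4 threads: nearest
6.4s to 3.5s (~1.8x), bilinear 18.4s to 8.1s (~2.3x), cubic 35.7s to
13.3s (~2.7x), lanczos 141.3s to 51.9s (~2.7x); so the closer the job is
to being CPU-bound, the better the scaling.)
</pre>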
<blockquote type="cite">
<pre wrap="">Splitting up operations over individual tiles would
really fasten up things. Even if I use only one VM, I can define 32
cores, and it would certainly be interesting to experiment with programs
like MPI to integrate multiple VMs into one computing cluster.
Jan
On 01/12/2013 02:38 AM, Kennedy, Paul wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi,
Yes, we are pretty sure we will see a significant benefit. The
processing algorithms are CPU-bound, not I/O-bound. Our digital terrain
model interpolations often run for many hours (we do them overnight),
but the underlying file is only a few gigabytes. If we split them into
multiple tiled files and run each in a dedicated process, the whole
thing is quicker, but this is messy and results in stitching errors.
Another example is gdalwarp. It takes quite some time with a large
data set and would be a good candidate for parallelisation, as would
gdaladdo.
I believe PCs with more but slower cores are the future. My PC has 8,
but they rarely get used to their full potential.
I am certain there are some challenges here; that's why it is
interesting ;)
Regards
pk
On 11/01/2013, at 6:54 PM, "Even Rouault"
<<a class="moz-txt-link-abbreviated" href="mailto:even.rouault@mines-paris.org">even.rouault@mines-paris.org</a> <a class="moz-txt-link-rfc2396E" href="mailto:even.rouault@mines-paris.org"><mailto:even.rouault@mines-paris.org></a>>
wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Re: [gdal-dev] does gdal support multiple simultaneous writers to raster
Hi,
This is an intersting topic, with many "intersecting" issues to deal
with at
different levels.
First, are you confident that in the use cases you imagine that I/O
access won't
be the limiting factor, in which case serialization of I/O could be
acceptable
and this would just require an API with a dataset level mutex.
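As a minimal sketch of that idea (hypothetical sizes; it illustrates the
API shape rather than performance, since Python's GIL keeps the
computation from truly running in parallel here):

import threading
import numpy as np
from osgeo import gdal

lock = threading.Lock()                      # the dataset-level mutex
ds = gdal.GetDriverByName("GTiff").Create("out.tif", 1024, 1024, 1,
                                          gdal.GDT_Byte)
band = ds.GetRasterBand(1)

def compute_block(yoff):                     # stand-in for the real work
    return np.full((256, 1024), yoff // 256, dtype=np.uint8)

def worker(yoff):
    block = compute_block(yoff)              # computation, unsynchronized
    with lock:                               # only the write is serialized
        band.WriteArray(block, 0, yoff)

threads = [threading.Thread(target=worker, args=(y,))
           for y in range(0, 1024, 256)]
for t in threads: t.start()
for t in threads: t.join()
ds.FlushCache()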
There are several places where parallel write would have to be addressed:
- The GDAL core mechanisms that deal with the block cache
- Each GDAL driver where parallel write would be supported. I guess that
GDAL drivers should advertise a specific capability
- The low-level library used by the driver; in the case of GeoTIFF, libtiff
And finally, as Frank underlined, there are intrinsic limitations due to
the format itself. For a compressed TIFF, at some point, you have to
serialize the writing of the tiles, because you cannot know in advance the
size of the compressed data; or you at least need some coordination of the
writers so that a "next offset available" is properly synchronized between
them. The compression itself could be serialized.
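A toy illustration of the offset-coordination variant (not real libtiff
code): compression stays parallel, and only the offset assignment and the
write are serialized.

import threading, zlib

lock = threading.Lock()
f = open("tiles.bin", "wb")
next_offset = [0]                            # next free byte in the file
tile_index = {}                              # tile id to (offset, size)

def write_tile(tile_id, raw_bytes):
    data = zlib.compress(raw_bytes)          # can run concurrently
    with lock:                               # offset bookkeeping cannot
        off = next_offset[0]
        next_offset[0] += len(data)
        tile_index[tile_id] = (off, len(data))
        f.seek(off)
        f.write(data)

threads = [threading.Thread(target=write_tile, args=(i, bytes(4096)))
           for i in range(16)]
for t in threads: t.start()
for t in threads: t.join()
f.close()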
I'm not sure, however, whether what Jan mentioned (different processes
writing to the same dataset) is doable.
</pre>
</blockquote>
<pre wrap="">
_______________________________________________
gdal-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="http://lists.osgeo.org/mailman/listinfo/gdal-dev">http://lists.osgeo.org/mailman/listinfo/gdal-dev</a>
</pre>
</blockquote>
</blockquote>
</blockquote>
<br>
</body>
</html>