[gdal-dev] does gdal support multiple simultaneous writers to raster

Even Rouault even.rouault at mines-paris.org
Sat Jan 12 06:57:27 PST 2013


On Saturday, January 12, 2013 at 15:08:55, Jan Hartmann wrote:
> You probably know this, but there is an option to let gdalwarp use more
> cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but nothing
> really staggering.

Are you using PROJ 4.8.0? If not, that might explain why you don't see a 
significant improvement. The performance gain is also much more significant with 
complex resampling kernels; with nearest resampling, most of the time is spent 
in I/O. Increasing the warping memory buffer (-wm) might also help you benefit 
from parallelization.

For example (debug, non-optimized build):

- 1 thread, nearest:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m6.390s
user	0m5.940s
sys	0m0.440s

- 4 threads, nearest:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m3.482s
user	0m6.330s
sys	0m0.700s

- 1 thread, bilinear:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -rb
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m18.387s
user	0m17.840s
sys	0m0.510s

- 4 threads, bilinear:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -rb
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m8.052s
user	0m20.000s
sys	0m0.550s

- 1 thread, cubic:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -rc
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m35.724s
user	0m35.010s
sys	0m0.620s

- 4 threads, cubic:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -rc
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m13.274s
user	0m39.530s
sys	0m0.560s

- 1 thread, lanczos:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=1 -wm 512 -r lanczos
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	2m21.269s
user	2m20.460s
sys	0m0.400s

- 4 threads, lanczos:

$ time gdalwarp world_4326.tif out.tif -t_srs EPSG:3857 -overwrite -wo NUM_THREADS=4 -wm 512 -r lanczos
Creating output file that is 8183P x 8201L.
Processing input file world_4326.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.

real	0m51.852s
user	2m36.520s
sys	0m0.750s
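Tallying the wall-clock ("real") times above, the 4-thread speedup grows with the complexity of the resampling kernel (a quick check, using only the numbers from the runs above):

```python
# Wall-clock ("real") times from the gdalwarp runs above, in seconds:
# (1-thread time, 4-thread time) per resampling kernel.
timings = {
    "nearest":  (6.390, 3.482),
    "bilinear": (18.387, 8.052),
    "cubic":    (35.724, 13.274),
    "lanczos":  (2 * 60 + 21.269, 51.852),  # 2m21.269s single-threaded
}

for kernel, (t1, t4) in timings.items():
    print(f"{kernel:9s} speedup with 4 threads: {t1 / t4:.2f}x")
```

Nearest gains less than 2x (I/O-bound), while cubic and lanczos approach 2.7x.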



> Splitting up operations over individual tiles would
> really speed things up. Even if I use only one VM, I can define 32
> cores, and it would certainly be interesting to experiment with programs
> like MPI to integrate multiple VMs into one computing cluster.
> 
> Jan
> 
> On 01/12/2013 02:38 AM, Kennedy, Paul wrote:
> > Hi,
> > Yes, we are pretty sure we will see a significant benefit. The
> > processing algorithms are CPU bound, not I/O bound. Our digital terrain
> > model interpolations often run for many hours (we do them overnight),
> > but the underlying file is only a few gigabytes. If we split them into
> > multiple files of tiles and run each in a dedicated process, the whole
> > thing is quicker, but this is messy and results in stitching errors.
> > 
> > Another example is gdalwarp. It takes quite some time with a large
> > data set and would be a good candidate for parallelisation, as would
> > gdaladdo.
> > 
> > I believe more cores, each slower, are the future for PCs. My PC
> > has 8, but they rarely get used to their potential.
> > 
> > I am certain there are some challenges here; that's why it is
> > interesting ;)
> > 
> > Regards
> > pk
> > 
> > On 11/01/2013, at 6:54 PM, "Even Rouault"
> > <even.rouault at mines-paris.org <mailto:even.rouault at mines-paris.org>>
> > 
> > wrote:
> >> Re: [gdal-dev] does gdal support multiple simultaneous writers to raster
> >> 
> >> Hi,
> >> 
> >> This is an interesting topic, with many "intersecting" issues to deal
> >> with at different levels.
> >> 
> >> First, are you confident that, in the use cases you imagine, I/O
> >> access won't be the limiting factor? In that case, serialization of
> >> I/O could be acceptable, and this would just require an API with a
> >> dataset-level mutex.
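The dataset-level mutex idea can be illustrated with a minimal sketch: computation runs in parallel, only the actual write is serialized. This is plain Python threading with a toy `Dataset` class; `write_block` is a hypothetical stand-in for a driver write call, not an existing GDAL API:

```python
import threading

class Dataset:
    """Toy dataset: computation runs in parallel, writes are serialized."""
    def __init__(self):
        self._write_lock = threading.Lock()  # dataset-level mutex
        self.blocks = {}

    def write_block(self, xoff, yoff, data):
        # Only the actual I/O is under the mutex; callers compute outside it.
        with self._write_lock:
            self.blocks[(xoff, yoff)] = data

def worker(ds, tile):
    xoff, yoff = tile
    data = bytes([(xoff + yoff) % 256]) * 64  # stand-in for resampling work
    ds.write_block(xoff, yoff, data)

ds = Dataset()
threads = [threading.Thread(target=worker, args=(ds, (x, y)))
           for x in range(4) for y in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(ds.blocks))  # 16 tiles written
```

If the workload really is CPU bound, the lock is rarely contended and the threads scale; if it is I/O bound, they all queue on the mutex, which is exactly the "serialization of I/O" trade-off described above.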
> >> 
> >> There are several places where parallel write should be addressed:
> >> - The GDAL core mechanisms that deal with the block cache
> >> - Each GDAL driver where parallel write would be supported. I guess
> >> that drivers should advertise a specific capability
> >> - The low-level library used by the driver; in the case of GeoTIFF,
> >> libtiff
> >> 
> >> And finally, as Frank underlined, there are intrinsic limitations due
> >> to the format itself. For a compressed TIFF, at some point you have
> >> to serialize the writing of the tiles, because you cannot know in
> >> advance the size of the compressed data; or at least you need some
> >> coordination of the writers so that a "next offset available" is
> >> properly synchronized between them. The compression itself could
> >> still be parallelized.
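That "next offset available" coordination can be sketched as follows: compression runs in parallel across threads, and only the offset assignment plus the byte append are serialized. This is plain Python with zlib, and the flat byte buffer with an offset table is a toy layout, not real TIFF structure:

```python
import threading, zlib

class TileWriter:
    """Compress tiles in parallel; serialize only offset assignment + append."""
    def __init__(self):
        self._lock = threading.Lock()
        self.data = bytearray()
        self.tile_offsets = {}   # tile id -> (offset, byte count)

    def write_tile(self, tile_id, raw):
        compressed = zlib.compress(raw)      # parallel: outside the lock
        with self._lock:                     # serialized: claim next offset
            offset = len(self.data)
            self.data += compressed
            self.tile_offsets[tile_id] = (offset, len(compressed))

w = TileWriter()
threads = [threading.Thread(target=w.write_tile, args=(i, bytes(256) * i))
           for i in range(1, 9)]
for t in threads: t.start()
for t in threads: t.join()

# Every tile can be read back from its recorded offset.
for tid, (off, n) in w.tile_offsets.items():
    assert zlib.decompress(bytes(w.data[off:off + n])) == bytes(256) * tid
```

The tiles land in the file in whatever order the threads finish, which is why the offset table has to be updated atomically with the append; that is the synchronization the paragraph above describes.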
> >> 
> >> I'm not sure, however, whether what Jan mentioned, different
> >> processes writing to the same dataset, is doable.
> > 
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev

