[gdal-dev] does gdal support multiple simultaneous writers to raster

Even Rouault even.rouault at mines-paris.org
Sat Jan 12 07:15:39 PST 2013


On Saturday 12 January 2013 02:38:55, Kennedy, Paul wrote:
> Hi,
> Yes, we are pretty sure we will see a significant benefit.  The processing
> algorithms are CPU bound, not I/O bound. Our digital terrain model
> interpolations often run for many hours (we do them overnight) but the
> underlying file is only a few gigabytes.

OK, my understanding is that you don't really need writers to write 
simultaneously. You need to compute tiles or subwindows of the whole raster in 
parallel, but the writing of the result of that computation could very well 
be done in a serialized way.
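
As a rough illustration of that pattern (not GDAL code), here is a minimal 
Python sketch. Everything in it is hypothetical: compute_tile() stands in for 
the expensive interpolation, write_tile() for the single place where the 
output raster is touched, and a nested list stands in for the dataset. A 
thread pool is used only to keep the sketch short; for CPU-bound pure-Python 
work a process pool would be the natural swap-in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

TILE = 4  # hypothetical tile size in pixels


def compute_tile(xoff, yoff, xsize, ysize):
    """Stand-in for an expensive, CPU-bound computation of one window."""
    return [[float(x + y) for x in range(xoff, xoff + xsize)]
            for y in range(yoff, yoff + ysize)]


def windows(width, height, tile=TILE):
    """Yield (xoff, yoff, xsize, ysize) windows covering the whole raster."""
    for yoff in range(0, height, tile):
        for xoff in range(0, width, tile):
            yield xoff, yoff, min(tile, width - xoff), min(tile, height - yoff)


def write_tile(raster, xoff, yoff, data):
    """Stand-in for the single writer; only the main thread calls this."""
    for dy, row in enumerate(data):
        raster[yoff + dy][xoff:xoff + len(row)] = row


def process(width, height, workers=4):
    raster = [[0.0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Computation is fanned out to the workers...
        futures = {pool.submit(compute_tile, *w): w
                   for w in windows(width, height)}
        # ...but results are written one at a time, as they complete.
        for fut in as_completed(futures):
            xoff, yoff, _, _ = futures[fut]
            write_tile(raster, xoff, yoff, fut.result())
    return raster
```

Since all writes go to a single output, there is nothing to stitch 
afterwards, and no locking is needed around the writer.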

That's roughly what gdalwarp -wo NUM_THREADS=xxxx does. Having 
parallelized I/O could perhaps give some extra performance when you have so 
many threads that the time spent in I/O becomes of the same order of magnitude 
as the time spent in computing, but probably at the expense of significant 
added complexity in GDAL core and drivers.

> If we split them into multiple
> files of tiles and run each on a dedicated process the whole thing is
> quicker, but this is messy and results in a stitching error.
> 
> Another example is gdalwarp. It takes quite some time with a large data set
> and would be a good candidate for parallelisation, as would gdaladdo.
> 
> I believe slower cores, but more of them, in PCs are the future. My PC has 8
> but they rarely get used to their potential.
> 
> I am certain there are some challenges here, that's why it is interesting;)
> 
> Regards
> pk
> 
> On 11/01/2013, at 6:54 PM, "Even Rouault" <even.rouault at mines-paris.org> wrote:
> > Hi,
> > 
> > This is an interesting topic, with many "intersecting" issues to deal with
> > at different levels.
> > 
> > First, are you confident that I/O access won't be the limiting factor
> > in the use cases you imagine? If so, serialization of I/O could be
> > acceptable, and this would just require an API with a dataset-level
> > mutex.
> > 
> > There are several places where parallel write should be addressed :
> > - The GDAL core mechanisms that deal with the block cache
> > - Each GDAL driver where parallel write would be supported. I guess that
> > GDAL drivers should advertise a specific capability
> > - The low-level library used by the driver. In the case of GTiff, libtiff
> > 
> > And finally, as Frank underlined, there are intrinsic limitations due to
> > the format itself. For a compressed TIFF, at some point, you have to
> > serialize the writing of the tile, because you cannot know in advance
> > the size of the compressed data, or at least have some coordination of
> > the writers so that a "next offset available" is properly synchronized
> > between them. The compression itself could be parallelized.
> > 
> > I'm not sure however if what Jan mentioned, different processes writing
> > the same dataset, is doable.
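
To make the "next offset available" coordination mentioned above concrete, 
here is a minimal Python sketch. It is not how libtiff or the GTiff driver 
work; AppendOnlyTileFile is a hypothetical in-memory container, with zlib 
standing in for the codec. Each writer compresses its own tile, and a lock is 
held only while the next free offset is reserved and the bytes are appended.

```python
import threading
import zlib


class AppendOnlyTileFile:
    """Hypothetical container: compressed tiles appended one after another,
    with an index mapping tile id -> (offset, length)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = bytearray()
        self.index = {}

    def write_tile(self, tile_id, raw):
        # The compressed size is unknown until compression has finished,
        # so the offset cannot be assigned in advance.
        compressed = zlib.compress(raw)
        # Only the offset reservation and the append are serialized.
        with self._lock:
            offset = len(self._data)
            self._data += compressed
            self.index[tile_id] = (offset, len(compressed))

    def read_tile(self, tile_id):
        offset, length = self.index[tile_id]
        return zlib.decompress(bytes(self._data[offset:offset + length]))


if __name__ == "__main__":
    f = AppendOnlyTileFile()
    tiles = {i: bytes([i]) * 1000 for i in range(8)}
    threads = [threading.Thread(target=f.write_tile, args=(i, raw))
               for i, raw in tiles.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert all(f.read_tile(i) == raw for i, raw in tiles.items())
```

This only covers threads within one process; writers in different processes 
would need some inter-process equivalent of the lock, which the sketch does 
not attempt.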

