[gdal-dev] does gdal support multiple simultaneous writers to raster

N. Farah nfarah at hotmail.com
Mon Jan 14 04:38:07 PST 2013


Thanks, Even, for your response. It clarifies GDAL's current capabilities for parallel processing. I think that as parallel processing gains momentum, GDAL and the many GDAL-based applications would benefit from adopting it. In the Windows world, parallel processing was added in Visual Studio 2010 (PPL for C++ and TPL for .NET) and reinforced a bit more in Visual Studio 2012 with C++ AMP (GPU-based processing).
I do not know the low-level GDAL caching well enough, but it seems to me that option B) could work (dataset-based caching). I agree that exposing a per-driver capability flag for concurrency safety is a good idea, as some drivers depend on third-party libraries that might not be concurrency-safe. Tagging those drivers as not concurrency-safe would be a first step, before trying to address their underlying concurrency issues.
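
To make option B) from the quoted message concrete, here is a minimal sketch of a per-dataset block cache. It is not GDAL code: DatasetCache, Block and put() are hypothetical names invented for the illustration, and the "flush" is only a list of block ids standing in for a driver write. The point it tries to show is that when each dataset owns its cache, an eviction triggered by work on dataset B can never force a flush of a dirty block belonging to dataset A.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical illustration only: none of these names exist in GDAL.
struct Block {
    int id;      // block identifier within its dataset
    bool dirty;  // true if the block holds unwritten data
};

// One cache per dataset, with its own LRU list and its own mutex.
class DatasetCache {
public:
    explicit DatasetCache(std::size_t maxBlocks) : maxBlocks_(maxBlocks) {
        assert(maxBlocks > 0);
    }

    // Insert or refresh a block. If the cache overflows, evict the least
    // recently used block of THIS dataset only. Returns the ids of the
    // dirty blocks that had to be flushed (stand-in for a driver write).
    std::vector<int> put(int id, bool dirty) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<int> flushed;
        auto it = index_.find(id);
        if (it != index_.end()) {
            // A block stays dirty until it is actually flushed.
            dirty = dirty || it->second->dirty;
            lru_.erase(it->second);
            index_.erase(it);
        }
        lru_.push_front(Block{id, dirty});
        index_[id] = lru_.begin();
        while (lru_.size() > maxBlocks_) {
            Block victim = lru_.back();
            lru_.pop_back();
            index_.erase(victim.id);
            if (victim.dirty) {
                flushed.push_back(victim.id);
            }
        }
        return flushed;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lru_.size();
    }

private:
    std::size_t maxBlocks_;
    mutable std::mutex mutex_;
    std::list<Block> lru_;  // front = most recently used
    std::unordered_map<int, std::list<Block>::iterator> index_;
};
```

The obvious trade-off is that the memory limit becomes per dataset rather than global, so a global budget would have to be divided among the open datasets in some way.
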
As for reading datasets, I tested reading the subdatasets of an HDF4 or HDF5 dataset in parallel and encountered an error. So some drivers may have issues with parallel reading as well. Maybe reading subdatasets is simply not concurrency-safe, even if each subdataset is opened as its own dataset?
If there is interest in working on parallel writing (and maybe reading), I can help, at least by testing.
Thanks,
Noureddine Farah

> From: even.rouault at mines-paris.org
> To: gdal-dev at lists.osgeo.org
> Subject: Re: [gdal-dev] does gdal support multiple simultaneous writers to raster
> Date: Sat, 12 Jan 2013 16:35:28 +0100
> CC: nfarah at hotmail.com
> 
> > ex. convert
> > multiple datasets to different output datasets in a parallel way.
> 
> As Frank underlined, there's currently an issue with the global block cache 
> regarding write support.
> 
> Imagine that you have 2 threads A and B.
> Thread A deals with dataset A, and thread B deals with dataset B.
> Thread A is in the middle of writing some tile/line of dataset A.
> Thread B is trying to fill a new entry in the block cache (with new read data, 
> or new data to write). But the block cache is full. So the least recently used 
> entry must be discarded. If that entry is a dirty block of dataset A, then it 
> must be flushed to disk, in the context of thread B, but at that time thread A 
> is also writing data... Which might be an issue since drivers are re-entrant 
> (can be invoked by multiple threads, if each thread deals with a different 
> dataset), not thread-safe.
> 
> This specific case here could be fixed in different ways :
> A) Making drivers thread-safe (or accessing them through a thread-safe layer), 
> that is to say add a dataset level mutex
> B) or having a per-dataset block cache instead of a global block cache
> C) or dealing differently with dirty blocks: only flush them if the operation 
> that needs to discard the dirty block is initiated by an operation on the same 
> dataset as the dirty block.
> 
> 
> > Would
> > those parallel operations not be affected by GDAL caching for both read and
> > write, since the cache is set to a limit? Is accessing the currently used
> > cache value concurrent-safe, to increase/decrease it?
> 
> Hum, I see that GDALGetCacheMax() and GDALSetCacheMax() are not thread safe 
> currently. We would need to protect them by the raster block mutex, with a 
> leading call to CPLMutexHolderD( &hRBMutex );
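
A minimal sketch of the locking pattern described above, using only the C++ standard library. It does not use the GDAL symbols: std::mutex and std::lock_guard stand in for hRBMutex and CPLMutexHolderD, and every name in the sketch namespace (including AdjustCacheMax) is hypothetical. Note that even with both accessors protected, "read the value, then set a new one" from application code is still two separate critical sections, so an increase/decrease needs to happen under a single lock.

```cpp
#include <cassert>
#include <mutex>

// Stand-ins for illustration only; these are not the GDAL symbols.
namespace sketch {

std::mutex rbMutex;                       // plays the role of hRBMutex
long long cacheMax = 40LL * 1024 * 1024;  // arbitrary starting value, in bytes

long long GetCacheMax() {
    // Scoped lock, analogous in spirit to CPLMutexHolderD( &hRBMutex ).
    std::lock_guard<std::mutex> lock(rbMutex);
    return cacheMax;
}

void SetCacheMax(long long newMax) {
    assert(newMax >= 0);
    std::lock_guard<std::mutex> lock(rbMutex);
    cacheMax = newMax;
}

// Hypothetical helper: increase or decrease the limit atomically.
// Calling GetCacheMax() and then SetCacheMax() separately would let
// another thread change the value in between.
long long AdjustCacheMax(long long delta) {
    std::lock_guard<std::mutex> lock(rbMutex);
    cacheMax += delta;
    return cacheMax;
}

}  // namespace sketch
```
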
 		 	   		  