[gdal-dev] does gdal support multiple simultaneous writers to raster

Jan Hartmann j.l.h.hartmann at uva.nl
Sat Jan 12 06:08:55 PST 2013


You probably know this, but there is an option to let gdalwarp use more 
cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but nothing 
staggering. Splitting operations up over individual tiles would 
really speed things up. Even if I use only one VM, I can define 32 
cores, and it would certainly be interesting to experiment with tools 
like MPI to integrate multiple VMs into one computing cluster.
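
As a rough illustration of the tile-splitting idea, here is a minimal 
Python sketch that cuts a target extent into tiles and builds one 
gdalwarp call per tile. The gdalwarp options -te, -t_srs and -wo are 
real; the function names, the tile_N.tif output names and the worker 
count are assumptions made up for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tile_extents(xmin, ymin, xmax, ymax, nx, ny):
    """Split a georeferenced extent into nx * ny equally sized tiles."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [
        (xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
        for j in range(ny)
        for i in range(nx)
    ]


def warp_command(src, dst, extent, t_srs, threads="ALL_CPUS"):
    """Build one gdalwarp call restricted to a single output tile."""
    return [
        "gdalwarp",
        "-t_srs", t_srs,
        "-te", *(str(v) for v in extent),   # xmin ymin xmax ymax, target SRS
        "-wo", f"NUM_THREADS={threads}",
        src, dst,
    ]


def warp_tiles(src, t_srs, extents, workers=4):
    """Run one gdalwarp process per tile, at most `workers` at a time."""
    cmds = [
        warp_command(src, f"tile_{n}.tif", extent, t_srs)  # hypothetical names
        for n, extent in enumerate(extents)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
```

When several processes run side by side, NUM_THREADS=ALL_CPUS in each 
of them will probably oversubscribe the cores, so a smaller value per 
process may be the better choice. The tiles still have to be merged 
afterwards.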

Jan

On 01/12/2013 02:38 AM, Kennedy, Paul wrote:
> Hi,
> Yes, we are pretty sure we will see a significant benefit. The 
> processing algorithms are CPU bound, not I/O bound. Our digital terrain 
> model interpolations often run for many hours (we do them overnight), 
> but the underlying file is only a few gigabytes. If we split them into 
> multiple files of tiles and run each on a dedicated process, the whole 
> thing is quicker, but this is messy and results in a stitching error.
>
> Another example is gdalwarp. It takes quite some time with a large 
> data set and would be a good candidate for parallelisation, as would 
> gdaladdo.
>
> I believe PCs with slower cores, but more of them, are the future. My 
> PC has 8, but they rarely get used to their potential.
>
> I am certain there are some challenges here, that's why it is 
> interesting;)
>
> Regards
> pk
>
> On 11/01/2013, at 6:54 PM, "Even Rouault" 
> <even.rouault at mines-paris.org> wrote:
>
>> Re: [gdal-dev] does gdal support multiple simultaneous writers to raster
>>
>> Hi,
>>
>> This is an interesting topic, with many "intersecting" issues to deal 
>> with at different levels.
>>
>> First, are you confident that, in the use cases you imagine, I/O 
>> access won't be the limiting factor? If so, serialization of I/O 
>> could be acceptable, and this would just require an API with a 
>> dataset-level mutex.
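
A minimal sketch of what a dataset-level mutex could look like, 
assuming the computation runs unlocked and only the write is 
serialized. LockedRaster, compute_block and process are hypothetical 
names for the illustration and are not part of the GDAL API; the 
in-memory list stands in for a real dataset.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedRaster:
    """Hypothetical stand-in for a dataset whose writes are serialized."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[0] * width for _ in range(height)]
        self._lock = threading.Lock()   # the dataset-level mutex

    def write_block(self, xoff, yoff, block):
        # Only one thread at a time may touch the dataset.
        with self._lock:
            for dy, row in enumerate(block):
                self.pixels[yoff + dy][xoff:xoff + len(row)] = row


def compute_block(xoff, yoff, size):
    """Placeholder for the CPU-bound work, e.g. a terrain interpolation."""
    return [[x + y for x in range(xoff, xoff + size)]
            for y in range(yoff, yoff + size)]


def process(raster, size, workers=4):
    """Compute all blocks in parallel; assumes width and height are
    multiples of `size`."""
    offsets = [(x, y)
               for y in range(0, raster.height, size)
               for x in range(0, raster.width, size)]

    def job(offset):
        xoff, yoff = offset
        block = compute_block(xoff, yoff, size)   # runs without the lock
        raster.write_block(xoff, yoff, block)     # serialized by the lock

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, offsets))
```

In pure Python the threads will not give a real speed-up for CPU-bound 
work; the sketch only shows where the lock sits relative to the 
computation.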
>>
>> There are several places where parallel write should be addressed:
>> - The GDAL core mechanisms that deal with the block cache
>> - Each GDAL driver where parallel write would be supported. I guess 
>> that GDAL drivers should advertise a specific capability
>> - The low-level library used by the driver. In the case of the 
>> GeoTIFF driver, libtiff
>>
>> And finally, as Frank underlined, there are intrinsic limitations due 
>> to the format itself. For a compressed TIFF, at some point, you have 
>> to serialize the writing of the tile, because you cannot know in 
>> advance the size of the compressed data, or at least have some 
>> coordination of the writers so that a "next offset available" is 
>> properly synchronized between them. The compression
>> itself could be serialized.
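
One way to picture the "next offset available" coordination is the 
sketch below. It assumes each tile is compressed independently by its 
own worker and that only the reservation of the offset and the write 
are done under a lock. TileAppender and write_tiles are hypothetical 
names; zlib and an in-memory buffer stand in for the real codec and 
file, and none of this reflects how libtiff actually works internally.

```python
import io
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class TileAppender:
    """Hypothetical writer that hands out the next free offset under a lock."""

    def __init__(self):
        self._fh = io.BytesIO()         # stand-in for the output file
        self._lock = threading.Lock()
        self._next_offset = 0
        self.index = {}                 # tile id -> (offset, byte count)

    def append(self, tile_id, payload):
        # The compressed size is only known now, so the offset is
        # reserved and the data written in one synchronized step.
        with self._lock:
            offset = self._next_offset
            self._fh.seek(offset)
            self._fh.write(payload)
            self._next_offset = offset + len(payload)
            self.index[tile_id] = (offset, len(payload))

    def read(self, tile_id):
        offset, size = self.index[tile_id]
        with self._lock:
            self._fh.seek(offset)
            return self._fh.read(size)


def write_tiles(tiles, workers=4):
    """Compress tiles in worker threads and append them in arrival order."""
    out = TileAppender()

    def job(item):
        tile_id, raw = item
        compressed = zlib.compress(raw)   # done before taking the lock
        out.append(tile_id, compressed)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, tiles.items()))
    return out
```

The tiles end up in the file in whatever order the workers finish, 
which is why the per-tile (offset, byte count) index is needed to find 
them again.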
>>
>> I'm not sure, however, whether what Jan mentioned, different 
>> processes writing to the same dataset, is doable.
>>
>
>
