[gdal-dev] does gdal support multiple simultaneous writers to raster

Joaquim Luis jluis at ualg.pt
Fri Jan 11 18:03:51 PST 2013


On 12-01-2013 01:38, Kennedy, Paul wrote:
> Hi,
> Yes, we are pretty sure we will see a significant benefit. The
> processing algorithms are CPU bound, not I/O bound. Our digital terrain
> model interpolations often run for many hours (we do them overnight)
> but the underlying file is only a few gigabytes. If we split them into
> multiple files of tiles and run each on a dedicated process the whole
> thing is quicker, but this is messy and results in a stitching error.

Many years ago, when memory limitations forced me to do this kind of
operation, the trick was to compute each tile larger than needed, let's
say 10% wider on each of the 4 sides (except at the borders, of course).
The extra zone works as a boundary condition and is stripped at the end.
The stripped tiles can then be pasted together to build the final mosaic.
I did that with minimum curvature (GMT) interpolation and the final
'gluing' was perfect: the seams could not be noticed, not even with
shaded illumination.
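
To make the idea concrete, here is a minimal Python sketch of the
overlap-and-strip scheme. It is only an illustration under stated
assumptions: the names smooth and process_tiled are made up, and a small
box filter stands in for the real interpolation. With a purely local
filter, a halo at least as wide as the filter radius reproduces the
untiled result exactly; a global method such as minimum curvature would
only approach it as the halo grows.

```python
def smooth(grid, radius=1):
    """Box filter: a stand-in for an expensive neighbourhood operation."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [grid[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


def process_tiled(grid, tile=4, halo=1, radius=1):
    """Process each tile with an extra halo, strip the halo, paste the core."""
    rows, cols = len(grid), len(grid[0])
    mosaic = [[None] * cols for _ in range(rows)]
    for tr in range(0, rows, tile):
        for tc in range(0, cols, tile):
            # Core extent of this tile (the part kept in the mosaic).
            r_end, c_end = min(tr + tile, rows), min(tc + tile, cols)
            # Padded extent, clipped at the borders of the grid.
            pr0, pr1 = max(0, tr - halo), min(rows, r_end + halo)
            pc0, pc1 = max(0, tc - halo), min(cols, c_end + halo)
            padded = [row[pc0:pc1] for row in grid[pr0:pr1]]
            # Each padded tile is independent, so this call could run
            # in its own process.
            result = smooth(padded, radius)
            # Strip the halo and paste only the core cells.
            for r in range(tr, r_end):
                for c in range(tc, c_end):
                    mosaic[r][c] = result[r - pr0][c - pc0]
    return mosaic


# Small synthetic grid, invented for the demonstration.
grid = [[float((7 * r + 3 * c) % 10) for c in range(10)] for r in range(10)]
whole = smooth(grid, radius=1)
with_halo = process_tiled(grid, tile=4, halo=1, radius=1)
without_halo = process_tiled(grid, tile=4, halo=0, radius=1)
print("halo matches untiled result:", with_halo == whole)
print("no halo matches untiled result:", without_halo == whole)
```

Running it with halo=0 shows the kind of stitching error Paul describes:
the cells along the tile seams no longer agree with the untiled result.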

Joaquim

>
> Another example is gdalwarp. It takes quite some time with a large
> data set and would be a good candidate for parallelisation, as would
> gdaladdo.
>
> I believe slower cores, but more of them, are the future of PCs. My PC
> has 8 but they rarely get used to their potential.
>
> I am certain there are some challenges here, that's why it is 
> interesting;)
>
> Regards
> pk
>
> On 11/01/2013, at 6:54 PM, "Even Rouault"
> <even.rouault at mines-paris.org> wrote:
>
>> Hi,
>>
>> This is an interesting topic, with many "intersecting" issues to deal
>> with at different levels.
>>
>> First, are you confident that, in the use cases you imagine, I/O
>> access won't be the limiting factor? If so, serialization of I/O could
>> be acceptable, and this would just require an API with a dataset-level
>> mutex.
>>
>> There are several places where parallel write should be addressed:
>> - The GDAL core mechanisms that deal with the block cache
>> - Each GDAL driver where parallel write would be supported. I guess
>> that GDAL drivers should advertise a specific capability
>> - The low-level library used by the driver. In the case of the GTiff
>> driver, libtiff
>>
>> And finally, as Frank underlined, there are intrinsic limitations due
>> to the format itself. For a compressed TIFF, at some point, you have to
>> serialize the writing of the tile, because you cannot know in advance
>> the size of the compressed data, or at least have some coordination of
>> the writers so that a "next offset available" is properly synchronized
>> between them. The compression itself could be serialized.
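
As a rough illustration of the "next offset available" coordination
described above, here is a minimal Python sketch. Everything in it is an
assumption made for the example: TileAppender and make_tile are
hypothetical names, not GDAL or libtiff API, a bytearray stands in for
the file, and threads are used only for brevity. Each writer compresses
its own tile, and only the offset allocation and the append happen under
a lock.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class TileAppender:
    """Hypothetical append-only tile store (not a GDAL or libtiff API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffer = bytearray()  # stands in for the file on disk
        self.index = {}             # tile id -> (offset, byte count)

    def write_tile(self, tile_id, raw):
        # Compressed size is unknown until compression has finished,
        # so compress first, outside the lock.
        compressed = zlib.compress(raw)
        # Only the "next offset available" bookkeeping and the append
        # itself are serialized between writers.
        with self._lock:
            offset = len(self._buffer)
            self._buffer += compressed
            self.index[tile_id] = (offset, len(compressed))

    def read_tile(self, tile_id):
        offset, size = self.index[tile_id]
        return zlib.decompress(bytes(self._buffer[offset:offset + size]))


def make_tile(tile_id):
    """Synthetic tile data of varying size, invented for the demo."""
    return bytes([tile_id % 251]) * (1000 + 37 * tile_id)


store = TileAppender()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Consume the iterator so that all writes complete.
    list(pool.map(lambda i: store.write_tile(i, make_tile(i)), range(16)))

print("tiles written:", len(store.index))
```

The index of (offset, byte count) pairs plays the role that the tile
offset and byte count arrays play in a tiled TIFF. Writers in separate
processes would need a lock and an offset counter shared across
processes, which this sketch does not attempt.
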
>>
>> I'm not sure, however, whether what Jan mentioned, different processes
>> writing the same dataset, is doable.
>>
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
