[gdal-dev] does gdal support multiple simultaneous writers to raster
Jan Hartmann
j.l.h.hartmann at uva.nl
Sat Jan 12 06:08:55 PST 2013
You probably know this, but there is an option to let gdalwarp use more
cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but nothing
staggering. Splitting operations up over individual tiles would really
speed things up. Even if I use only one VM, I can define 32 cores, and it
would certainly be interesting to experiment with tools like MPI to
combine multiple VMs into one computing cluster.
Jan
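One way to split work over individual tiles without seams is to give each tile an overlap ("halo") of extra input pixels, process the padded window, and crop the result back to the tile's own core before stitching. This is a hypothetical, pure-Python illustration with no GDAL calls: `tile_windows`, `smooth3x3`, `process_tile` and `run_tiled` are made-up names, a 3x3 mean stands in for a real neighbourhood operation such as a DTM interpolation step, and a list of lists stands in for a raster band. Workers only compute; the calling thread is the single writer into the output.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_windows(width, height, tile, halo):
    """Yield (core, padded) windows as (x0, y0, x1, y1) tuples, end-exclusive.

    `core` is the region a tile is responsible for; `padded` extends it by
    `halo` pixels on each side, clipped to the raster extent.
    """
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            padded = (max(x0 - halo, 0), max(y0 - halo, 0),
                      min(x1 + halo, width), min(y1 + halo, height))
            yield (x0, y0, x1, y1), padded


def smooth3x3(grid, x0, y0, x1, y1):
    """Hypothetical per-tile job: 3x3 mean, neighbours clamped to the window."""
    out = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            vals = [grid[j][i]
                    for j in range(max(y - 1, y0), min(y + 2, y1))
                    for i in range(max(x - 1, x0), min(x + 2, x1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def process_tile(grid, core, padded):
    """Smooth the padded window, then crop back to the core window."""
    px0, py0, px1, py1 = padded
    cx0, cy0, cx1, cy1 = core
    smoothed = smooth3x3(grid, px0, py0, px1, py1)
    block = [row[cx0 - px0:cx1 - px0] for row in smoothed[cy0 - py0:cy1 - py0]]
    return core, block


def run_tiled(grid, tile=4, halo=1, workers=4):
    """Process tiles concurrently; stitch results in the calling thread."""
    height, width = len(grid), len(grid[0])
    out = [[None] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_tile, grid, core, padded)
                   for core, padded in tile_windows(width, height, tile, halo)]
        for fut in futures:
            (x0, y0, x1, y1), block = fut.result()
            for dy, row in enumerate(block):
                out[y0 + dy][x0:x1] = row  # single writer: no locking needed
    return out


# Usage on a small synthetic 10 x 9 grid.
grid = [[float((7 * x + 3 * y) % 11) for x in range(10)] for y in range(9)]
whole = smooth3x3(grid, 0, 0, 10, 9)
print(run_tiled(grid, tile=4, halo=1) == whole)  # True: no seams
print(run_tiled(grid, tile=4, halo=0) == whole)  # False: tile edges differ
```

With a halo at least as wide as the operator's reach, the tiled result matches a single pass exactly; with `halo=0`, pixels along tile edges see truncated neighbourhoods, which is one possible cause of stitching errors. Threads keep the sketch simple; for CPU-bound pure-Python work, a `ProcessPoolExecutor` or library code that releases the GIL would be needed for a real speed-up.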
On 01/12/2013 02:38 AM, Kennedy, Paul wrote:
> Hi,
> Yes, we are pretty sure we will see a significant benefit. The
> processing algorithms are CPU-bound, not I/O-bound. Our digital terrain
> model interpolations often run for many hours (we do them overnight),
> but the underlying file is only a few gigabytes. If we split them into
> multiple tile files and run each on a dedicated process, the whole
> thing is quicker, but this is messy and results in stitching errors.
>
> Another example is gdalwarp. It takes quite some time with a large
> data set and would be a good candidate for parallelisation, as would
> gdaladdo.
>
> I believe slower cores, but more of them, are the future of PCs. My PC
> has 8, but they are rarely used to their full potential.
>
> I am certain there are some challenges here; that's why it is
> interesting ;)
>
> Regards
> pk
>
> On 11/01/2013, at 6:54 PM, "Even Rouault"
> <even.rouault at mines-paris.org> wrote:
>
>> Hi,
>>
>> This is an interesting topic, with many "intersecting" issues to deal
>> with at different levels.
>>
>> First, are you confident that, in the use cases you imagine, I/O
>> access won't be the limiting factor? If so, serialization of I/O
>> could be acceptable, and this would just require an API with a
>> dataset-level mutex.
>>
>> There are several places where parallel write should be addressed:
>> - The GDAL core mechanisms that deal with the block cache
>> - Each GDAL driver where parallel write would be supported. I guess
>> that GDAL drivers should advertise a specific capability
>> - The low-level library used by the driver. In the case of GTiff, libtiff
>>
>> And finally, as Frank underlined, there are intrinsic limitations due
>> to the format itself. For a compressed TIFF, at some point you have to
>> serialize the writing of the tiles, because you cannot know the size of
>> the compressed data in advance, or at least have some coordination of
>> the writers so that a "next offset available" is properly synchronized
>> between them. The compression itself could be serialized.
>>
>> I'm not sure, however, whether what Jan mentioned (different processes
>> writing the same dataset) is doable.
>>
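Even's points about a dataset-level mutex and a synchronized "next offset available" can be illustrated with a small sketch. Assumptions: `AppendOnlyTileStore` and its methods are hypothetical names, an in-memory bytearray stands in for the file, and zlib stands in for the TIFF codec; none of this is GDAL's or libtiff's actual API or behaviour. Compression runs outside the lock because its output size is only known afterwards; the lock covers only reserving the offset, appending the bytes and recording the index entry.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class AppendOnlyTileStore:
    """Hypothetical in-memory stand-in for a compressed, tiled file.

    This is not GDAL or libtiff. It only illustrates the pattern: compress
    tiles concurrently, but serialize the step that reserves the "next
    offset available" and writes the bytes, using one dataset-level lock.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._data = bytearray()
        # tile id -> (offset, byte count), in the spirit of TIFF's
        # TileOffsets / TileByteCounts tags
        self.index = {}

    def __len__(self):
        return len(self._data)

    def write_tile(self, tile_id, raw):
        # Outside the lock: the compressed size is unknown until this finishes.
        payload = zlib.compress(raw)
        # Inside the lock: reserve the next free offset and append.
        with self._lock:
            offset = len(self._data)
            self._data += payload
            self.index[tile_id] = (offset, len(payload))
        return offset

    def read_tile(self, tile_id):
        offset, size = self.index[tile_id]
        return zlib.decompress(bytes(self._data[offset:offset + size]))


# Usage: 16 tiles of different sizes written by 4 worker threads.
store = AppendOnlyTileStore()
tiles = {i: bytes([i]) * (1000 + 37 * i) for i in range(16)}
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda item: store.write_tile(*item), tiles.items()))
print(all(store.read_tile(i) == raw for i, raw in tiles.items()))  # True
```

Keeping the critical section this small lets the CPU-heavy part run concurrently while the file layout stays consistent. The order of tiles in the file depends on thread scheduling, which is why an offset/byte-count index is needed to locate each tile afterwards.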