[gdal-dev] does gdal support multiple simultaneous writers to raster

Paul Meems bontepaarden at gmail.com
Sat Jan 12 06:20:13 PST 2013


To add my 2 cents.

With MapWindow GIS we use TauDEM binaries to perform watershed delineations.
http://hydrology.usu.edu/taudem/taudem5.0/index.html

These TauDEM binaries are optimized to use MPI, but also work if you don't
have MPI installed.
I don't know in detail how it works, but in general you set how many
parallel processes you want and TauDEM clips your input GeoTIFF into that
many pieces, with some overlap.
Next it processes each clipped TIFF in parallel and combines the results
afterwards.
This works very well and very fast.
Perhaps something like this can be introduced for GDAL. The source code for
the TauDEM binaries is available, so you can have a look at how it's done.
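
To illustrate the idea of clipping with overlap, here is a minimal sketch in
Python. It is not TauDEM's actual code: I am assuming the raster is cut into
horizontal strips of rows, and split_rows, n_parts and overlap are names made
up for this example.

```python
def split_rows(n_rows, n_parts, overlap):
    """Return one (start, stop) row range per part, padded by `overlap` rows.

    Assumption: the raster is split into horizontal strips. The function
    name and parameters are hypothetical, not taken from TauDEM or GDAL.
    """
    if n_parts < 1 or n_parts > n_rows:
        raise ValueError("n_parts must be between 1 and n_rows")
    if overlap < 0:
        raise ValueError("overlap must not be negative")

    # Spread the rows as evenly as possible; the first `extra` strips
    # get one additional row.
    base, extra = divmod(n_rows, n_parts)
    strips = []
    start = 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        # Pad with neighbouring rows, clamped to the raster extent.
        strips.append((max(0, start - overlap), min(n_rows, stop + overlap)))
        start = stop
    return strips
```

Each range could then be handed to its own process, and the overlapping rows
dropped again when the results are combined.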

--
Paul


2013/1/12 Jan Hartmann <j.l.h.hartmann at uva.nl>

>  You probably know this, but there is an option to let gdalwarp use more
> cores: -wo NUM_THREADS=ALL_CPUS. It gives some improvement, but nothing
> staggering. Splitting up operations over individual tiles would really
> speed things up. Even if I use only one VM, I can define 32 cores, and it
> would certainly be interesting to experiment with programs like MPI to
> integrate multiple VMs into one computing cluster.
>
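
For reference, a small sketch of calling gdalwarp with that option from
Python. The helper names and the file names in the usage note are placeholders
I made up; only the -wo NUM_THREADS=ALL_CPUS option comes from the message
above.

```python
import subprocess


def gdalwarp_command(src, dst, num_threads="ALL_CPUS"):
    """Build the gdalwarp argument list with the NUM_THREADS warp option."""
    return ["gdalwarp", "-wo", f"NUM_THREADS={num_threads}", src, dst]


def run_gdalwarp(src, dst, num_threads="ALL_CPUS"):
    """Run gdalwarp; assumes the gdalwarp binary is on the PATH."""
    subprocess.run(gdalwarp_command(src, dst, num_threads), check=True)
```

Calling run_gdalwarp("in.tif", "out.tif") would then execute
gdalwarp -wo NUM_THREADS=ALL_CPUS in.tif out.tif.
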
> Jan
>
>  On 01/12/2013 02:38 AM, Kennedy, Paul wrote:
>
> Hi,
> Yes, we are pretty sure we will see a significant benefit.  The processing
> algorithms are CPU bound, not I/O bound. Our digital terrain model
> interpolations often run for many hours (we do them overnight) but the
> underlying file is only a few gigabytes. If we split them into multiple
> files of tiles and run each on a dedicated process the whole thing is
> quicker, but this is messy and results in a stitching error.
>
>  Another example is gdalwarp. It takes quite some time with a large data
> set and would be a good candidate for parallelisation, as would gdaladdo.
>
>  I believe slower cores, but more of them, in PCs are the future. My PC has
> 8 but they rarely get used to their potential.
>
> I am certain there are some challenges here, that's why it is interesting;)
>
> Regards
> pk
>
> On 11/01/2013, at 6:54 PM, "Even Rouault" <even.rouault at mines-paris.org>
> wrote:
>
>   Hi,
>
> This is an interesting topic, with many "intersecting" issues to deal with
> at different levels.
>
> First, are you confident that, in the use cases you imagine, I/O access
> won't be the limiting factor? If so, serialization of I/O could be
> acceptable, and this would just require an API with a dataset-level mutex.
>
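
A rough sketch of what a dataset-level mutex could look like, in Python.
LockedDataset is a made-up stand-in and not a GDAL class, and compute_block is
a placeholder for the real processing: workers compute blocks concurrently,
and only the write goes through the lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedDataset:
    """Hypothetical stand-in for a raster dataset; not part of GDAL."""

    def __init__(self, n_blocks):
        self._blocks = [None] * n_blocks
        self._lock = threading.Lock()  # the dataset-level mutex

    def write_block(self, index, data):
        # All writers are serialized here, whatever block they touch.
        with self._lock:
            self._blocks[index] = data

    def blocks(self):
        with self._lock:
            return list(self._blocks)


def compute_block(index):
    # Placeholder for the CPU-bound work done per block.
    return bytes([index % 256]) * 4


def process(n_blocks, n_workers=4):
    ds = LockedDataset(n_blocks)

    def job(i):
        data = compute_block(i)  # done outside the lock
        ds.write_block(i, data)  # serialized

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(job, range(n_blocks)))
    return ds.blocks()
```

In CPython, pure-Python computation in threads does not really run in
parallel, so this only shows the locking pattern, not a speed-up.
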
> There are several places where parallel write should be addressed:
> - The GDAL core mechanisms that deal with the block cache
> - Each GDAL driver where parallel write would be supported. I guess that
>   GDAL drivers should advertise a specific capability
> - The low-level library used by the driver. In the case of GeoTIFF, libtiff
>
> And finally, as Frank underlined, there are intrinsic limitations due to
> the format itself. For a compressed TIFF, at some point, you have to
> serialize the writing of the tile, because you cannot know in advance the
> size of the compressed data, or at least have some coordination of the
> writers so that a "next offset available" is properly synchronized between
> them. The compression itself could be serialized.
>
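
One possible way to coordinate writers around a "next offset available",
sketched in Python. This is not how libtiff or GDAL work: TileAppender and
write_tiles are made-up names, zlib stands in for the TIFF compression, and
the split (compress outside the lock, reserve the offset and write inside it)
is my assumption.

```python
import io
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class TileAppender:
    """Hypothetical helper: appends tiles and records (offset, size)."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._lock = threading.Lock()
        self._next_offset = 0
        self.index = {}

    def append(self, tile_id, payload):
        # The compressed size is only known now, so the offset is
        # reserved and the bytes written under one lock.
        with self._lock:
            offset = self._next_offset
            self._f.seek(offset)
            self._f.write(payload)
            self._next_offset = offset + len(payload)
            self.index[tile_id] = (offset, len(payload))


def write_tiles(tiles, fileobj, n_workers=4):
    """Compress each tile in a worker, then append it; return the index."""
    appender = TileAppender(fileobj)

    def job(item):
        tile_id, raw = item
        compressed = zlib.compress(raw)  # outside the lock
        appender.append(tile_id, compressed)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(job, tiles.items()))
    return appender.index
```

Tiles end up in the file in completion order, not tile order, so the index of
offsets and sizes is what makes them readable again.
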
> I'm not sure, however, whether what Jan mentioned, different processes
> writing the same dataset, is doable.
>
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>