[gdal-dev] does gdal support multiple simultaneous writers to raster

Kennedy, Paul P.Kennedy at fugro.com.au
Sat Jan 12 18:24:27 PST 2013


Hi
Simultaneous writers would be a better long-term solution, as we often improve the raster after its initial creation. This improvement may well be a filter run on a sub-region (e.g. a despeckle), an update to a piece of the DTM with better information, or even some manual edits as a last resort.
I can imagine a Hadoop-style map/reduce would fit nicely into your sub-window idea.
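
As a rough illustration of that map/reduce shape, here is a minimal Python sketch: sub-windows are computed by a pool of workers and the results are written back by a single thread, so the writes stay serialized. Everything in it is hypothetical (the `windows`, `compute` and `run` names, the sizes, and the list of lists standing in for the raster); it does not use the GDAL API. A thread pool keeps the sketch short; truly CPU-bound Python code would more likely need a process pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

WIDTH, HEIGHT, TILE = 8, 6, 4  # hypothetical raster and tile sizes


def windows(width, height, tile):
    """Yield (xoff, yoff, xsize, ysize) sub-windows covering the raster."""
    for yoff in range(0, height, tile):
        for xoff in range(0, width, tile):
            yield xoff, yoff, min(tile, width - xoff), min(tile, height - yoff)


def compute(window):
    """Stand-in for an expensive per-window computation (the 'map' step)."""
    xoff, yoff, xsize, ysize = window
    block = [[(yoff + r) * 1000 + (xoff + c) for c in range(xsize)]
             for r in range(ysize)]
    return window, block


def run(width=WIDTH, height=HEIGHT, tile=TILE, workers=4):
    """Compute all windows in parallel, then write them one at a time."""
    raster = [[None] * width for _ in range(height)]  # stands in for the output file
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compute, w) for w in windows(width, height, tile)]
        # Only this thread touches the raster, so the writes are serialized.
        for fut in as_completed(futures):
            (xoff, yoff, xsize, ysize), block = fut.result()
            for r in range(ysize):
                raster[yoff + r][xoff:xoff + xsize] = block[r]
    return raster
```

Calling `run()` returns the filled raster; swapping `compute` for a real interpolation would leave the write loop unchanged.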

Regards
pk

On 12/01/2013, at 11:16 PM, "Even Rouault" <even.rouault at mines-paris.org> wrote:

> On Saturday 12 January 2013 02:38:55, Kennedy, Paul wrote:
> > Hi,
> > Yes, we are pretty sure we will see a significant benefit. The processing
> > algorithms are CPU bound, not I/O bound. Our digital terrain model
> > interpolations often run for many hours (we do them overnight) but the
> > underlying file is only a few gigabytes.
> 
> OK, my understanding is that you don't really need writers to write
> simultaneously. You need to compute tiles or subwindows of the whole raster in
> parallel, but the writing of the result of that computation could well be
> done in a serialized way.
> 
> That's a bit like what is done with gdalwarp -wo NUM_THREADS=xxxx. Having
> parallelized I/O could perhaps give some extra performance when you have so
> many threads that the time spent in I/O becomes of the same order of magnitude
> as the time spent in computing, but at the expense of probably significant
> complexity in GDAL core and drivers.
> 
> > If we split them into multiple
> > files of tiles and run each on a dedicated process the whole thing is
> > quicker, but this is messy and results in a stitching error.
> >
> > Another example is gdalwarp. It takes quite some time with a large data set
> > and would be a good candidate for parallelisation, as would gdaladdo.
> >
> > I believe slower cores but more of them in PCs are the future. My PC has 8
> > but they rarely get used to their potential.
> >
> > I am certain there are some challenges here, that's why it is interesting;)
> >
> > Regards
> > pk
> >
> > On 11/01/2013, at 6:54 PM, "Even Rouault" <even.rouault at mines-paris.org>
> wrote:
> > > Hi,
> > >
> > > This is an interesting topic, with many "intersecting" issues to deal with
> > > at different levels.
> > >
> > > First, are you confident that, in the use cases you imagine, I/O
> > > access won't be the limiting factor? If so, serialization of I/O
> > > could be acceptable, and this would just require an API with a
> > > dataset-level mutex.
> > >
> > > There are several places where parallel write should be addressed:
> > > - The GDAL core mechanisms that deal with the block cache
> > > - Each GDAL driver where parallel write would be supported. I guess that
> > > GDAL drivers should advertise a specific capability
> > > - The low-level library used by the driver. In the case of GTiff, libtiff
> > >
> > > And finally, as Frank underlined, there are intrinsic limitations due to
> > > the format itself. For a compressed TIFF, at some point, you have to
> > > serialize the writing of the tile, because you cannot know in advance
> > > the size of the compressed data, or at least have some coordination of
> > > the writers so that a "next offset available" is properly synchronized
> > > between them. The compression itself could be parallelized.
> > >
> > > I'm not sure, however, whether what Jan mentioned, different processes
> > > writing the same dataset, is doable.
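
The "next offset available" coordination described above for compressed tiles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `AppendOnlyStore`, `compress_and_store` and `write_tiles` are hypothetical names, a `bytearray` stands in for the file, and zlib stands in for whatever codec the format uses. It is not how libtiff or any GDAL driver is implemented. Each worker compresses its tile on its own, and only the step that reserves the next offset and appends the bytes is guarded by a lock.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class AppendOnlyStore:
    """In-memory stand-in for a file to which compressed tiles are appended."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = bytearray()
        self.index = {}  # tile id -> (offset, length)

    def append(self, tile_id, payload):
        # The lock makes "take next offset + write" one atomic step.
        with self._lock:
            offset = len(self._data)
            self._data += payload
            self.index[tile_id] = (offset, len(payload))

    def read(self, tile_id):
        offset, length = self.index[tile_id]
        return bytes(self._data[offset:offset + length])


def compress_and_store(store, tile_id, raw):
    # Compression runs outside the lock; its output size is unknown beforehand.
    compressed = zlib.compress(raw)
    store.append(tile_id, compressed)


def write_tiles(tiles, workers=4):
    """Compress tiles in worker threads and append them to a shared store."""
    store = AppendOnlyStore()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compress_and_store, store, tile_id, raw)
                   for tile_id, raw in tiles.items()]
        for fut in futures:
            fut.result()  # re-raise any worker error
    return store
```

Tiles end up in whatever order the workers finish, which is why the index of offsets and lengths is kept alongside the data.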