[gdal-dev] RFC 47 and Threading

Even Rouault even.rouault at spatialys.com
Fri Aug 22 13:12:57 PDT 2014


Le vendredi 22 août 2014 21:50:56, Blake Thompson a écrit :
> Even,
> 
> On Fri, Aug 22, 2014 at 1:33 PM, Even Rouault <even.rouault at spatialys.com>
> 
> wrote:
> > Note: after re-reading, I realize that I misread your above sentence as
> > "it will work to translate multiple datasets in a parallel way with a
> > *per- dataset* cache").
> > 
> > So, even if you didn't write it, I'm afraid that people will assume that
> > calling CreateCopy() on the same source dataset handle would be
> > thread-safe (imagine that one thread translates to format F1, while the
> > other one to format F2), whereas in the current state of the RFC it is
> > not. Because CreateCopy() will call GetGeoTransform(),
> > GetProjectionRef() etc which are generally thread unsafe. For example
> > GTiff has a lazy loading approach for those 2 methods, and it is not the
> > only one.
> 
> If we claim thread-safety, we should likely offer full thread-safety (at
> 
> > least
> > in reading scenarios), not partial one. Otherwise I'm afraid no one but
> > the few people that have taken part to that discussion or read the RFC
> > will know
> > the limits.
> 
> Perhaps the goal of the RFC should be to allow CreateCopy to be threadsafe?
> I have no tried yet to venture into other parts of thread safety on the
> dataset (already bit off a very large chunk).
> 
> > I'm wondering if it wouldn't be worth having GDALDatasetThreadSafe and
> > GDALRasterBandThreadSafe classes (or whatever name is appropriate), that
> > would
> > follow the decorator pattern, i.e. they will own a thread unsafe "real"
> > dataset/band and override the methods to lock them. Similarly to what I
> > have
> > done in OGR with ogr/ogrsf_frmts/generic/ogrmutexeddatasource.cpp and
> > ogrmutexedlayer.cpp, needed for the FGDB driver (the one that depends on
> > the
> > ESRI SDK).
> > 
> > My idea would be to have an open flag GDAL_OF_THREADSAFE for
> > GDALOpenEx(). When set, GDALOpen() would do the usual job and get a (in
> > most cases) unsafe
> > dataset object. Then it would query a virtual method of the dataset to
> > return
> > a thread-safe version of it (GetThreadSafe()).
> > - If not defined by the driver, the base implementation of
> > GetThreadSafe() would return the dataset wrapped in
> > GDALDatasetThreadSafe
> > - If the implementation of the dataset is already thread-safe, it's
> > GetThreadSafe() would return just self
> > - For the in-between situations, it could for example override
> > GDALDatasetThreadSafe/GDALRasterBandThreadSafe base implementation to
> > specialize the methods that don't need locking.
> > 
> > This idea might not work (and I've not though how it would combine with
> > my previous IReadBlock_thread_safe/IReadBlock approach). I have just
> > written it
> > as it came to my mind. The main motivation is to make it easy to have
> > thread-
> > safe versions by default, without the driver having to care about that,
> > while
> > being flexible to make drivers that need finer control to do it.
> 
> This approach worries me, there is already a lot of inheritance of the
> GDALDataset, its really hard to learn already.

Well, the learning phase would be only be people who want to tweak thread 
safety of their drivers, i.e. not a lot of people I bet. The user wouldn't 
have to pay a lot (just specifying GDAL_OF_THREADSAFE, or nothing at all if 
the price of thread safety is not too big)

> Adding another layer would
> make it even harder to maintain in my opinion. It also provides one more
> config option in an already overwhelming number of config options. I think
> we might be better off if we made a long term plan on how we wanted to make
> GDAL thread safe. Perhaps make a list of parts needing thread safety and
> progress towards it over a period of time. As part of it maintain a webpage
> that describes what is thread safe currently, so people are not confused on
> what is and is not thread safe.

People are already confused... and have a hard-time really understanding the 
implication of the few lines at 
http://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe

I'm afraid more documentation of a more complex situation will not make it 
better. It is not reasonable to have to document that "GetProjectionRef() is 
thread-safe in drivers X and Y, whereas GetGeoTransform() is in Y and Z". 
Because that's the situation we will have for sure if we don't adopt a global 
default mechanism (unless someone spends weeks in adding locks in every 
virtual method of every driver).

The external locking I suggested is not completely a new approach. I could 
mention  Collections.synchronizedList() used in Java to make their List 
implementations thread safe... Sorry for mentionning Java ;-)

The ideal would be to have a solution were by default we have a guarantee 
thread-safety and people (GDAL driver developers) who want to tune it can do 
it if they are brave enough.

> 
> > If, through benchmark, we determine that the cost of the thread safe
> > version
> > is neglectable, then GDAL_OF_THREADSAFE might be useless, and GDALOpen()
> > would
> > always return the thread-safe version.
> 
> I wish there were lots of benchmarks in the test suite (if there are, I
> don't know where they are), 

They don't yet exist. Most code is functionnal code in Python, so not 
appropriate for performance testing, especially regarding threading.
The autotest/cpp should likely be enhanced to have such tests (the only 
performance benchmark I can think of is testperfcopywords that was created to 
justify the templated approach of GDALCopyWords)

To be mentionned, there's also apps/multireadtest.cpp (not compiled by 
default). It just ensures that opening the same file in N dataset objects and 
reading them in N threads works.

> so that we can even roughly measure changes to
> the internals of the library. (I realize this is something that could be
> difficult to do well).
> 
> Thanks,
> 
> Blake

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com


More information about the gdal-dev mailing list