[gdal-dev] RFC 47 and Threading
Even Rouault
even.rouault at spatialys.com
Fri Aug 22 13:12:57 PDT 2014
Le vendredi 22 août 2014 21:50:56, Blake Thompson a écrit :
> Even,
>
> On Fri, Aug 22, 2014 at 1:33 PM, Even Rouault <even.rouault at spatialys.com>
>
> wrote:
> > Note: after re-reading, I realize that I misread your above sentence as
> > "it will work to translate multiple datasets in a parallel way with a
> > *per- dataset* cache").
> >
> > So, even if you didn't write it, I'm afraid that people will assume that
> > calling CreateCopy() on the same source dataset handle would be
> > thread-safe (imagine that one thread translates to format F1, while the
> > other one to format F2), whereas in the current state of the RFC it is
> > not. Because CreateCopy() will call GetGeoTransform(),
> > GetProjectionRef() etc which are generally thread unsafe. For example
> > GTiff has a lazy loading approach for those 2 methods, and it is not the
> > only one.
>
> If we claim thread-safety, we should likely offer full thread-safety (at
>
> > least
> > in reading scenarios), not partial one. Otherwise I'm afraid no one but
> > the few people that have taken part to that discussion or read the RFC
> > will know
> > the limits.
>
> Perhaps the goal of the RFC should be to allow CreateCopy to be threadsafe?
> I have no tried yet to venture into other parts of thread safety on the
> dataset (already bit off a very large chunk).
>
> > I'm wondering if it wouldn't be worth having GDALDatasetThreadSafe and
> > GDALRasterBandThreadSafe classes (or whatever name is appropriate), that
> > would
> > follow the decorator pattern, i.e. they will own a thread unsafe "real"
> > dataset/band and override the methods to lock them. Similarly to what I
> > have
> > done in OGR with ogr/ogrsf_frmts/generic/ogrmutexeddatasource.cpp and
> > ogrmutexedlayer.cpp, needed for the FGDB driver (the one that depends on
> > the
> > ESRI SDK).
> >
> > My idea would be to have an open flag GDAL_OF_THREADSAFE for
> > GDALOpenEx(). When set, GDALOpen() would do the usual job and get a (in
> > most cases) unsafe
> > dataset object. Then it would query a virtual method of the dataset to
> > return
> > a thread-safe version of it (GetThreadSafe()).
> > - If not defined by the driver, the base implementation of
> > GetThreadSafe() would return the dataset wrapped in
> > GDALDatasetThreadSafe
> > - If the implementation of the dataset is already thread-safe, it's
> > GetThreadSafe() would return just self
> > - For the in-between situations, it could for example override
> > GDALDatasetThreadSafe/GDALRasterBandThreadSafe base implementation to
> > specialize the methods that don't need locking.
> >
> > This idea might not work (and I've not though how it would combine with
> > my previous IReadBlock_thread_safe/IReadBlock approach). I have just
> > written it
> > as it came to my mind. The main motivation is to make it easy to have
> > thread-
> > safe versions by default, without the driver having to care about that,
> > while
> > being flexible to make drivers that need finer control to do it.
>
> This approach worries me, there is already a lot of inheritance of the
> GDALDataset, its really hard to learn already.
Well, the learning phase would be only be people who want to tweak thread
safety of their drivers, i.e. not a lot of people I bet. The user wouldn't
have to pay a lot (just specifying GDAL_OF_THREADSAFE, or nothing at all if
the price of thread safety is not too big)
> Adding another layer would
> make it even harder to maintain in my opinion. It also provides one more
> config option in an already overwhelming number of config options. I think
> we might be better off if we made a long term plan on how we wanted to make
> GDAL thread safe. Perhaps make a list of parts needing thread safety and
> progress towards it over a period of time. As part of it maintain a webpage
> that describes what is thread safe currently, so people are not confused on
> what is and is not thread safe.
People are already confused... and have a hard-time really understanding the
implication of the few lines at
http://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe
I'm afraid more documentation of a more complex situation will not make it
better. It is not reasonable to have to document that "GetProjectionRef() is
thread-safe in drivers X and Y, whereas GetGeoTransform() is in Y and Z".
Because that's the situation we will have for sure if we don't adopt a global
default mechanism (unless someone spends weeks in adding locks in every
virtual method of every driver).
The external locking I suggested is not completely a new approach. I could
mention Collections.synchronizedList() used in Java to make their List
implementations thread safe... Sorry for mentionning Java ;-)
The ideal would be to have a solution were by default we have a guarantee
thread-safety and people (GDAL driver developers) who want to tune it can do
it if they are brave enough.
>
> > If, through benchmark, we determine that the cost of the thread safe
> > version
> > is neglectable, then GDAL_OF_THREADSAFE might be useless, and GDALOpen()
> > would
> > always return the thread-safe version.
>
> I wish there were lots of benchmarks in the test suite (if there are, I
> don't know where they are),
They don't yet exist. Most code is functionnal code in Python, so not
appropriate for performance testing, especially regarding threading.
The autotest/cpp should likely be enhanced to have such tests (the only
performance benchmark I can think of is testperfcopywords that was created to
justify the templated approach of GDALCopyWords)
To be mentionned, there's also apps/multireadtest.cpp (not compiled by
default). It just ensures that opening the same file in N dataset objects and
reading them in N threads works.
> so that we can even roughly measure changes to
> the internals of the library. (I realize this is something that could be
> difficult to do well).
>
> Thanks,
>
> Blake
--
Spatialys - Geospatial professional services
http://www.spatialys.com
More information about the gdal-dev
mailing list