[gdal-dev] RFC 47 and Threading
Even Rouault
even.rouault at spatialys.com
Fri Aug 22 11:33:16 PDT 2014
On Friday 22 August 2014 17:53:50, Blake Thompson wrote:
> Jeff,
>
> > Thanks Blake for the detailed response. I did not realize that I did not do
> > a reply all in my previous email I sent.
>
> Not an issue, glad you guys are interested in my changes.
>
> > --> I thought that this was not possible using current trunk GDAL because
> > of the global cache. At least the writing side using multiple threads can
> > cause issues.
> > See this response on an older topic I found in the gdal-dev list:
> > http://lists.osgeo.org/pipermail/gdal-dev/2013-January/035215.html
> > The response there was to a question similar to the one I posted about
> > support for batch translation of datasets with this RFC.
>
> I do believe it is possible in the current trunk of GDAL (and I have
> written some code to do it without seeing any issues). There is a "lock"
> on raster blocks currently that prevents them from being removed from the
> cache if they are currently being used.
>
> > --> Ok, I was under the wrong impression that in addition of a cache per
> > dataset there is still a cap of global cache that oversee the total cache
> > used by GDAL.
>
> This really isn't possible, because if something needs to be removed from
> the cache and there is nothing to remove from the current dataset, it would
> have to go to another dataset to lower the cache size. At this point the
> per dataset cache really has limited meaning, because a lot of the same
> issues will occur.
>
> > --> Does this mean that, in addition to the per dataset cache, one can
> > still use the global cache (only, not the per dataset one) and have the
> > scenario of translating multiple datasets in a parallel way work ok
> > (without threading issues due to the current implementation of the
> > global cache)?
>
> Yes, it will work to translate multiple datasets in a parallel way with a
> global cache.
Note: after re-reading, I realize that I misread your above sentence as "it
will work to translate multiple datasets in a parallel way with a
*per-dataset* cache".
So, even if you didn't write it, I'm afraid that people will assume that
calling CreateCopy() on the same source dataset handle would be thread-safe
(imagine that one thread translates to format F1, while the other one to
format F2), whereas in the current state of the RFC it is not, because
CreateCopy() will call GetGeoTransform(), GetProjectionRef(), etc., which are
generally thread unsafe. For example GTiff has a lazy loading approach for
those 2 methods, and it is not the only driver that does.
If we claim thread-safety, we should likely offer full thread-safety (at least
in reading scenarios), not a partial one. Otherwise I'm afraid no one but the
few people who have taken part in this discussion or read the RFC will know
the limits.
I'm wondering if it wouldn't be worth having GDALDatasetThreadSafe and
GDALRasterBandThreadSafe classes (or whatever name is appropriate), that would
follow the decorator pattern, i.e. they will own a thread unsafe "real"
dataset/band and override the methods to lock them. Similarly to what I have
done in OGR with ogr/ogrsf_frmts/generic/ogrmutexeddatasource.cpp and
ogrmutexedlayer.cpp, needed for the FGDB driver (the one that depends on the
ESRI SDK).
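
To make the decorator idea concrete, here is a minimal sketch. It does not use
the real GDAL classes: Dataset, LazyDataset and MutexedDataset are hypothetical
stand-ins, and the "EPSG:4326" value is only placeholder data. It just shows a
wrapper that owns a thread unsafe dataset with lazy loading and takes a mutex
around every forwarded call.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified dataset interface (NOT the real GDALDataset API).
class Dataset {
public:
    virtual ~Dataset() = default;
    virtual std::string GetProjectionRef() = 0;
};

// A "real" dataset with lazy loading: the unsynchronized check-then-set
// below is a data race if two threads share the same object.
class LazyDataset : public Dataset {
public:
    std::string GetProjectionRef() override {
        if (!loaded_) {
            projection_ = "EPSG:4326";  // placeholder for an expensive read
            loaded_ = true;
            ++loadCount;
        }
        return projection_;
    }
    int loadCount = 0;  // exposed only so the sketch can be inspected

private:
    bool loaded_ = false;
    std::string projection_;
};

// Decorator: owns the unsafe dataset and serializes every call to it.
class MutexedDataset : public Dataset {
public:
    explicit MutexedDataset(std::unique_ptr<Dataset> inner)
        : inner_(std::move(inner)) {}

    std::string GetProjectionRef() override {
        std::lock_guard<std::mutex> lock(mutex_);
        return inner_->GetProjectionRef();
    }

private:
    std::mutex mutex_;
    std::unique_ptr<Dataset> inner_;
};
```

Callers only ever see the Dataset interface, so the wrapper can be swapped in
without the driver knowing about it. The cost is one lock per call, which is
the overhead that would need to be benchmarked.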
My idea would be to have an open flag GDAL_OF_THREADSAFE for GDALOpenEx().
When set, GDALOpen() would do the usual job and get a (in most cases) unsafe
dataset object. Then it would query a virtual method of the dataset to return
a thread-safe version of it (GetThreadSafe()).
- If not defined by the driver, the base implementation of GetThreadSafe()
would return the dataset wrapped in GDALDatasetThreadSafe
- If the implementation of the dataset is already thread-safe, its
GetThreadSafe() would return just self
- For the in-between situations, it could for example override
GDALDatasetThreadSafe/GDALRasterBandThreadSafe base implementation to
specialize the methods that don't need locking.
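
The open-time dispatch described above could be sketched roughly as follows.
Everything here is hypothetical: OF_THREADSAFE stands in for the proposed
GDAL_OF_THREADSAFE flag, Open() for GDALOpenEx(), and the dataset classes are
toy types rather than GDAL ones. Only the first two cases of the list are
shown (default wrapping, and a driver that returns itself).

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the proposed GDAL_OF_THREADSAFE open flag.
constexpr unsigned OF_THREADSAFE = 0x1;

class Dataset;
std::shared_ptr<Dataset> WrapWithMutex(std::shared_ptr<Dataset> inner);

// Hypothetical base class (NOT the real GDALDataset).
class Dataset : public std::enable_shared_from_this<Dataset> {
public:
    virtual ~Dataset() = default;
    virtual int GetRasterCount() = 0;
    // Base implementation: drivers that do nothing get the locking wrapper.
    virtual std::shared_ptr<Dataset> GetThreadSafe() {
        return WrapWithMutex(shared_from_this());
    }
};

// Generic locking decorator around a thread unsafe dataset.
class MutexedDataset : public Dataset {
public:
    explicit MutexedDataset(std::shared_ptr<Dataset> inner)
        : inner_(std::move(inner)) {}
    int GetRasterCount() override {
        std::lock_guard<std::mutex> lock(mutex_);
        return inner_->GetRasterCount();
    }
    // Already safe: avoid wrapping a wrapper.
    std::shared_ptr<Dataset> GetThreadSafe() override {
        return shared_from_this();
    }

private:
    std::mutex mutex_;
    std::shared_ptr<Dataset> inner_;
};

std::shared_ptr<Dataset> WrapWithMutex(std::shared_ptr<Dataset> inner) {
    return std::make_shared<MutexedDataset>(std::move(inner));
}

// Driver that does not care about threading: inherits the base behaviour.
class UnsafeDataset : public Dataset {
public:
    int GetRasterCount() override { return 3; }
};

// Driver whose implementation is already thread-safe: returns itself.
class SafeDataset : public Dataset {
public:
    int GetRasterCount() override { return 1; }
    std::shared_ptr<Dataset> GetThreadSafe() override {
        return shared_from_this();
    }
};

// Stand-in for the open function: do the usual job, then ask the dataset
// for its thread-safe version only when the flag is set.
std::shared_ptr<Dataset> Open(bool driverIsThreadSafe, unsigned flags) {
    std::shared_ptr<Dataset> ds;
    if (driverIsThreadSafe)
        ds = std::make_shared<SafeDataset>();
    else
        ds = std::make_shared<UnsafeDataset>();
    return (flags & OF_THREADSAFE) ? ds->GetThreadSafe() : ds;
}
```

With this shape, Open(false, OF_THREADSAFE) hands back a MutexedDataset around
the unsafe object, while Open(true, OF_THREADSAFE) hands back the SafeDataset
unchanged. The in-between case would subclass the wrapper and skip the lock in
the methods that do not need it.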
This idea might not work (and I've not thought about how it would combine with
my previous IReadBlock_thread_safe/IReadBlock approach). I have just written it
as it came to my mind. The main motivation is to make it easy to have
thread-safe versions by default, without the driver having to care about that,
while remaining flexible enough to let drivers that need finer control provide
it.
If, through benchmarking, we determine that the cost of the thread-safe version
is negligible, then GDAL_OF_THREADSAFE might be useless, and GDALOpen() would
always return the thread-safe version.
>
> Thanks,
>
> Blake
--
Spatialys - Geospatial professional services
http://www.spatialys.com