[gdal-dev] RFC 47 and Threading

Even Rouault even.rouault at spatialys.com
Fri Aug 22 11:33:16 PDT 2014


On Friday 22 August 2014 17:53:50, Blake Thompson wrote:
> Jeff,
> 
> > Thanks Blake for the detailed response. I did not realize that I did not do
> > a reply all in my previous email I sent.
> 
> Not an issue, glad you guys are interested in my changes.
> 
> > --> I thought that this was not possible using current trunk GDAL because
> > of the global cache. At least the writing side using multiple threads can
> > cause issues.
> > See this response on an older topic I found in the GDAL dev list:
> > http://lists.osgeo.org/pipermail/gdal-dev/2013-January/035215.html
> > The response there was for a question similar to the one I posted about
> > supporting batch translation of datasets with this RFC.
> 
> I do believe it is possible in the current trunk of GDAL (and I have
> written some code to do it without seeing any issues). There is a "lock"
> on raster blocks currently that prevents them from being removed from the
> cache if they are currently being used.
> 
> > --> OK, I was under the wrong impression that, in addition to a cache per
> > dataset, there is still a global cache cap that oversees the total cache
> > used by GDAL.
> 
> This really isn't possible, because if something needs to be removed from
> the cache and there is nothing to remove from the current dataset, it would
> have to go to another dataset to lower the cache size. At this point the
> per dataset cache really has limited meaning, because a lot of the same
> issues will occur.
> 
> > --> Does this mean that, in addition to the per-dataset cache, one can
> > still use the global cache (only, not the per-dataset one) and have the
> > scenario of translating multiple datasets in parallel work OK (without
> > threading issues due to the current global cache implementation)?
> 
> Yes, it will work to translate multiple datasets in a parallel way with a
> global cache.

Note: after re-reading, I realize that I misread your above sentence as "it 
will work to translate multiple datasets in a parallel way with a 
*per-dataset* cache".

So, even if you didn't write it, I'm afraid that people will assume that 
calling CreateCopy() on the same source dataset handle from several threads 
is thread-safe (imagine that one thread translates to format F1 while the 
other translates to format F2), whereas in the current state of the RFC it 
is not. The reason is that CreateCopy() calls GetGeoTransform(), 
GetProjectionRef(), etc., which are generally thread-unsafe. For example, 
GTiff uses lazy loading for those 2 methods, and it is not the only driver 
to do so.
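
To make the lazy-loading hazard concrete, here is a minimal sketch. It is not 
GTiff code: LazyDataset, GetProjectionUnsafe(), GetProjectionLocked() and 
CountLoadsWithLocking() are hypothetical names chosen for illustration, and 
the "loading" is just a hard-coded string.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a driver dataset with lazy loading.
// None of these names are real GDAL classes or methods.
class LazyDataset
{
    std::mutex mutex_;
    bool loaded_ = false;
    int loadCount_ = 0;
    std::string projection_;

    // Pretend to parse the projection from the file header.
    void Load()
    {
        projection_ = "EPSG:4326";
        ++loadCount_;
        loaded_ = true;
    }

public:
    // Unsafe variant: two threads may both observe loaded_ == false and
    // write projection_ concurrently, which is a data race.
    std::string GetProjectionUnsafe()
    {
        if (!loaded_)
            Load();
        return projection_;
    }

    // Safe variant: the check and the load happen under one lock.
    std::string GetProjectionLocked()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!loaded_)
            Load();
        return projection_;
    }

    // Number of times the lazy load actually ran.
    int LoadCount()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return loadCount_;
    }
};

// Calls the locked getter from nThreads threads and returns how many
// times the lazy load ran.
int CountLoadsWithLocking(int nThreads)
{
    assert(nThreads > 0);
    LazyDataset ds;
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i)
        threads.emplace_back([&ds] { ds.GetProjectionLocked(); });
    for (auto& t : threads)
        t.join();
    return ds.LoadCount();
}
```

With the locked getter the load runs once regardless of how many threads ask 
for the value. The unsafe getter is only correct when a single thread uses 
the handle, which is the situation two concurrent CreateCopy() calls on one 
source handle would violate.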

If we claim thread-safety, we should likely offer full thread-safety (at least 
in reading scenarios), not a partial one. Otherwise I'm afraid no one but the 
few people who have taken part in this discussion or read the RFC will know 
the limits.
I'm wondering if it wouldn't be worth having GDALDatasetThreadSafe and 
GDALRasterBandThreadSafe classes (or whatever name is appropriate) that would 
follow the decorator pattern, i.e. they would own a thread-unsafe "real" 
dataset/band and override the methods to take a lock around each call. This 
is similar to what I have done in OGR with 
ogr/ogrsf_frmts/generic/ogrmutexeddatasource.cpp and ogrmutexedlayer.cpp, 
needed for the FGDB driver (the one that depends on the ESRI SDK).
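
A minimal sketch of that decorator idea follows. The Dataset interface here 
is a one-method simplification invented for the example (the real GDALDataset 
has many more virtual methods), and PlainDataset and DatasetThreadSafe are 
hypothetical names, not existing GDAL classes.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Simplified, hypothetical interface; the real GDALDataset is far larger.
class Dataset
{
public:
    virtual ~Dataset() = default;
    virtual std::string GetProjectionRef() = 0;
};

// A thread-unsafe "real" dataset that loads its projection lazily.
class PlainDataset : public Dataset
{
    bool loaded_ = false;
    std::string projection_;

public:
    std::string GetProjectionRef() override
    {
        if (!loaded_)
        {
            projection_ = "EPSG:4326";  // stands in for parsing the file
            loaded_ = true;
        }
        return projection_;
    }
};

// Decorator: owns the real dataset and serializes every call made on it.
class DatasetThreadSafe : public Dataset
{
    std::unique_ptr<Dataset> real_;
    std::mutex mutex_;

public:
    explicit DatasetThreadSafe(std::unique_ptr<Dataset> real)
        : real_(std::move(real))
    {
        assert(real_ != nullptr);
    }

    std::string GetProjectionRef() override
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return real_->GetProjectionRef();
    }
};
```

The wrapper returns the string by value so that no reference into the 
wrapped object escapes the lock. A method returning a pointer to internal 
storage, as GetProjectionRef() does in GDAL, would need more care than this 
sketch shows.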

My idea would be to have an open flag GDAL_OF_THREADSAFE for GDALOpenEx().
When set, GDALOpen() would do the usual job and get an (in most cases) unsafe 
dataset object. Then it would query a virtual method of the dataset to return 
a thread-safe version of it (GetThreadSafe()).
- If not defined by the driver, the base implementation of GetThreadSafe() 
would return the dataset wrapped in GDALDatasetThreadSafe.
- If the implementation of the dataset is already thread-safe, its 
GetThreadSafe() would just return self.
- For the in-between situations, it could for example override the 
GDALDatasetThreadSafe/GDALRasterBandThreadSafe base implementation to 
specialize the methods that don't need locking.
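
The first two cases above can be sketched as follows. Everything here is 
hypothetical: Dataset, DatasetThreadSafe, UnsafeDataset, SafeDataset, 
OpenWithFlags() and the OF_THREADSAFE value are illustration names, and 
shared_ptr ownership is an assumption of the sketch that makes "return self" 
easy to express. It is not how GDAL manages dataset lifetimes.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, heavily simplified dataset interface.
class Dataset : public std::enable_shared_from_this<Dataset>
{
public:
    virtual ~Dataset() = default;
    virtual std::string GetDescription() = 0;
    // Base implementation: wrap this dataset in the locking decorator.
    virtual std::shared_ptr<Dataset> GetThreadSafe();
};

// Locking decorator around any dataset.
class DatasetThreadSafe : public Dataset
{
    std::shared_ptr<Dataset> real_;
    std::mutex mutex_;

public:
    explicit DatasetThreadSafe(std::shared_ptr<Dataset> real)
        : real_(std::move(real))
    {
        assert(real_ != nullptr);
    }

    std::string GetDescription() override
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return real_->GetDescription();
    }

    // The wrapper is already safe, so it returns itself.
    std::shared_ptr<Dataset> GetThreadSafe() override
    {
        return shared_from_this();
    }
};

std::shared_ptr<Dataset> Dataset::GetThreadSafe()
{
    return std::make_shared<DatasetThreadSafe>(shared_from_this());
}

// Case 1: the driver does not override GetThreadSafe().
class UnsafeDataset : public Dataset
{
public:
    std::string GetDescription() override { return "unsafe"; }
};

// Case 2: the driver is already thread-safe and returns itself.
class SafeDataset : public Dataset
{
public:
    std::string GetDescription() override { return "safe"; }
    std::shared_ptr<Dataset> GetThreadSafe() override
    {
        return shared_from_this();
    }
};

// Hypothetical flag value, chosen only for this sketch.
const unsigned OF_THREADSAFE = 0x1;

// Stands in for the tail end of the open call: the dataset has already
// been opened by the driver, and the flag decides what is handed back.
std::shared_ptr<Dataset> OpenWithFlags(std::shared_ptr<Dataset> opened,
                                       unsigned flags)
{
    if (flags & OF_THREADSAFE)
        return opened->GetThreadSafe();
    return opened;
}
```

The third case (a driver subclassing the wrapper to skip locking on selected 
methods) is not shown. In this layout it would amount to overriding 
GetThreadSafe() to return a driver-specific subclass of DatasetThreadSafe.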

This idea might not work (and I've not thought about how it would combine 
with my previous IReadBlock_thread_safe/IReadBlock approach). I have just 
written it as it came to my mind. The main motivation is to make it easy to 
have thread-safe versions by default, without the driver having to care about 
that, while remaining flexible enough to let drivers that need finer control 
provide it.

If, through benchmarking, we determine that the cost of the thread-safe 
version is negligible, then GDAL_OF_THREADSAFE might be useless, and 
GDALOpen() would always return the thread-safe version.


> 
> Thanks,
> 
> Blake

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

