[gdal-dev] RFC 47 and Threading

Fri Aug 22 12:55:34 PDT 2014

Le vendredi 22 août 2014 21:34:49, Blake Thompson a écrit :
> Even,
> 
> > This might be a problem in practice since the amount of cache might grow
> > quickly. By default a VRT will open simultaneously up to
> > GDAL_MAX_DATASET_POOL_SIZE (whose default is 100, but can be changed at
> > runtime with the config option) source datasets.
> > If you do a gdal_translate of the VRT (and that it has more than 100
> > sources)
> > then you will need 100 * GDAL_CACHEMAX bytes for the cache.
> > Of course once you are aware of that you can play with both configuration
> > options to find the best settings.
> 
> This is one of the reasons I would perfer to use my solution #1 or #2
> rather then just adding the per dataset cache as we originally spoke about
> during the original development. 

Ok.

> I think we should try to preserve the
> global cache in some situations but still have good performance.
> 
> > Just throwing an idea : how about a IReadBlock_thread_safe and IReadBlock
> > virtual methods ? The first one would have a default implementation in
> > GDALRasterBand that would basically call the second one with a global
> > mutex.
> > That way drivers that wan't a more fine grained locking could implement
> > their
> > own IReadBlock_thread_safe, whereas the ones that don't care would just
> > implement IReadBlock. The few current call sites of IReadBlock would just
> > need
> > to be updated to call IReadBlock_thread_safe instead.
> 
> I originally thought about such a solution, but I decided instead on the
> using of the default IReadBlock paramter  'void ** phMutex = NULL'. If the
> pointer pointer of the mutex is NULL, then no mutex is created and locking
> does not occur. Honestly as well when I was originally doing this, I was
> not sure which drivers would have even needed such a IReadBlock_thread_safe
> implementation. Therefore, I decide the best way in Solution #2 was to
> simply just walk through each of the implementations of and find where
> exactly I need to protect the data pointer.
> 
> I have a mixed feelings on both methods. I feel like the
> IReadBlock_thread_safe implementation might make people writing new drivers
> feel like they do not have to work about thread safety when they are
> comparing the code to other drivers?

The naming is perhaps not well choosen, but the documentation of the contract 
of this API should make it clear on what an implementation should do and what 
it should not do.

You didn't answer my question on how many drivers can have a non-blocking 
IReadBlock() implementation in practice. My intutition (reinforced by skimming 
through https://github.com/OSGeo/gdal/pull/39/files) is just/mostly MEMDataset. 
So people looking at drivers would just see the classical IReadBlock() 
implementation and wouldn't have to think a lot about thread safety (that 
would be brought by the generic GDALRasterBand::IReadBlock_thread_safe()).

> I somewhat like that idea that drivers
> should consider their thread safety.

I somewhat like the idea they shouldn't have to bother about that ;-) 
especially if we can have improvements in core only that improve thread-safe 
usage and performance. The people wanting to gain one more extra % of 
performance might take the risk of doing a custom thread-safe implementation 
in the driver.

> Lots of drivers seem to call RasterIO
> as well, which would that need a RasterIO_thread_safe method?

Yes, likely.

> That might
> make that code more complex? Not quite sure, will have to think it all out.

Well, you would implement either the classical or _thread_safe version, not 
both. But there are perhaps practical reasons for which this suggestion 
wouldn't work.

> 
> Thanks,
> 
> Blake

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com