[gdal-dev] RFC 47 and Threading

Fri Aug 22 13:46:45 PDT 2014

Le vendredi 22 août 2014 22:17:37, Blake Thompson a écrit :
> Even,
> 
> The naming is perhaps not well choosen, but the documentation of the
> 
> > contract
> > of this API should make it clear on what an implementation should do and
> > what
> > it should not do.
> 
> Perhaps it should be reversed? IReadBlock and IReadBlock_not_thread_safe?
> (would obviously require lots of changes that way).

Yeah, that why was I suggested the reverse ;-) But an automatic renaming 
should be doable.
Another danger of keeping IReadBlock and making it now responsible of ensuring 
its thread safety is for out-of-tree drivers that wouldn't realize they need 
to change something as code would still compile (but it is their business to 
follow GDAL RFCs isn't it ?...)
If we were nice, we would have a dummy IReadBlock() in the interface with a 
change of signature to trigger compilation errors of non-upgraded drivers, and 
IReadBlock_thread_safe, IReadBlock_not_thread_safe.

> 
> > You didn't answer my question on how many drivers can have a non-blocking
> > IReadBlock() implementation in practice. My intutition (reinforced by
> > skimming
> > through https://github.com/OSGeo/gdal/pull/39/files) is just/mostly
> > MEMDataset.
> > So people looking at drivers would just see the classical IReadBlock()
> > implementation and wouldn't have to think a lot about thread safety (that
> > would be brought by the generic
> > GDALRasterBand::IReadBlock_thread_safe()).
> 
> I honestly have no idea how many drivers can have a non-blocking IReadBlock
> with this setup. I really do not understand all the drivers well enough to
> give an estimate (so I would assume yours is probably most correct). 

See one of my previous answers about the fundamental reason why most 
IReadBlock() implementations need a lock around their slowest part : being it 
file access or database access. Honestly, it is not worth just locking the part 
that takes most of the time instead of locking the whole method. It could be 
worth when decompression is involved for example, but in all cases I can think 
of (jpeg2000, jpeg, png, libtiff, ecw, sid, etc...) the I/O part and 
decompression part are done in a single call from GDAL point of view.

> I have
> been mostly concerned with first just making the cache threadsafe, then was
> thinking we could start targeting drivers individually.

The truth is that the majority of drivers are esoteric, whereas perhaps just 
10 are used 90% of the time. So it would be a waste of energy to have to spend 
too much time on the former ones.

> 
> > > I somewhat like that idea that drivers
> > > should consider their thread safety.
> > 
> > I somewhat like the idea they shouldn't have to bother about that ;-)
> > especially if we can have improvements in core only that improve
> > thread-safe
> > usage and performance. The people wanting to gain one more extra % of
> > performance might take the risk of doing a custom thread-safe
> > implementation
> > in the driver.
> 
> My thought here was more along the lines of "the driver is going to write
> to the block cache so it should have the responsibility to lock when it
> needs to be locked". To me this was better in some manners then "in most
> case implement this method, which if I use improperly might cause deadlocks
> or corruptions, otherwise implement this method".

To be pedant, IReadBlock() can be called with any user buffer, not only the one 
of a GDALRasterBlock. Driver writers can currently write a driver without 
knowing much about the block cache mechanism. They just need to know "there's 
something in core that will magically ensure that any RasterIO() is 
transformed in the right sequence of call to my simple IReadBlock()"

Not that I worry much about driver writers, but it seems more elegant to me if 
complexity is in the core rather than popping up in each and every driver.

> 
> Thanks,
> 
> Blake

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com