[gdal-dev] RFC 47 and Threading

Thu Aug 28 10:10:25 PDT 2014

Even,

> Yeah, that why was I suggested the reverse ;-) But an automatic renaming
> should be doable.
> Another danger of keeping IReadBlock and making it now responsible of
> ensuring
> its thread safety is for out-of-tree drivers that wouldn't realize they
> need
> to change something as code would still compile (but it is their business
> to
> follow GDAL RFCs isn't it ?...)
> If we were nice, we would have a dummy IReadBlock() in the interface with a
> change of signature to trigger compilation errors of non-upgraded drivers,
> and
> IReadBlock_thread_safe, IReadBlock_not_thread_safe.
>
> I have spent quite a bit of time thinking about how I would like to
implement it all to be very clear to the driver developers what is going on
here. I really do not like the term "thread safe" in many ways to describe
the IReadBlock, because what it really is in this instance is "thread safe
for the cache". Many people will think it is about making it "thread safe
for the driver" which it is not dealing with. I spent a lot of effort to
try to make every driver thread safe for the cache in solution #2, and in
some ways adding the "not_thread_safe" option seems like undoing a lot of
the work I put forth. (In hindsight I should have done that it would have
been much easier). I do think there should be compile warnings if someone
has not updated their IReadBlock to include the phMutex parameter.

> >
> > > You didn't answer my question on how many drivers can have a
> non-blocking
> > > IReadBlock() implementation in practice. My intutition (reinforced by
> > > skimming
> > > through https://github.com/OSGeo/gdal/pull/39/files) is just/mostly
> > > MEMDataset.
> > > So people looking at drivers would just see the classical IReadBlock()
> > > implementation and wouldn't have to think a lot about thread safety
> (that
> > > would be brought by the generic
> > > GDALRasterBand::IReadBlock_thread_safe()).
> >
> > I honestly have no idea how many drivers can have a non-blocking
> IReadBlock
> > with this setup. I really do not understand all the drivers well enough
> to
> > give an estimate (so I would assume yours is probably most correct).
>
> See one of my previous answers about the fundamental reason why most
> IReadBlock() implementations need a lock around their slowest part : being
> it
> file access or database access. Honestly, it is not worth just locking the
> part
> that takes most of the time instead of locking the whole method. It could
> be
> worth when decompression is involved for example, but in all cases I can
> think
> of (jpeg2000, jpeg, png, libtiff, ecw, sid, etc...) the I/O part and
> decompression part are done in a single call from GDAL point of view.
>

Re-reading a lot of our discussions, I feel like that making a
"non-blocking IReadBlock" that works for the driver should really be more
handled by the other discussion we were having about a having a read-only
safe dataset (other then Mem). In which case I am really starting to like
the idea of a default thread-safe wrapper, and then driver specific safe
implementations.

>
> > I have
> > been mostly concerned with first just making the cache threadsafe, then
> was
> > thinking we could start targeting drivers individually.
>
> The truth is that the majority of drivers are esoteric, whereas perhaps
> just
> 10 are used 90% of the time. So it would be a waste of energy to have to
> spend
> too much time on the former ones.
>
> >
> > > > I somewhat like that idea that drivers
> > > > should consider their thread safety.
> > >
> > > I somewhat like the idea they shouldn't have to bother about that ;-)
> > > especially if we can have improvements in core only that improve
> > > thread-safe
> > > usage and performance. The people wanting to gain one more extra % of
> > > performance might take the risk of doing a custom thread-safe
> > > implementation
> > > in the driver.
> >
> > My thought here was more along the lines of "the driver is going to write
> > to the block cache so it should have the responsibility to lock when it
> > needs to be locked". To me this was better in some manners then "in most
> > case implement this method, which if I use improperly might cause
> deadlocks
> > or corruptions, otherwise implement this method".
>
> To be pedant, IReadBlock() can be called with any user buffer, not only
> the one
> of a GDALRasterBlock. Driver writers can currently write a driver without
> knowing much about the block cache mechanism. They just need to know
> "there's
> something in core that will magically ensure that any RasterIO() is
> transformed in the right sequence of call to my simple IReadBlock()"
>
> Not that I worry much about driver writers, but it seems more elegant to
> me if
> complexity is in the core rather than popping up in each and every driver.

I agree that it would be more elegant if it wasn't as big of a problem. It
seems that lots of drivers have pixel interleaving and or call IRasterIO or
other edge cases so they need to implement the special method anyhow to
protect the cache. It might be more elegant, but are we making it more
elegant for just the other drivers that aren't in the 90%? How many of the
90% of the time used drivers use pixel intervealed methods?

As well I can certainly go back through and change many of the drivers to
use a common way of just locking to protect the data passed in IReadBlock
etc, but I would still have to rename their current IReadBlock methods to
IReadBlockNotCacheThreadSafe or something. This still makes the amount of
changes I introduce with this just as large. I am willing to do this, but
not convinced still that it really helps a lot

Thanks,

Blake
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20140828/dce5014a/attachment.html>