[gdal-dev] RFC 47 and Threading

Wed Aug 20 14:59:44 PDT 2014

Even,

Thank you very much for your response to this. Your help in understanding
all of this has been very valuable.

> I'm not sure the ratio gain/effort to make dataset methods "a bit more"
> thread safe is high enough. Very few drivers can be made fully
> thread-safe, and fully parallelized (i.e. without a big mutex). Basically
> all file-based drivers or database-based drivers cannot. The former ones
> because the file handle cannot be used simultaneously by different
> threads. The latter ones for the same reason with the database connection
> handles. So basically that would just address the case of the MEM driver
> (which is an important use case for Blake). And in that instance, it
> would already be possible to use several MEM dataset handles that would
> point to the same memory buffer (and with a tiny per-dataset cache size
> since it is not really needed for a fast datasource such as MEM)
>

I sadly agree that most of the drivers will still not be thread-safe.
However, I don't see any way of getting around this problem without some
big changes to the way the core library uses the data provided by the
driver (particularly the cache). This is partly why the changes are so
vast. I do want to point out that once a dataset has been loaded into the
cache via the driver, it can be used quickly and effectively from multiple
threads. This would be ideal for the many situations where the cache is
large enough, especially with the changes for a "per dataset block cache".
Take, for example, the case where you are cutting many small datasets out
of a single large shared dataset. If the block cache is not thrashing, the
large shared dataset could easily be used from multiple threads, with
large performance gains once all of its data has been loaded into the
cache. Perhaps it would be better to say that this change does its best to
make the cache very quick and safe under threading.
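
As a rough illustration of that scenario, here is a toy sketch in Python.
It is not GDAL code and not the RFC 47 implementation: ToyDataset, BLOCK,
cut_many and the two lock names are all hypothetical, and the "driver" is
just a function that fabricates pixel values. The point is only that the
slow, non-thread-safe driver read sits behind its own lock, while a warm
per-dataset cache can serve many threads with a very short critical
section.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # hypothetical block edge length, in pixels


class ToyDataset:
    """Hypothetical stand-in for a dataset that owns its block cache."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self._driver_lock = threading.Lock()  # serializes "file" access
        self._cache_lock = threading.Lock()   # protects only the dict below
        self._cache = {}                      # (bx, by) -> list of rows
        self.driver_reads = 0                 # how often the driver was hit

    def _read_block_from_driver(self, bx, by):
        # Pretend this touches a file handle that is not thread-safe.
        with self._driver_lock:
            self.driver_reads += 1
            return [[(by * BLOCK + y) * self.width + (bx * BLOCK + x)
                     for x in range(BLOCK)] for y in range(BLOCK)]

    def get_block(self, bx, by):
        with self._cache_lock:
            block = self._cache.get((bx, by))
        if block is None:
            block = self._read_block_from_driver(bx, by)
            with self._cache_lock:
                # Keep the first copy if another thread got here before us.
                block = self._cache.setdefault((bx, by), block)
        return block

    def read_window(self, x0, y0, w, h):
        """Cut a small window out of the dataset, pixel by pixel."""
        out = []
        for y in range(y0, y0 + h):
            row = []
            for x in range(x0, x0 + w):
                block = self.get_block(x // BLOCK, y // BLOCK)
                row.append(block[y % BLOCK][x % BLOCK])
            out.append(row)
        return out


def cut_many(ds, windows, workers=4):
    """Cut many small windows from one shared dataset in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda w: ds.read_window(*w), windows))


ds = ToyDataset(16, 16)
ds.read_window(0, 0, 16, 16)  # warm the cache once, single-threaded
windows = [(x, y, 3, 3) for x in range(0, 12, 3) for y in range(0, 12, 3)]
tiles = cut_many(ds, windows)  # served from the cache, no driver reads
```

Once the cache is warm, ds.driver_reads stops growing no matter how many
windows are cut, which is the behaviour the paragraph above is relying on.
Python's global interpreter lock means this sketch will not show the speed
gain itself; it only shows where the locks sit.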

> On the other side, I would be very pleased with having "just" the
> preliminary step of Blake's work, i.e. the possibility to choose a
> per-band block cache strategy instead of the global block cache. That
> should already address most scaling issues.
>

First off, I want to point out that it attempts to be a per-dataset block
cache, not a per-band block cache, and only after a global configuration
value has been set. It's not a huge difference, but it is worth noting.

I know that you are concerned about there being three different mutexes.
However, while testing several different datasets, I was trying to measure
the potential speed increase provided by multi-threading. Simply put,
larger-scope mutexes cause much more locking and do not provide much
improvement in performance. The three mutexes were placed specifically to
protect different sets of data, so that locking takes place only when
those areas are at risk of corruption.
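
To show what "protecting different sets of data" means in practice, here
is a small Python sketch of narrow lock scopes. The three locks and the
data they guard (an LRU list, a block table, a hit counter) are
illustrative assumptions only; they are not the three mutexes in the RFC 47
patch. Each method takes only the lock for the structure it modifies, so a
thread updating one structure never waits on a thread updating another.

```python
import threading


class FineGrainedState:
    """Three independent pieces of state, each guarded by its own lock.

    The names are illustrative; they are not the RFC 47 mutexes.
    """

    def __init__(self):
        self.lru_lock = threading.Lock()
        self.lru_order = []      # block ids in the order they were touched
        self.blocks_lock = threading.Lock()
        self.blocks = {}         # block id -> payload
        self.stats_lock = threading.Lock()
        self.hits = 0

    def touch(self, block_id):
        with self.lru_lock:      # only the LRU bookkeeping is locked
            self.lru_order.append(block_id)

    def store(self, block_id, payload):
        with self.blocks_lock:   # only the block table is locked
            self.blocks[block_id] = payload

    def count_hit(self):
        with self.stats_lock:    # only the counter is locked
            self.hits += 1


def hammer(state, n_threads=4, n_iter=1000):
    """Update all three structures from several threads at once."""

    def work(tid):
        for i in range(n_iter):
            state.touch((tid, i))
            state.store((tid, i), i)
            state.count_hit()

    threads = [threading.Thread(target=work, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


state = hammer(FineGrainedState())
```

A single large mutex around all three updates would be just as correct,
but every thread would then queue behind every other thread even when they
are touching unrelated data.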