[gdal-dev] RFC 47 and Threading
Blake Thompson
flippmoke at gmail.com
Thu Aug 21 12:59:22 PDT 2014
On Thu, Aug 21, 2014 at 11:36 AM, Jeff Lacoste <jefflacostegdal at gmail.com>
wrote:
> Hi,
>
> Improving the thread safety of GDAL is a big improvement. I know this
> proposal is not claiming to 'fix' GDAL thread safety but to address at
> least the cache safety. That said, perhaps to help clarify the proposal,
> we could state what the change would address and (maybe more important)
> what it would not address, just to be more specific about the gain of
> caching per dataset instead of a global cache.
>
Updated the RFC to hopefully better reflect some of this; please send more
questions as you have them.
> Would this mean, for example, that batch translating datasets would gain
> from this and could be done in a parallel way, since we can avoid
> thrashing a global cache?
> So instead of translating x datasets in a sequential manner now, (with
> the proposed changes) this could be done in parallel?
>
Currently it is possible to do batch translating of datasets, but it is not
efficient. The reason is that there is a point where each thread is
attempting to add or remove GDALRasterBlocks from the global cache. Once
the global cache size is reached, all the threads are often blocked for
extended periods as each one attempts to clear the cache. The worst part is
that this also prevents simple reading from the cache during this period,
because other threads cannot efficiently use
GDALRasterBand::TryGetLockedBlockRef() to pull existing blocks from the
cache.
The first fix I made while doing this was simply to make it possible to
operate with a per-dataset cache (per-band was not selected due to possible
deadlock issues). Doing this allows each dataset to have its own lock for
its cache, and performance increases dramatically when operating on
different datasets in parallel.
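As a rough illustration of the workload being discussed (this is not code
from the RFC; the file names and the GTiff output choice are just
placeholders), "translate N datasets, one per thread" looks roughly like
the following with the plain GDAL C++ API. Today all five workers contend
on the single global block cache; with a per-dataset cache each worker
flushes only its own dataset's blocks:

#include "gdal_priv.h"
#include <string>
#include <thread>
#include <vector>

// One worker per dataset: open the source, copy it with CreateCopy(),
// close both. Each thread touches only its own dataset, so the only
// shared state between the workers is the block cache.
static void TranslateOne(const std::string &osSrc, const std::string &osDst)
{
    GDALDataset *poSrc =
        static_cast<GDALDataset *>(GDALOpen(osSrc.c_str(), GA_ReadOnly));
    if (poSrc == nullptr)
        return;

    GDALDriver *poDriver = GetGDALDriverManager()->GetDriverByName("GTiff");
    GDALDataset *poDst = poDriver->CreateCopy(osDst.c_str(), poSrc, FALSE,
                                              nullptr, nullptr, nullptr);
    if (poDst != nullptr)
        GDALClose(poDst);
    GDALClose(poSrc);
}

int main()
{
    GDALAllRegister();

    std::vector<std::thread> aoThreads;
    for (int i = 0; i < 5; ++i)
    {
        aoThreads.emplace_back(TranslateOne,
                               "input_" + std::to_string(i) + ".tif",
                               "output_" + std::to_string(i) + ".tif");
    }
    for (std::thread &t : aoThreads)
        t.join();
    return 0;
}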
>
> If yes, what would the cache flushing strategy be once the cache max is
> reached? For example, 5 running threads converting 5 datasets: if we
> reach the max cache while the 5 threads are executing, does this mean
> that threads will be blocked from executing, as no cache is available
> until it has been released by other threads?
>
> Jeff Lacoste
>
In the case I described above, each dataset has its own maximum cache size,
so reaching the cache limit can only happen within a single dataset. When
that limit is reached, only that dataset's cache is flushed, and other
datasets and threads keep operating on their own caches. However, this
means that the person writing the code MUST be aware of the total cache
sizes of all their datasets (I know this isn't ideal for many people).
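Purely as an illustration of that bookkeeping (no real GDAL call is made
here, since the per-dataset cache setter is not part of the current API),
the application ends up doing the division of its memory budget itself:

#include <cstdio>

// Illustration only: with per-dataset caches, the application owns the
// arithmetic that keeps the sum of all per-dataset caches within its
// overall memory budget.
int main()
{
    const long long nTotalBudget = 256LL * 1024 * 1024;  // 256 MB overall
    const int       nDatasets   = 5;
    const long long nPerDataset = nTotalBudget / nDatasets;

    // Each of the 5 datasets would then be capped at nPerDataset bytes
    // via whatever per-dataset cache API the RFC ends up providing.
    std::printf("per-dataset cache limit: %lld bytes\n", nPerDataset);
    return 0;
}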
So I decided to take it a step further and wanted to make it possible for a
dataset using a driver such as the memory driver to do something such as: 5
running threads translating 1 dataset, 5 times. I also wanted to be able to
use one global cache and make it so that the 5 threads translating 5
datasets would not lock as often with a global cache. To achieve this, a
separation of concerns had to occur, and this led to the development of
multiple mutexes.
Three different data structures are at risk during threading within the
cache:
#1 - The Linked List of the cache and size of this linked list (this allows
the cache to flush the least recently used GDALRasterBlock)
#2 - The cache block array of a GDALRasterBand (this allows the
GDALRasterBand to find its GDALRasterBlocks)
#3 - The data stored within the GDALRasterBlock (this is the actual data
stored in the block)
If we limit the scope of our threading support to only allow 1 thread to
access any 1 dataset at a time, #3 requires no protection. However, in any
cache that is shared by more than one dataset (a global cache), not only
must the global cache's linked list be protected, but you must also protect
the cache block array in the raster bands. Since flushing currently just
removes blocks until the cache is below its limit, no raster band can read
from its cache block array until all flushing has completed.
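For concreteness, here is a very stripped-down sketch of those three
structures and the mutex each one gets in this design; it is conceptual
only and does not mirror the real GDALRasterBlock/GDALRasterBand layout:

#include <cstddef>
#include <list>
#include <mutex>
#include <vector>

// Conceptual simplification only -- not the actual GDAL classes.
struct Block
{
    std::mutex  oDataMutex;          // #3: the pixel data held by the block
    void       *pData = nullptr;
};

struct Band
{
    std::mutex          oBlockArrayMutex;  // #2: the band's cache block array
    std::vector<Block*> apoBlocks;         // lets the band find its blocks
};

struct BlockCache
{
    std::mutex        oLRUMutex;     // #1: the LRU linked list and its size
    std::list<Block*> oLRUList;      // least recently used blocks, for flushing
    std::size_t       nCacheUsed = 0;
};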
Therefore, during my design I decided that each portion should be protected
by its own mutex. So that the linked list (LRU) mutex does not have to be
held for extended periods, I mark a block for deletion when it is to be
removed from the cache, and it is actually removed at the earliest safe
point. Unless the block is about to be used by another thread, that is
right away; otherwise it happens once the other thread is done using that
block.
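A minimal sketch of that mark-for-deletion idea (again conceptual, not the
actual patch): the flush path only flags the block while holding the LRU
mutex, and the pixel buffer is released exactly once, either immediately or
by the last thread still using the block:

#include <atomic>

// Conceptual sketch of deferred eviction. The cache lookup is assumed to
// skip blocks that are already flagged for eviction.
struct EvictableBlock
{
    std::atomic<int>  nUsers{0};                 // threads currently using the block
    std::atomic<bool> bMarkedForEviction{false};
    std::atomic<bool> bFreed{false};

    void Acquire() { ++nUsers; }

    // Called by the cache flush (under the LRU mutex): flag the block and
    // free it right away only if nobody is using it.
    void MarkForEviction()
    {
        bMarkedForEviction = true;
        if (nUsers.load() == 0)
            TryFree();
    }

    // Called by a reader when it is done with the block.
    void Release()
    {
        if (--nUsers == 0 && bMarkedForEviction.load())
            TryFree();
    }

private:
    void TryFree()
    {
        // exchange() makes the release happen exactly once even if the
        // flush path and the last reader race to get here.
        if (!bFreed.exchange(true))
        {
            // release the block's pixel buffer here (omitted)
        }
    }
};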
This also means the mutex protecting the global cache in my code is locked
less often. So even without using a per-dataset cache, the example of using
5 threads to translate 5 datasets should still be faster than the current
configuration.
TL;DR: there are benefits in my design beyond just a non-global cache for
datasets.
Blake Thompson