<div dir="ltr">Even, <br><br>Thank you very much for response to this, your helping me understand all of this stuff has been very valuable. <br><br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
> I'm not sure the ratio gain/effort to make dataset methods "a bit more" thread
> safe is high enough. Very few drivers can be made fully thread-safe, and fully
> parallelized (i.e. without a big mutex). Basically all file-based drivers or
> database-based drivers cannot. The former ones because the file handle cannot
> be used simultaneously by different threads. The latter ones for the same
> reason with the database connection handles. So basically that would just
> address the case of the MEM driver (which is an important use case for Blake).
> And in that instance, it would already be possible to use several MEM dataset
> handles that would point to the same memory buffer (and with a tiny per-dataset
> cache size since it is not really needed for a fast datasource such as MEM)

I sadly agree that most drivers will still not be thread-safe; however, I don't see any way around that problem without big changes to how the core library uses the data a driver provides (particularly the cache). That is partly why the changes are so extensive. I do want to stress, though, that once a dataset has been loaded into the cache via the driver, it can be used effectively and quickly from multiple threads. That is ideal in many situations where the cache is large enough, especially with the "per-dataset block cache" changes. Take the case of cutting many small datasets out of a single large shared dataset: as long as the block cache is not thrashing, that shared dataset can easily be used from multiple threads with a large performance gain once all of its data is in the cache, as in the sketch below. Perhaps I would be better off saying that this change does its best to make the cache very fast and safe under threading.
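
Just to make that scenario concrete, here is a minimal sketch of the kind of workload I have in mind (the filename and the 256x256 tiling are placeholders). Note that it assumes the patched, thread-safe cache; with a stock build each thread would need its own dataset handle, or an external mutex around RasterIO.

#include "gdal_priv.h"

#include <thread>
#include <vector>

// Each worker reads one tile window from the shared dataset into a
// private buffer.
static void ReadWindow(GDALDataset* poShared, int nXOff, int nYOff,
                       int nXSize, int nYSize)
{
    std::vector<GByte> abyTile(static_cast<size_t>(nXSize) * nYSize);
    // With the thread-safe cache, concurrent RasterIO calls on the same
    // handle contend only briefly on the cache mutexes; blocks already
    // resident in the per-dataset cache are served without going back
    // to the driver.
    CPLErr eErr = poShared->GetRasterBand(1)->RasterIO(
        GF_Read, nXOff, nYOff, nXSize, nYSize,
        abyTile.data(), nXSize, nYSize, GDT_Byte, 0, 0);
    (void)eErr;
}

int main()
{
    GDALAllRegister();
    GDALDataset* poShared = static_cast<GDALDataset*>(
        GDALOpen("large.tif", GA_ReadOnly));

    // Cut many small tiles out of the single large shared dataset,
    // one window per worker thread.
    std::vector<std::thread> aoWorkers;
    for (int i = 0; i < 4; ++i)
        aoWorkers.emplace_back(ReadWindow, poShared, i * 256, 0, 256, 256);
    for (std::thread& oWorker : aoWorkers)
        oWorker.join();

    GDALClose(poShared);
    return 0;
}
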
> On the other side, I would be very pleased with having "just" the preliminary
> step of Blake's work, i.e. the possibility to choose a per-band block cache
> strategy instead of the global block cache. That should already address most
> scaling issues.

First off, I want to note that the patch attempts a per-dataset block cache, not a per-band block cache, and only after a global configuration value has been set. It's not a huge difference, but it is worth noting.
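
To illustrate how an application would opt in (the option name below is only a placeholder of mine, not the final spelling in the patch):

#include "gdal_priv.h"
#include "cpl_conv.h"

int main()
{
    // Hypothetical option name -- the real key is whatever the patch
    // finally settles on. The point is only that the per-dataset cache
    // is opt-in through a global configuration value, set before any
    // dataset is opened.
    CPLSetConfigOption("GDAL_DATASET_CACHING", "YES");

    GDALAllRegister();
    // Datasets opened from here on would each get their own block
    // cache instead of sharing the single global one.
    GDALDataset* poDS = static_cast<GDALDataset*>(
        GDALOpen("example.tif", GA_ReadOnly));
    GDALClose(poDS);
    return 0;
}
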
I know you are concerned about there being three different mutexes. However, while testing against several different datasets, I tried to put a number on the potential speed increase from multi-threading, and, simply put, larger-scope mutexes cause much more locking while providing little improvement in performance. The three mutexes were placed specifically so that each protects a different set of data and locking occurs only where that data is at risk of corruption.
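
Roughly, the split looks like the following sketch. This is illustrative only, with simplified names, and is not the actual patch code; it just shows why three narrow mutexes contend less than one broad one.

#include <map>
#include <mutex>

// Illustrative only. Each mutex guards one independent piece of state,
// so threads serialize only when they touch the same piece.
struct BlockCacheSketch
{
    std::mutex oLRUMutex;       // guards the global LRU eviction list
    std::mutex oMemMutex;       // guards the cache-memory accounting
    std::mutex oBlockMapMutex;  // guards this dataset's block lookup map

    std::map<int, void*> oBlockMap;  // block id -> cached block data
    long long nCacheUsed = 0;

    void* GetBlock(int nBlockId)
    {
        // Only the block-map lookup is serialized here; a thread that
        // is updating the memory accounting or the LRU list does not
        // block readers of the map, and vice versa.
        std::lock_guard<std::mutex> oLock(oBlockMapMutex);
        auto it = oBlockMap.find(nBlockId);
        return it == oBlockMap.end() ? nullptr : it->second;
    }

    void AccountMemory(long long nBytes)
    {
        std::lock_guard<std::mutex> oLock(oMemMutex);
        nCacheUsed += nBytes;
    }
};
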