<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
<style type="text/css" id="owaParaStyle"></style>
</head>
<body fpstyle="1" ocsi="0">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Blake, Even,
<div>I played a bit with pull request #39. I enabled the per-dataset cache by setting <span style="font-family: Verdana, Arial, 'Bitstream Vera Sans', Helvetica, sans-serif; font-size: 10pt; background-color: rgb(255, 255, 255);">GDAL_DATASET_CACHING to YES. Awesome
work!</span></div>
<div><span style="font-family: Verdana, Arial, 'Bitstream Vera Sans', Helvetica, sans-serif; font-size: 10pt; background-color: rgb(255, 255, 255);">Our application code opens each file in GDAL and holds one GDALDataset per thread. Different threads can request the same
blocks, but because the cache is now per dataset, I expected to see very little contention, at the expense of some blocks being read multiple times. </span></div>
<div><br>
</div>
<div>To my surprise, nothing changed in terms of time spent sleeping. After some investigation I realized that the contention point is the static mutex <span style="font-size: 10pt;">hCOAMutex. I then noticed the MUTEX_NONE ifdef there, so I decided
to define it to effectively get rid of mutexes in GDAL. </span></div>
<div><span style="font-size: 10pt;">Guess what happened after recompilation. Because we only ever access each GDALDataset from a single thread, and the new GDALBlockManager is now per GDALDataset, everything so far* (with the exception of PAMDataset destruction, because
it does not like null mutexes) works fine and scales much better with the number of threads.</span></div>
<div><span style="font-size: 10pt;"><br>
</span></div>
<div><span style="font-size: 10pt;">*Tested for reading only, using the GTiff driver, with 4 threads reading tiles smaller than blocks; sometimes two threads read data from the same block. </span></div>
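<div><br>
</div>
<div>As a toy illustration of the access pattern in that test (this is not GDAL code), here is a minimal Python sketch. It assumes a hypothetical ToyDataset class standing in for a GDALDataset handle with its own block cache; the names ToyDataset, worker and run are made up for the example. Each thread owns one handle, so no lock is taken, and the price is that the same block can be read once per thread.</div>
<pre>
```python
import threading


class ToyDataset:
    """Hypothetical stand-in for one GDALDataset handle with a private block cache."""

    def __init__(self):
        self._cache = {}   # block index to block data, owned by this handle only
        self.reads = 0     # number of simulated disk reads done by this handle

    def read_block(self, block_index):
        # No lock: only the thread that owns this handle touches its cache.
        if block_index not in self._cache:
            self.reads += 1
            self._cache[block_index] = bytes([block_index % 256]) * 16
        return self._cache[block_index]


def worker(slot, reads_per_thread, block_indices):
    dataset = ToyDataset()   # one handle, and so one cache, per thread
    for block_index in block_indices:
        dataset.read_block(block_index)
    reads_per_thread[slot] = dataset.reads   # each thread writes its own slot


def run(num_threads=4, block_indices=(0, 1, 1, 2, 0)):
    reads_per_thread = [0] * num_threads
    threads = [
        threading.Thread(target=worker, args=(i, reads_per_thread, block_indices))
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return reads_per_thread


if __name__ == "__main__":
    print(run())
```
</pre>
<div>With the default arguments, each of the 4 threads reads its 3 distinct blocks once, so 12 simulated reads serve only 3 distinct blocks. That is the trade-off described above: no shared lock, but duplicated reads.</div>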
<div><br>
</div>
<div>Do you see the same problem with hCOAMutex? </div>
<div>It would be good if I could say: give me a per-dataset cache and open this dataset with no locking. Unless the global cache can perform its operations with less sleeping. </div>
<div><br>
</div>
<div>Regards.</div>
<div>Jacek Tomaka</div>
<div>
<div>
<div style="font-family: Times New Roman; color: #000000; font-size: 16px">
<hr tabindex="-1">
<div id="divRpF567235" style="direction: ltr;"><font face="Tahoma" size="2" color="#000000"><b>From:</b> gdal-dev-bounces@lists.osgeo.org [gdal-dev-bounces@lists.osgeo.org] on behalf of Blake Thompson [flippmoke@gmail.com]<br>
<b>Sent:</b> 29 August 2014 02:35<br>
<b>To:</b> Even Rouault<br>
<b>Cc:</b> gdal-dev@lists.osgeo.org<br>
<b>Subject:</b> Re: [gdal-dev] RFC 47 and Threading<br>
</font><br>
</div>
<div></div>
<div>
<div dir="ltr"><span style="font-family:arial,sans-serif; font-size:13px">Even and Andre,</span>
<div style="font-family:arial,sans-serif; font-size:13px"><br>
<div class="gmail_extra">
<div class="gmail_quote">
<div class="im">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-color:rgb(204,204,204); border-left-style:solid; padding-left:1ex">
<div>> I want to start off by saying a big thanks to Blake for taking his time<br>
> to tackle what can only be a very difficult problem.<br>
> From what I can observe, the current discussion seems to be around the<br>
> boundary of who should be responsible for ensuring thread safety around<br>
> the block cache. The core of GDAL versus the individual drivers.<br>
<br>
</div>
The core will necessarily have to know about thread-safety because the block<br>
cache is there. The discussion is more about whether the drivers must also necessarily<br>
be thread aware, or whether core mechanisms are sufficient to hide this detail from the<br>
drivers, and potentially about offering drivers a mechanism to deal with<br>
thread-safety themselves if they can have a more optimized implementation than the default<br>
one.</blockquote>
<div><br>
</div>
</div>
<div>Agreed, most specifically how to hide the detail of the protection of the pointer to the block cache's data that is passed through IReadBlock, IWriteBlock, and IRasterIO. </div>
<div class="im">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-color:rgb(204,204,204); border-left-style:solid; padding-left:1ex">
<div><br>
> While I<br>
> see why such a conversation is important, as far as I am concerned, the<br>
> most important part should be how it affects users of GDAL at the<br>
> interface level.<br>
<br>
</div>
Agreed.<br>
<div><br>
> That is, if an application that is threaded is trying<br>
> to use GDAL, how does it ensure thread-safety? What you have to keep in<br>
> mind is that having some parts of the library not thread-safe basically<br>
> just pushes the mutexing/locking to the calling applications.<br>
<br>
</div>
Not necessarily. That's what I suggested in my previous email: if the cost of<br>
the mutexes is not too high for non-threaded usage, then the API could<br>
systematically return thread-safe versions, potentially wrapped by<br>
GDALDatasetThreadSafe
</blockquote>
<div><br>
</div>
</div>
<div>So in the scope of my RFC, I am not certain how we could have both a thread-safe cache and a non-thread-safe cache in any simple manner. I know that you are specifically talking about the possibility of thread-safe datasets, which I feel is necessarily part of
this discussion, but I wanted to separate the two. If the thread-safe cache is too expensive, however, I feel that is a major issue, and I am doing my best to avoid any performance hit from this change. </div>
<div class="im">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left-width:1px; border-left-color:rgb(204,204,204); border-left-style:solid; padding-left:1ex">
<div><br>
><br>
> Also, while it is important to document thread-safety limitations, might<br>
> I suggest adding thread-safe related capabilities (TestCapability),<br>
> especially if all drivers do not end-up having the same thread-safety<br>
> constraints.<br>
<br>
</div>
That might be a solution, although not ideal from a usability point of view (if<br>
we come up with something more complex than non-thread-safe vs thread-safe, that<br>
might be difficult for users of the GDAL API to understand), and from a<br>
GDAL-developer point of view as well (need to assess thread-safety for each<br>
driver).<br>
There can be subtleties: imagine that the VRT driver is made to be thread<br>
safe, but uses sources from drivers that might not be thread safe...<br>
<div><br>
><br>
> I personally do not see a GDALDatasetThreadSafe wrapper as adding much<br>
> complexity. For instance, if you were to add a capability that indicates<br>
> if a driver is inherently thread-safe, you could add a new open method<br>
> to open a dataset in a thread-safe way with something like the following<br>
> pseudocode:<br>
><br>
> DataSet OpenThreadSafe(GDALOpenInfo openInfo)<br>
> {<br>
> DataSet dataSet = Open(openInfo);<br>
><br>
> if (!dataSet.TestCapability(THREAD_SAFE))<br>
> {<br>
> dataSet = wrapWithThreadSafeWrapper(dataSet);<br>
> }<br>
><br>
> return dataSet;<br>
> }<br>
<br>
</div>
Yes, that's similar to what I imagined with my idea of GDAL_OF_THREADSAFE open<br>
flag.</blockquote>
<div><br>
</div>
</div>
<div>On the topic of the thread-safe wrapper, I have spent more time thinking about it, and I think it is probably the best solution to the problem of making all datasets safe for concurrent reads. I am willing to champion another RFC to implement this. However, the scope
of that work is even larger, because it would need to work with OGR datasets as well.</div>
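<div><br>
</div>
<div>To make the wrapper idea from the quoted pseudocode a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions and not an existing GDAL API: PlainDataset, ThreadSafeDataset, open_thread_safe and the "THREAD_SAFE" capability string are hypothetical names, and the wrapper simply serializes every call with one lock per wrapped dataset.</div>
<pre>
```python
import threading


class PlainDataset:
    """Hypothetical dataset that is not safe to share between threads."""

    def __init__(self, thread_safe=False):
        self._thread_safe = thread_safe
        self.reads = 0

    def test_capability(self, name):
        # Reports thread safety only if the (hypothetical) driver claims it.
        return name == "THREAD_SAFE" and self._thread_safe

    def read_block(self, index):
        self.reads += 1      # unsynchronized state, the reason a wrapper is needed
        return index * 2     # dummy block content


class ThreadSafeDataset:
    """Hypothetical wrapper that serializes every call with one lock."""

    def __init__(self, inner):
        self._inner = inner
        self._lock = threading.Lock()

    def test_capability(self, name):
        return name == "THREAD_SAFE" or self._inner.test_capability(name)

    def read_block(self, index):
        with self._lock:     # only one thread at a time reaches the inner dataset
            return self._inner.read_block(index)


def open_thread_safe(dataset):
    """Wrap the dataset only if it does not already report thread safety."""
    if not dataset.test_capability("THREAD_SAFE"):
        dataset = ThreadSafeDataset(dataset)
    return dataset


if __name__ == "__main__":
    shared = open_thread_safe(PlainDataset())
    print(type(shared).__name__, shared.read_block(21))
```
</pre>
<div>A dataset that already reports the capability is returned unchanged, so only drivers that need the lock pay for it. A VRT-like dataset built on non-thread-safe sources would still need care, as noted in the quoted text above.</div>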
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Blake</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>