<div dir="ltr">Report for 14.7. -- 2.8. 2008.<br><br>Warning: lengthy and somewhat depressing account of my doings follows.<br><br>I spent the first two weeks at a scout camp. The kids were great, but no work on GDAL got done.<br><br>Then I returned to the driver. I started experimenting with the simplified cache mentioned in the previous report, hoping that I would discover the data mangling bug in the process. Which I did. It was due to an incorrect calculation of the gap at the top of the tiles, which caused the whole image to be shifted one block up. Fixed that. Reading now works without a hitch for me.<br>
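<br>The post does not show the gap calculation itself, so the following is only a sketch of what such a mapping might look like. It assumes that tiles and blocks are both square with the same size, that the tile grid is anchored at the bottom edge of the raster (TMS counts tile rows from the bottom), and that tile rows are numbered from the top here for simplicity. The names TileSpan, TopGap and TilesForBlockRow are made up for illustration and are not part of the driver or of GDAL.<br>

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: which tile rows (counted from the top) one
// block row overlaps, and where the block starts inside the first tile.
struct TileSpan {
    int first_tile_row;
    int last_tile_row;
    int offset_in_first;  // line inside the first tile where the block begins
};

// Number of unused lines at the top of the topmost tile row, assuming the
// tile grid is aligned to the bottom edge of the raster.
inline int TopGap(int raster_height, int tile_size) {
    assert(raster_height > 0 && tile_size > 0);
    return (tile_size - raster_height % tile_size) % tile_size;
}

// Maps a block row (counted from the top of the raster) to the tile rows
// that hold its data. Forgetting to add the gap here would shift the image.
inline TileSpan TilesForBlockRow(int block_row, int raster_height, int tile_size) {
    assert(block_row >= 0 && block_row * tile_size < raster_height);
    const int gap = TopGap(raster_height, tile_size);
    // Lines are measured from the top of the topmost tile row.
    const int first_line = block_row * tile_size + gap;
    // The last block may be partial, so clamp to the last raster line.
    const int last_line = std::min(first_line + tile_size - 1,
                                   gap + raster_height - 1);
    TileSpan span;
    span.first_tile_row = first_line / tile_size;
    span.last_tile_row = last_line / tile_size;
    span.offset_in_first = first_line % tile_size;
    return span;
}
```

Under these assumptions a nonzero gap makes a full block straddle two vertically adjacent tiles, while a gap of zero lets each block coincide with a single tile.<br>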


<br>After that I spent some time on proper input validation and overall hardening. So far so good.<br><br>But the whole time I have been pondering IO performance and how best to achieve it. The problem is this. <br>


<br>GDAL uniformly touches raster data in bandwise, top-down, left-right order: band by band, and within each band row by row from the top, scanning each row from left to right. This is the exact opposite of what the TMS driver needs. As I wrote in the last post, TMS tiles don&#39;t map 1:1 to GDAL blocks. Two tiles, one above the other, contain the data for one block. To be efficient, the driver must therefore cache the tiles. But in the top-down, left-right order this means that two whole lines of tiles must be in the cache at all times. For example, sixteen megabytes of memory is enough for sixty-four PNG tiles of 256x256 pixels (at four bytes per pixel). So the driver will operate efficiently only on requests that read rasters at most 32x256 = 8192 pixels wide. <br>
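<br>The arithmetic behind those numbers can be written down as a short sketch. It assumes tiles are held in memory uncompressed at four bytes per pixel (which is what makes sixty-four 256x256 tiles come to sixteen megabytes); the function names are made up for illustration.<br>

```cpp
#include <cassert>
#include <cstddef>

// Memory taken by one square tile held uncompressed in the cache.
inline std::size_t TileBytes(std::size_t tile_size_px, std::size_t bytes_per_pixel) {
    return tile_size_px * tile_size_px * bytes_per_pixel;
}

// How many whole tiles fit into a cache of the given size.
inline std::size_t TilesInCache(std::size_t cache_bytes, std::size_t tile_bytes) {
    assert(tile_bytes > 0);
    return cache_bytes / tile_bytes;
}

// Widest raster (in pixels) that can be read row by row without evicting
// tiles that are still needed: two full lines of tiles must fit at once.
inline std::size_t MaxEfficientWidthPx(std::size_t cache_bytes,
                                       std::size_t tile_size_px,
                                       std::size_t bytes_per_pixel) {
    const std::size_t tiles =
        TilesInCache(cache_bytes, TileBytes(tile_size_px, bytes_per_pixel));
    const std::size_t tiles_per_line = tiles / 2;  // two lines of tiles cached
    return tiles_per_line * tile_size_px;
}
```

For a cache of 16 * 1024 * 1024 bytes, 256-pixel tiles and four bytes per pixel, this gives 64 tiles and a maximum width of 8192 pixels, matching the figures above.<br>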


<br>If the driver could work in the left-right, top-down order (column by column, going down each column), however, only the two tiles composing a single block would have to be cached.<br><br>The same goes for bands. The default implementation of Dataset::IRasterIO just calls RasterBand::IRasterIO on each band in turn. And while one line of tiles might usually fit into the cache, the whole raster most definitely wouldn&#39;t. This means that for PNG tiles, each file would be touched four times, once per band. <br>


<br>Therefore the optimal order is left-right, top-down, bandwise. This way, each tile file is read or written only once. How could this be achieved? The only place where all the necessary information is present is the Dataset::IRasterIO method, because only it can know which bands were requested. So my idea was to have Dataset::IRasterIO delegate to a generic method, DoIO, based on the default IRasterIO but with the optimal order of access. RasterBand::IRasterIO would call DoIO on its dataset with appropriate arguments. DoIO would have to find the best overview when reading and, of course, write into all overviews when writing.<br>
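<br>To make the difference between the two access orders concrete, here is a small simulation. It is not driver code: TileCache is a made-up LRU cache that holds no pixels and only counts how often a tile file would have to be loaded, and it assumes that every block needs exactly the two vertically adjacent tiles (y, x) and (y + 1, x).<br>

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <utility>

// Hypothetical LRU cache that stores no pixels; it only counts tile loads.
class TileCache {
    using Key = std::pair<int, int>;  // (tile row, tile column)

public:
    explicit TileCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ >= 2);  // a single block always needs two tiles
    }

    void Touch(int tile_row, int tile_col) {
        const Key key(tile_row, tile_col);
        auto it = index_.find(key);
        if (it != index_.end()) {  // hit: mark as most recently used
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        ++loads_;  // miss: the tile file would have to be read
        lru_.push_front(key);
        index_[key] = lru_.begin();
        if (lru_.size() > capacity_) {  // evict the least recently used tile
            index_.erase(lru_.back());
            lru_.pop_back();
        }
    }

    std::size_t loads() const { return loads_; }

private:
    std::size_t capacity_;
    std::size_t loads_ = 0;
    std::list<Key> lru_;
    std::map<Key, std::list<Key>::iterator> index_;
};

// Bandwise, top-down, left-right: bands outermost, columns innermost.
inline std::size_t LoadsBandRowCol(int bands, int block_rows, int block_cols,
                                   std::size_t cache_tiles) {
    TileCache cache(cache_tiles);
    for (int b = 0; b < bands; ++b)
        for (int y = 0; y < block_rows; ++y)
            for (int x = 0; x < block_cols; ++x) {
                cache.Touch(y, x);
                cache.Touch(y + 1, x);
            }
    return cache.loads();
}

// Left-right, top-down, bandwise: columns outermost, bands innermost.
inline std::size_t LoadsColRowBand(int bands, int block_rows, int block_cols,
                                   std::size_t cache_tiles) {
    TileCache cache(cache_tiles);
    for (int x = 0; x < block_cols; ++x)
        for (int y = 0; y < block_rows; ++y)
            for (int b = 0; b < bands; ++b) {
                cache.Touch(y, x);
                cache.Touch(y + 1, x);
            }
    return cache.loads();
}
```

With 4 bands, 3 block rows and 8 block columns (32 tiles in total), the column-first order needs 32 loads even with room for only 2 tiles, one load per tile. The band-first order needs 192 loads with a 2-tile cache, and still 128 with a 16-tile cache that holds two full lines of tiles, which is four loads per tile.<br>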


<br>So I started to dig into that. I finished the basic structure and decided to test it to see if all methods were called correctly.<br><br>NOOOOOOOOOOOOOOO!!!<br><br>They weren&#39;t, of course. Mainly because gdal_translate (which I have been using for testing) doesn&#39;t even call Dataset::RasterIO. It loops through the raster bands and adds them one after another to the output dataset. Exactly what it must not do in order to achieve optimality. And this is not an exception. The notion of a raster band seems to be a pillar of GDAL, and various parts of the GDAL code use bands frequently. <br>


<br>So, what do I do?<br><br>One theoretically possible way to optimise writing is to just store the IO requests in some data structure, and only in the FlushCache method optimise them and actually carry them out. But nothing like this is possible for reading, since the caller expects the data to have been transferred by the end of the call. I don&#39;t see a way out: the TMS driver will be slow and will thrash the hard drive.<br>
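<br>The deferred-write idea could look roughly like the sketch below. PendingWrite and WriteQueue are made-up names, not GDAL classes, and a real version would also have to keep the pixel buffer for every queued request, which is omitted here. The queue only shows the reordering step that a FlushCache implementation might perform before touching any tile file.<br>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical record of one block write that has been postponed.
struct PendingWrite {
    int band;
    int block_row;
    int block_col;
};

// Hypothetical queue: collects writes in whatever order the caller issues
// them and hands them back in left-right, top-down, bandwise order.
class WriteQueue {
public:
    void Add(const PendingWrite& w) {
        assert(w.band >= 0 && w.block_row >= 0 && w.block_col >= 0);
        pending_.push_back(w);
    }

    // Sorts by column, then row, then band, returns the sorted requests
    // and leaves the queue empty. Equal requests keep their arrival order.
    std::vector<PendingWrite> Drain() {
        std::stable_sort(pending_.begin(), pending_.end(),
                         [](const PendingWrite& a, const PendingWrite& b) {
                             return std::tie(a.block_col, a.block_row, a.band) <
                                    std::tie(b.block_col, b.block_row, b.band);
                         });
        std::vector<PendingWrite> ordered;
        ordered.swap(pending_);
        return ordered;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::vector<PendingWrite> pending_;
};
```

The obvious cost is memory: every postponed block has to be held until the flush, so such a queue would need a size limit of its own.<br>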


<br>To sum up what I have in my hands right now:<br><br>Reading TMS datasets works. The infrastructure for writing blocks is mostly in place (ten or so lines missing) but I don&#39;t have the code that creates new datasets in the filesystem yet. The cache works but should be improved a little. All this could be finished in a day or two. After that comes the rest of GDALDataset boilerplate: transformations, GCPs etc. These I haven&#39;t studied much yet.<br>


<br>Keo<br></div>