<div dir="ltr">Thanks, Even. I've flushed out rasterio's usage of FlushCache.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 3, 2019 at 11:04 AM Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On jeudi 3 octobre 2019 10:14:45 CEST Sean Gillies wrote:<br>
> In the comments above FlushCache() in gcore/gdaldataset.cpp it is said:<br>
> <br>
> * Using this method does not prevent use from calling GDALClose()<br>
> * to properly close a dataset and ensure that important data not addressed<br>
> * by FlushCache() is written in the file.<br>
<br>
> Does it vary by format and driver?<br>
<br>
Of course, wouldn't be fun otherwise. For some formats, it might result in a <br>
completely consistent dataset, and in others, in something that can't be <br>
opened at all. So what it does, beyond evicting 'dirty' blocks from the cache, <br>
is mostly an implementation detail.<br>
<br>
> What exactly is the important data that is not addressed?<br>
<br>
In the case of GeoTIFF, for example, FlushCache() will ensure that all tile/<br>
strip data is flushed to disk, but the TileByteCounts/TileOffsets index arrays <br>
are not updated. So for a file that was just created, they will still be at <br>
their default value of zero, making the dataset appear empty to a reader that <br>
tries to open it at that point.<br>
<br>
If you are generating a large dataset, you can for example call FlushCache() at <br>
regular intervals to make sure that there is sufficient space on the storage <br>
device (but the global block cache will also flush when it is saturated). This <br>
might also be a way of keeping memory usage below the GDAL_CACHEMAX threshold. <br>
But this can also result in suboptimal behaviour if you call it at an <br>
inappropriate point. For example, if you write to a JPEG-compressed tiled TIFF <br>
row by row, then flushing before you reach a row number that is a multiple of <br>
the tile height will flush partially written blocks (their top will contain <br>
real data, and the bottom zeroes). Those blocks will later be decompressed and <br>
recompressed, causing unnecessary quality loss.<br>
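<br>
A minimal sketch of the "flush only at tile boundaries" idea, assuming rows are written top to bottom, one full-width row at a time, into a tiled raster whose tiles are tile_height rows tall. The helper safe_flush_rows is hypothetical (not part of GDAL or rasterio); it only computes the row counts after which an explicit flush would not leave partially written tiles behind.<br>

```python
def safe_flush_rows(height, tile_height, every_n_tile_rows=1):
    """Hypothetical helper: return the numbers of rows written so far after
    which an explicit FlushCache() would only evict complete tiles.

    Assumes a top-to-bottom, row-by-row write pattern into a tiled raster
    whose tiles are tile_height rows tall.
    """
    if height <= 0 or tile_height <= 0 or every_n_tile_rows <= 0:
        raise ValueError("height, tile_height and every_n_tile_rows must be positive")
    # Only multiples of the tile height complete a whole row of tiles.
    step = tile_height * every_n_tile_rows
    rows = list(range(step, height + 1, step))
    # The last row of tiles may be shorter than tile_height; once the final
    # image row is written, every tile is as complete as it will ever be.
    if not rows or rows[-1] != height:
        rows.append(height)
    return rows
```

For a 1000-row image with 256-row tiles this gives [256, 512, 768, 1000]. In a write loop using the GDAL Python bindings, one would call dataset.FlushCache() only when the number of rows written so far is in that list, and still close the dataset at the end so the index arrays get written.<br>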
<br>
FlushCache() is automatically called by the dataset destructor, so my tip would <br>
be: "do not use FlushCache() unless you know you need it".<br>
<br>
Even<br>
<br>
-- <br>
Spatialys - Geospatial professional services<br>
<a href="http://www.spatialys.com" rel="noreferrer" target="_blank">http://www.spatialys.com</a><br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Sean Gillies</div></div>