[gdal-dev] /vsicurl caching behavior

Even Rouault even.rouault at spatialys.com
Fri Jan 5 03:21:12 PST 2018


Hi,

> 1) There are a lot of knobs that can be used to tune the thing that are not
> documented. For example CPL_VSIL_CURL_USE_CACHE. Is it on purpose?

Yes. The disk cache is an experiment that isn't used anywhere (as far as I know) and is
likely not in a finished state, as you noticed in your points below, which are all valid and
should be addressed if someone wanted to make it production ready.

> 
> 6) If VSI_CACHE is enabled the data is cached twice in memory (papsRegions
> and VSICachedFile). Is it wanted?

The scopes of the caches are not the same. papsRegions is a global cache shared by all
/vsicurl/ handles, and it persists (in memory) after they are closed (so that if the same
filename is closed and re-opened in sequence, already-read parts can be reused), whereas
VSICachedFile is associated with a single file handle.
I guess there could be some optimizations to avoid this duplication, but that could
substantially complicate the code, which is already non-trivial.
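
To make the two scopes concrete, here is a minimal toy model in Python. It is not GDAL code: GLOBAL_REGIONS, Handle, fetch_remote and the 16-byte region size are hypothetical names and values chosen for illustration. GLOBAL_REGIONS plays the role of papsRegions and Handle.local plays the role of VSICachedFile.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the global cache (the role of papsRegions):
# keyed by (filename, offset), shared by all handles, kept after close().
GLOBAL_REGIONS: Dict[Tuple[str, int], bytes] = {}
FETCHES = []  # log of simulated network requests


def fetch_remote(filename: str, offset: int, size: int) -> bytes:
    """Simulate a ranged HTTP request (no real network access)."""
    FETCHES.append((filename, offset))
    return bytes([offset % 256]) * size


class Handle:
    """Toy file handle with its own cache (the role of VSICachedFile)."""

    def __init__(self, filename: str) -> None:
        self.filename = filename
        self.local: Dict[int, bytes] = {}  # lives only as long as the handle

    def read(self, offset: int, size: int = 16) -> bytes:
        if offset in self.local:
            return self.local[offset]
        key = (self.filename, offset)
        if key not in GLOBAL_REGIONS:
            GLOBAL_REGIONS[key] = fetch_remote(self.filename, offset, size)
        data = GLOBAL_REGIONS[key]
        self.local[offset] = data  # the region is now referenced by both caches
        return data

    def close(self) -> None:
        self.local.clear()  # per-handle cache is dropped, global one is not


def reopen_demo() -> int:
    """Read, close, re-open and read again; return the number of fetches."""
    GLOBAL_REGIONS.clear()
    FETCHES.clear()
    first = Handle("/vsicurl/http://example.com/a.tif")
    first.read(0)
    first.close()
    second = Handle("/vsicurl/http://example.com/a.tif")
    second.read(0)  # served from GLOBAL_REGIONS, no new fetch
    second.close()
    return len(FETCHES)
```

In this sketch the second handle starts with an empty per-handle cache but still avoids a second request, which is the reuse across close and re-open described above.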

> 
> 7) If the file's content is modified, it's the total mess. We'll end up
> having portions of the file having the old data while the rest has the new
> data. I'm quite sure the GeoTiff we end up with won't be very valid.

Indeed. But the mess would also happen with no caching mechanism if a file is modified while
being read. Even for a local file, GDAL uses the glibc FILE buffering API, so even if you
modify some portions of a GeoTIFF that haven't been read yet, if you have already read
nearby regions, there's a chance you'll read old data in part.
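
The same effect can be reproduced without GDAL. The sketch below uses Python's buffered file objects as an analogue of the glibc FILE buffering mentioned above; it is not the glibc API itself, and the function name, file size and buffer size are arbitrary choices for illustration.

```python
import os
import tempfile


def stale_read_demo():
    """Return (bytes seen by an already-open buffered reader,
    bytes seen by a fresh open) after the file is modified."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"A" * 64)

        reader = open(path, "rb", buffering=4096)
        try:
            # Asking for 8 bytes lets the buffered layer read ahead,
            # so nearby bytes end up in the reader's buffer too.
            reader.read(8)

            # Modify a part of the file the reader has not returned yet.
            with open(path, "r+b") as writer:
                writer.seek(8)
                writer.write(b"B" * 8)

            seen_by_reader = reader.read(8)
        finally:
            reader.close()

        with open(path, "rb") as fresh:
            fresh.seek(8)
            seen_by_fresh = fresh.read(8)

        return seen_by_reader, seen_by_fresh
    finally:
        os.remove(path)
```

With CPython's default buffered reader, the already-open handle is expected to return the old bytes from its buffer while a fresh open sees the new ones, which is the partial old/new mix described above.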

> 
> 8) In the case discussed in 7), CPL_VSIL_CURL_NON_CACHED will just purge
> the data from 1 of the 3 caches: papsRegions. The vsil_cache and the disk will
> still cache the content.

CPL_VSIL_CURL_NON_CACHED prevents the content of a file from being preserved in the
papsRegions cache when a file handle is closed and re-opened. And VSICachedFile is only valid
during the lifetime of the file handle. So I don't think there's an issue there. Perhaps the
name CPL_VSIL_CURL_NON_CACHED is a bit misleading: there's always some cache
(otherwise /vsicurl/ performance would be just too horrible), it is just that it doesn't survive
file closing.
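
The "always some cache, but it doesn't survive file closing" behaviour can be sketched as a toy model. This is not how GDAL implements it: ToyHandle, SHARED_REGIONS and the non_cached flag are hypothetical, and dropping the entries in close() is an assumption made only to keep the example short.

```python
from typing import Dict, List, Tuple

SHARED_REGIONS: Dict[Tuple[str, int], bytes] = {}  # toy shared region cache
FETCH_LOG: List[Tuple[str, int]] = []              # simulated network requests


class ToyHandle:
    """Toy handle; non_cached mimics listing the file in CPL_VSIL_CURL_NON_CACHED."""

    def __init__(self, filename: str, non_cached: bool = False) -> None:
        self.filename = filename
        self.non_cached = non_cached

    def read(self, offset: int) -> bytes:
        key = (self.filename, offset)
        if key not in SHARED_REGIONS:
            FETCH_LOG.append(key)            # simulated ranged request
            SHARED_REGIONS[key] = b"x" * 16
        return SHARED_REGIONS[key]

    def close(self) -> None:
        if self.non_cached:
            # Assumption of this sketch: drop this file's regions on close.
            for key in [k for k in SHARED_REGIONS if k[0] == self.filename]:
                del SHARED_REGIONS[key]


def count_fetches(non_cached: bool) -> Tuple[int, int]:
    """Return (fetches while the first handle is open, total after re-open)."""
    SHARED_REGIONS.clear()
    FETCH_LOG.clear()
    name = "/vsicurl/http://example.com/a.tif"
    first = ToyHandle(name, non_cached)
    first.read(0)
    first.read(0)                 # second read is served from the cache
    while_open = len(FETCH_LOG)
    first.close()
    second = ToyHandle(name, non_cached)
    second.read(0)                # refetched only if the cache was dropped
    second.close()
    return while_open, len(FETCH_LOG)
```

In both modes the repeated read on the open handle costs a single fetch; only the re-open differs, with one extra fetch when non_cached is set.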

Even


-- 
Spatialys - Geospatial professional services
http://www.spatialys.com