[gdal-dev] Cannot open S3 files after upload

Even Rouault even.rouault at spatialys.com
Wed Jun 21 02:02:25 PDT 2017


Matt,

> My actual problem is a bit more specific than being unable to open S3 files
> after upload. The actual problem is that within the same Python session, I
> can open a file off S3 with the vsis3 driver, but then if I upload a new
> file that previously did not exist (using boto3), gdal does not see it as a
> valid file. 

Yes, I'm aware of that issue. /vsicurl/ and related file systems such as /vsis3/ cache both 
metadata (file size and date, directory listings) and data (chunks of files). /vsicurl/ was 
designed at a time when web resources didn't change much, so changes within a single GDAL 
session were unlikely. With cloud offerings, this is no longer the case.

A few weeks ago I added to trunk a CPL_VSIL_CURL_NON_CACHED config option that can 
be set to disable caching on a file or set of files.
See https://trac.osgeo.org/gdal/wiki/ConfigOptions#CPL_VSIL_CURL_NON_CACHED

So in your example, if you set
CPL_VSIL_CURL_NON_CACHED=/vsis3/put_here_the_bucket_name , that will work.
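From Python, the option can also be set at runtime instead of in the environment. The 
following is only a minimal sketch: the helper names and the bucket name are hypothetical, and 
it assumes a GDAL build that includes CPL_VSIL_CURL_NON_CACHED. The gdal module is passed in 
as an optional argument so that the import only happens when the helper is actually called.

```python
def non_cached_prefix(bucket_name):
    """Build the CPL_VSIL_CURL_NON_CACHED value for one S3 bucket."""
    return "/vsis3/" + bucket_name.strip("/")


def disable_cache_for_bucket(bucket_name, gdal_module=None):
    """Disable /vsis3/ caching for everything under the given bucket.

    bucket_name is a placeholder for your own bucket. gdal_module defaults
    to osgeo.gdal; it is a parameter only to keep the import lazy.
    """
    if gdal_module is None:
        from osgeo import gdal as gdal_module  # requires the GDAL Python bindings
    value = non_cached_prefix(bucket_name)
    gdal_module.SetConfigOption("CPL_VSIL_CURL_NON_CACHED", value)
    return value


# Example call (hypothetical bucket name):
# disable_cache_for_bucket("put_here_the_bucket_name")
```

The same value can be exported as an environment variable before starting the process, which 
avoids touching the code at all.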

I've also just added, per https://trac.osgeo.org/gdal/ticket/6937 , a new 
VSICurlClearCache() function (bound to SWIG as gdal.VSICurlClearCache()). So if you call 
gdal.VSICurlClearCache() just after the s3.meta.client.upload_file() call, that will work too.
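As an illustration of that ordering (upload first, then clear the cache, then open), here is 
a rough sketch. The function name and its arguments are hypothetical; it assumes boto3 is 
installed and that the GDAL bindings in use already expose gdal.VSICurlClearCache(). The S3 
client and the gdal module are optional parameters so that both imports stay lazy.

```python
def upload_then_open(local_path, bucket, key, s3_client=None, gdal_module=None):
    """Upload a file to S3, clear the /vsicurl/ caches, then open it via /vsis3/.

    All argument names are illustrative. Returns whatever gdal.Open() returns
    (a Dataset, or None if GDAL could not open the file).
    """
    if gdal_module is None:
        from osgeo import gdal as gdal_module  # requires the GDAL Python bindings
    if s3_client is None:
        import boto3  # requires boto3 and valid AWS credentials
        s3_client = boto3.client("s3")

    # 1. Upload the new object.
    s3_client.upload_file(local_path, bucket, key)

    # 2. Drop cached metadata and data, so that a stale directory listing
    #    cannot hide the newly uploaded object.
    gdal_module.VSICurlClearCache()

    # 3. Open the object through /vsis3/.
    return gdal_module.Open("/vsis3/{}/{}".format(bucket, key))
```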

Both mechanisms are complementary.

CPL_VSIL_CURL_NON_CACHED is useful in scenarios where you don't know when the server 
content can change (some other process or machine changes it behind your back). Its 
advantage is that it doesn't require any code modification (it was designed with the MapServer 
use case in mind). Its drawback is that you lose all caching when the same file is opened, 
closed, opened, closed, ... several times during the process.

VSICurlClearCache() gives you more control if you know when uploads happen.

I've also backported VSICurlClearCache() to the 2.2 branch.

As far as VSI_CACHE=TRUE is concerned, its caching is scoped to a single VSI file handle 
instance. It can be useful if the global 16 MB vsicurl cache isn't big enough for very 
large files.
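For completeness, a small sketch of enabling that per-handle cache from Python. The helper 
name is hypothetical, and VSI_CACHE_SIZE (the per-handle cache size in bytes) is not mentioned 
above: it is included here on the assumption that your GDAL version supports it.

```python
def enable_handle_cache(gdal_module=None, size_bytes=None):
    """Turn on the per-file-handle VSI cache, optionally with a custom size.

    size_bytes is optional and relies on the VSI_CACHE_SIZE config option,
    which is an assumption not covered in the text above.
    """
    if gdal_module is None:
        from osgeo import gdal as gdal_module  # requires the GDAL Python bindings
    gdal_module.SetConfigOption("VSI_CACHE", "TRUE")
    if size_bytes is not None:
        # Config option values are always passed as strings.
        gdal_module.SetConfigOption("VSI_CACHE_SIZE", str(int(size_bytes)))
```

Since the cache belongs to a file handle, the options need to be set before the file is 
opened.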

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com