[gdal-dev] Cannot open S3 files after upload

Even Rouault even.rouault at spatialys.com
Wed Jun 21 08:52:03 PDT 2017


On Wednesday 21 June 2017 11:47:49 CEST, Matt Hanson wrote:
> Thanks, Even,
> 
> Disabling directory reading on open is another workaround for my use case
> as well (GDAL_DISABLE_READDIR_ON_OPEN=TRUE).

This can work, but it will cause GDAL to probe for lots of side-car files.

If you don't have any side-car files associated with your .tif, try:
GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR
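
For illustration only, a minimal Python sketch of that suggestion, assuming the GDAL Python bindings (osgeo.gdal) are installed. The helper name open_without_readdir and the example path are hypothetical. The osgeo.gdal import is deferred so that a stand-in module can be passed in when trying the logic without network access.

```python
def open_without_readdir(path, gdal=None):
    """Open a /vsis3/ path without listing its parent "directory".

    Hypothetical helper; `gdal` defaults to the osgeo.gdal bindings.
    """
    if gdal is None:
        from osgeo import gdal  # GDAL Python bindings (assumed installed)

    # EMPTY_DIR: don't fetch the directory listing and treat the directory
    # as empty, so side-car files (.aux.xml, .ovr, ...) are not probed.
    # Only appropriate when the .tif really has no side-car files.
    # SetConfigOption is process-wide, so later opens are affected too.
    gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR")
    return gdal.Open(path)
```

For example, ds = open_without_readdir("/vsis3/my-bucket/scene.tif"), where my-bucket/scene.tif is a placeholder. Setting GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR as an environment variable before starting Python should have the same effect without touching the code.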

> 
> On Wed, Jun 21, 2017 at 5:02 AM, Even Rouault <even.rouault at spatialys.com> wrote:
> > Matt,
> > 
> > > My actual problem is a bit more specific than being unable to open S3
> > > files after upload. The actual problem is that within the same Python
> > > session, I can open a file off S3 with the vsis3 driver, but then if I
> > > upload a new file that previously did not exist (using boto3), GDAL does
> > > not see it as a valid file.
> > 
> > Yes, I'm aware of that issue. /vsicurl/ and related file systems such as
> > /vsis3/ do cache both metadata (file size and date, directory listings)
> > and data (chunks of files). /vsicurl/ was designed at a time when web
> > resources rarely changed, and it was unlikely to see changes during a
> > single GDAL session, but with cloud offerings this is no longer the case.
> >
> > A few weeks ago I added in trunk a CPL_VSIL_CURL_NON_CACHED configuration
> > option that can be set to disable caching for a file or a set of files.
> >
> > See https://trac.osgeo.org/gdal/wiki/ConfigOptions#CPL_VSIL_CURL_NON_CACHED
> >
> > So in your example, setting
> >
> > CPL_VSIL_CURL_NON_CACHED=/vsis3/put_here_the_bucket_name
> >
> > will make it work.
> >
> > Per https://trac.osgeo.org/gdal/ticket/6937, I've also just added a new
> > function, VSICurlClearCache() (exposed through SWIG as
> > gdal.VSICurlClearCache()). So if you call gdal.VSICurlClearCache() just
> > after the s3.meta.client.upload_file() call, that will work too.
> >
> > Both mechanisms are complementary.
> >
> > CPL_VSIL_CURL_NON_CACHED is useful when you don't know when the server
> > content may change (other processes or machines modify it behind your
> > back). Its advantage is that it requires no code changes (it was
> > typically designed for the MapServer use case). Its drawback is that you
> > lose all caching when the same file is opened, closed, opened, closed, ...
> > several times during the process.
> >
> > VSICurlClearCache() gives you more control if you know when uploads
> > happen.
> >
> > I've also backported VSICurlClearCache() to the 2.2 branch.
> >
> > As for VSI_CACHE=TRUE, its caching scope is restricted to a single VSI
> > file handle instance. It can be useful if the global 16 MB /vsicurl/
> > cache isn't big enough for very large files.
> >
> > Even
> >
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com
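
A minimal Python sketch of the two mechanisms discussed above, for illustration only. It assumes boto3 and GDAL Python bindings recent enough to provide gdal.VSICurlClearCache() (2.2 branch, per the message) and, for the second helper, a GDAL build that includes CPL_VSIL_CURL_NON_CACHED (in trunk at the time of this thread). The helper names, bucket, key and file names are hypothetical; the S3 client and the gdal module can be passed in so the logic can be exercised without network access.

```python
def upload_then_open(local_path, bucket, key, s3_client=None, gdal=None):
    """Upload a file with boto3, then open it through /vsis3/ (hypothetical)."""
    if s3_client is None:
        import boto3  # assumed installed
        # Same upload_file() as s3.meta.client.upload_file() in the thread.
        s3_client = boto3.client("s3")
    if gdal is None:
        from osgeo import gdal  # GDAL Python bindings (assumed installed)

    s3_client.upload_file(local_path, bucket, key)
    # Drop the metadata (size, date, directory listings) and data chunks
    # cached by /vsicurl/-based file systems, so the new object is seen.
    gdal.VSICurlClearCache()
    return gdal.Open(f"/vsis3/{bucket}/{key}")


def disable_caching_for_bucket(bucket, gdal=None):
    """Alternative with no change around uploads (hypothetical helper)."""
    if gdal is None:
        from osgeo import gdal  # GDAL Python bindings (assumed installed)

    # Every /vsis3/ path under this bucket bypasses the cache; set this
    # before the files are first opened. Repeated opens re-fetch data.
    gdal.SetConfigOption("CPL_VSIL_CURL_NON_CACHED", f"/vsis3/{bucket}")
```

The first helper fits the case where your own code performs the uploads; the second fits the case where other processes or machines may change the bucket behind your back, at the cost of losing caching on repeated opens.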


-- 
Spatialys - Geospatial professional services
http://www.spatialys.com