[gdal-dev] Write overviews directly to S3

Peter Schmitt pschmitt at gmail.com
Thu May 18 09:38:56 PDT 2017


Hi Jeremy,

We have come up with one technique to read from and write to S3 using
/vsis3/ and a simple file writer class a colleague wrote:
https://gist.github.com/pedros007/55c6e33224596fb4d8e9e6b68b24ed9b

In fact, I used this last week to add internal overviews to some of our
images in S3.  Here's a high-level overview of my Python implementation
using gdal-2.2.0:

import boto3
from osgeo import gdal

# Copy the image from S3 into a /vsimem/ file:
vsi_file = '/vsimem/image.tif'
ds = gdal.Translate(vsi_file, '/vsis3/bucket/prefix.tif')

# Add overviews
err = ds.BuildOverviews('AVERAGE', [2, 4, 8, 16, 32, 64])
if err != 0:
    raise BuildOverviewsError('Failed to build overviews for %s' % vsi_file)
# Some overview levels wouldn't get written unless we flush & close the
# dataset.
ds.FlushCache()
ds = None
# Even though the dataset is gone, the vsi_file still exists in memory.

# Write the data back to s3 using a class my colleague wrote.
try:
    vsimem_file = gdalutil.SimpleVSIMEMFile(vsi_file)
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket_name='bucket', key='prefix.tif')
    # The Object#put API needs a file-like object:
    # http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Object.put
    # The SimpleVSIMemFile implements a file-like object that uses the VSI
    # API to read a /vsimem/ file.
    obj.put(Body=vsimem_file)
finally:
    vsimem_file = None

# Remove the /vsimem/ file. (You probably want this in a try/finally block
# to ensure it gets deleted.)
gdal.Unlink(vsi_file)
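
One way to get the try/finally cleanup mentioned in that last comment is a
small context manager.  This is only a sketch and is not part of the gist:
the name temporary_vsimem is hypothetical, and the unlink parameter exists
only so the helper can be exercised without GDAL installed.  When it is
omitted, gdal.Unlink is used.

```python
from contextlib import contextmanager


@contextmanager
def temporary_vsimem(path, unlink=None):
    """Yield `path` and always remove it afterwards, even on error.

    `unlink` is a hypothetical injection point for testing; when it is
    omitted, gdal.Unlink is imported lazily and used.
    """
    if unlink is None:
        from osgeo import gdal  # lazy import: the helper loads without GDAL
        unlink = gdal.Unlink
    try:
        yield path
    finally:
        # Runs whether the body succeeded or raised.
        unlink(path)
```

The translate, overview, and upload steps would then all sit inside
"with temporary_vsimem('/vsimem/image.tif') as vsi_file:", so the in-memory
file is released even if the S3 upload fails.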

The above process holds the whole image in memory, so it will be memory
constrained.  Make sure your instance is appropriately sized!  You probably
want to run this on an EC2 instance in the same region as the data sitting
in S3.
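
To get a feel for the sizing, here is a back-of-the-envelope estimate of the
uncompressed pixel data for an image plus its overviews.  It is a rough
sketch, not a GDAL API: the function name and parameters are made up for
illustration, and it assumes each overview dimension is rounded up to a whole
pixel.  A compressed TIFF in /vsimem/ can be much smaller, and GDAL's own
caching adds overhead on top.

```python
def estimate_raster_bytes(width, height, bands, bytes_per_sample,
                          overview_factors=(2, 4, 8, 16, 32, 64)):
    """Rough uncompressed size of a raster plus its overviews, in bytes."""
    def level_bytes(factor):
        # Assumption: overview dimensions round up to a whole pixel.
        w = -(-width // factor)   # ceiling division
        h = -(-height // factor)
        return w * h * bands * bytes_per_sample

    full = level_bytes(1)
    overviews = sum(level_bytes(f) for f in overview_factors)
    return full + overviews
```

With the factors used above, each level has about a quarter of the pixels of
the previous one, so the overviews add roughly a third to the full-resolution
size.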


Even - I found methods to check for VSI errors:
https://gist.github.com/pedros007/55c6e33224596fb4d8e9e6b68b24ed9b#file-simplevsimemfile-py-L73-L74

Are these intended for public consumption?  When reading from a
/vsimem/foo.shp with gdal-2.2.0, gdal.VSIGetLastErrorMsg() reported the
error "No such file or directory".  The VSI reader actually works whether
it's given a /vsimem/ file or a local path.  A colleague used a local path
and the error reported was a little more verbose:  it printed something
like "/mnt/data/foo.sbn: No such file or directory".  I did not see that
error with gdal-2.1.3.  We don't need the sbn file (we're building a qix
file instead), so I was surprised that the VSI system flagged this as an
error.  Seems like it should be more of a warning.

My motivation for adding error checking:  sometimes the command-line
gdal_translate with /vsis3/ paths would yield IFD read errors (I don't have
the exact error message handy).  Repeating the command would result in a
successful translate.  I always assumed it was some transient packet
loss/network error.
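
Since simply repeating the command worked for us, a generic retry wrapper is
one way to automate that.  This is a minimal sketch using only the standard
library; the retry helper and its parameters are hypothetical and nothing in
it is GDAL-specific.  The sleep parameter is there only so the waits can be
stubbed out.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(RuntimeError,),
          sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: surface the last error
            # Wait 1x, 2x, 4x ... the base delay before trying again.
            sleep(delay * 2 ** (attempt - 1))
```

With the Python bindings this only helps if gdal.UseExceptions() has been
called, so that a failed gdal.Translate raises RuntimeError instead of
returning None, e.g. retry(lambda: gdal.Translate(dst, src)).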

Cheers,
Pete

On Thu, May 18, 2017 at 3:14 AM, Jeremy Palmer <JPalmer at linz.govt.nz> wrote:

> Hi Even,
>
> On 18/05/2017, at 9:12 PM, Even Rouault <even.rouault at spatialys.com>
> wrote:
> >
> > Is it possible to directly write external overviews to an S3 bucket? With
> > GDAL 2.1.2 I get an error reporting that seek is not supported when
> > writing to vsis3:
>
>
> No, /vsis3/ only supports sequential writing in files (the original use
> case was to generate and upload a huge CSV file on the fly). I don't have
> all the details in mind but random writing might not be possible given the
> S3 API constraints, at least with the multipart upload API which is used
> currently.
>
>
> OK thanks for clarifying the situation.
>
>
> And another constraint of the current implementation is that a /vsis3/
> file is either read-only or write-only, but not a mix of both, which would
> be needed for gdaladdo internal overviews. Perhaps external overview would
> work, but I'm not completely sure as creating a TIFF file might require
> seeking.
>
>
> Perhaps a fully fledged read-write-update file system would be possible,
> but that wasn't in my initial design constraints.
>
>
> For now we will work around the issue.
>
> Thanks for your help.
>
> Cheers,
> Jeremy



-- 
Pete