[Landsat-pds] Tiling + additional scenes

Frank Warmerdam warmerdam at pobox.com
Fri Feb 27 10:20:57 PST 2015


On Fri, Feb 27, 2015 at 10:17 AM, Peter Becker <pbecker at esri.com> wrote:

>  Frank
>
> Yes. Please prepare me AWS keys for uploading to the tarq directory.
>


Peter,

I will communicate them to you out-of-band.


>
>
> Also clarify what keys should be used for the conversion of the existing
> non-tiled files to tiled.
>

You can use the same key for all such purposes.


> The script would download the existing TIF files, run gdal_translate
> with the same options you are using and then upload them again.
>
> There are two options here:
>
> 1 – Directly overwrite the existing files.
>
> 2 – Temporarily create a similar directory structure and, once QA has been
> applied, rename and overwrite the existing folders.
>
> Preferences? I’m inclined towards 2, but I'm not sure whether there are
> issues with overwriting the existing folders at a later stage.
>

My understanding is that S3 does not support object rename semantics, in
which case we are pretty much left with replacement in place, though that
carries some modest risk if something goes wrong in the middle.
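A minimal sketch of such a per-file replacement-in-place pass, expressed as the command lines a driver script might run. The bucket name `landsat-pds`, the key layout, the local work directory and the use of the AWS CLI (`aws s3 cp`) are assumptions for illustration, and `rework_plan` is a hypothetical helper; the gdal_translate creation options are passed in rather than guessed.

```python
import posixpath

# Assumed bucket name and local work directory (illustrative only).
BUCKET = "landsat-pds"
WORKDIR = "/tmp/rework"


def rework_plan(key, creation_opts, workdir=WORKDIR):
    """Return argv lists that rebuild one GeoTIFF and replace it in place.

    S3 has no rename, so the rewritten file is uploaded over the original
    key. A driver should only run the final step after comparing the two
    gdalinfo checksum outputs from the QA steps.
    """
    name = posixpath.basename(key)
    src = posixpath.join(workdir, name)
    dst = posixpath.join(workdir, "tiled_" + name)
    url = f"s3://{BUCKET}/{key}"
    return [
        ["aws", "s3", "cp", url, src],                 # 1. download original
        ["gdal_translate", *creation_opts, src, dst],  # 2. rewrite locally
        ["gdalinfo", "-checksum", src],                # 3. QA: original...
        ["gdalinfo", "-checksum", dst],                #    ...and rewrite
        ["aws", "s3", "cp", dst, url],                 # 4. overwrite in place
    ]
```

A driver would run each step with `subprocess.run(argv, check=True)` and stop before step 4 if the checksums differ, so that a pass interrupted part-way leaves at most the file currently being uploaded in doubt.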

Best regards,
Frank


>
>
> _Peter
>
>
>
> *From:* fwarmerdam at gmail.com [mailto:fwarmerdam at gmail.com] *On Behalf Of *Frank
> Warmerdam
> *Sent:* Friday, February 27, 2015 9:34 AM
>
> *To:* Peter Becker
> *Cc:* landsat-pds at lists.osgeo.org
> *Subject:* Re: [Landsat-pds] Tiling + additional scenes
>
>
> On Fri, Feb 27, 2015 at 9:28 AM, Peter Becker <pbecker at esri.com> wrote:
>
> Frank
>
> Please inform me once you have made the change and indicate clearly from
> which scene ID onward the newly downloaded images will be tiled.
>
>
>
> Peter,
>
>
>
> I will do so.  I didn't get to it yesterday as the permissions change took
> longer than I expected, but I'll try to address it today.
>
>
> I will look to have scripts created that convert the existing files, but
> will not run them until we are clear on the process and the tests have been reviewed.
>
> I can review the list of best pre-2015/1/1 scenes to check that they are
> available for download from USGS. I would foresee adding them in chunks,
> so as to ensure that there are not too many at any one time. You
> mention that the existing script runs every two hours. How many additional
> scenes could be added to the “tarq” directory at a time so as not to overload it?
>
>
>
> The step from tarq to unpacked is distributed over our worker system.  I'm
> sure it could handle 10000 scenes without noticeable impact on us.  The part
> that is prone to congestion is the pulls from USGS.
>
> I'd encourage you to drop a couple in, we can confirm they process
> properly, and then you can ramp up as much as you want.
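The "drop a couple in, confirm, then ramp up" approach could be scheduled as below: a minimal sketch that splits a list of scene IDs into growing batches, for example one batch per queue run. `ramp_batches` and its `first`/`growth`/`cap` defaults are hypothetical placeholders, not values from this thread.

```python
def ramp_batches(scene_ids, first=2, growth=4, cap=500):
    """Split scene_ids into batches that start small and grow geometrically.

    first, growth and cap are hypothetical tuning knobs; the batch size
    never exceeds cap.
    """
    if first < 1 or growth < 1 or cap < 1:
        raise ValueError("first, growth and cap must be >= 1")
    batches = []
    i, size = 0, min(first, cap)
    while i < len(scene_ids):
        batches.append(scene_ids[i:i + size])
        i += size
        size = min(size * growth, cap)
    return batches
```

With 100 scene IDs, `first=2`, `growth=4` and `cap=50`, the batches have sizes 2, 8, 32, 50 and 8, so the first run only exercises a couple of scenes before larger batches follow.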
>
> Shall I prepare AWS keys for you to use in uploading to the tarq directory?
>
> Best regards,
>
> Frank
>
>
>  _Peter
>
>
>
>
>
> *From:* fwarmerdam at gmail.com [mailto:fwarmerdam at gmail.com] *On Behalf Of *Frank
> Warmerdam
> *Sent:* Thursday, February 26, 2015 9:43 AM
> *To:* Peter Becker
> *Cc:* landsat-pds at lists.osgeo.org
> *Subject:* Re: [Landsat-pds] Tiling + additional scenes
>
>
> On Thu, Feb 26, 2015 at 9:30 AM, Peter Becker <pbecker at esri.com> wrote:
>
> Frank
> Would you be able to add
> -co "TILED=YES"  -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512"
> to line 19 of splitter.py, so as to enable the tiling.
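A minimal sketch of the resulting gdal_translate command line. It assumes the ingestor currently passes `COMPRESS=DEFLATE` and `PREDICTOR=2` (the settings Peter describes in his 2015-02-06 message quoted further down); the actual contents of line 19 of splitter.py are not shown in this thread, and `translate_args` is a hypothetical helper.

```python
# Options the ingestor is assumed to use today (Deflate + horizontal predictor).
BASE_OPTS = {"COMPRESS": "DEFLATE", "PREDICTOR": "2"}
# Peter's proposed additions: 512x512 internal tiles.
TILING_OPTS = {"TILED": "YES", "BLOCKXSIZE": "512", "BLOCKYSIZE": "512"}


def translate_args(src, dst, tiled=True):
    """Build a gdal_translate argv list with one -co flag per creation option."""
    opts = dict(BASE_OPTS)
    if tiled:
        opts.update(TILING_OPTS)
    argv = ["gdal_translate"]
    for key, value in opts.items():
        argv += ["-co", f"{key}={value}"]
    return argv + [src, dst]
```

It would be run with something like `subprocess.run(translate_args("in.TIF", "out.TIF"), check=True)`; passing a list rather than a shell string avoids quoting issues with the `-co` values.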
>
>
>
> Peter,
>
>
>
> I'm working on changes to the ingestor, so I'll try to implement this
> today.  I am also inclined to try adding external overviews (as a .ovr
> file) despite them being a bit messy through /vsicurl/. They would work
> well when the S3 bucket is mounted on an EC2 instance, and it wouldn't
> "pollute" the data file.
>
>
>
> It would be good if a script could be run over the existing GeoTIFF files to
> convert them to tiled as well. What is the recommended approach to get this
> done?
>
>
>
> That would be nice.  Unfortunately, since we don't keep the original .tar
> files around, I think we would need to write a special script to restructure
> the files in place.  I'd like to come up with a methodology to do this sort
> of rework pass over the archive reasonably smoothly but nothing like that
> exists yet.  In principle anyone who wanted to could take this on with the
> appropriate access keys.
>
>
>
>
> On a similar note, I would like to suggest that we process some of the
> images prior to 2015/1/1. I can provide a list of the most suitable scene
> IDs. I'm suggesting about 5 scenes for each path/row (PR), excluding the north
> and south poles. Could a script be run that takes this list as input and
> applies the same process to these?
>
>
>
> Five scenes per path/row is a substantial addition - I'll let Jed respond
> on how he feels about it. I'm certainly hoping to *eventually* convince Jed
> and Amazon to backfill the whole L8 archive.  It doesn't go back that far.
>
>
>
> One tricky point with ingesting old scenes, as mentioned by Charlie or
> Amit on the last call, is that many of them aren't available for direct
> download from the USGS.  We would need to work out a methodology to request
> reprocessing and then fetch them.  Alternatively if we can drop the .tar
> files into the "tarq" directory from some other archive that would also do
> the trick, and avoid further loading the USGS link.  Once the scene tarfile
> is in the "tarq" directory on S3, everything else should work smoothly on
> the next queue run, roughly every two hours on the Planet Labs jobs system.
>
>
>
> Best regards,
>
> Frank
>
>
>
>
> _Peter
>
> -----Original Message-----
> From: Korver, Mark [mailto:mkorver at amazon.com]
> Sent: Thursday, February 26, 2015 8:24 AM
> To: Sundwall, Jed; Peter Becker
> Cc: landsat-pds at lists.osgeo.org
> Subject: RE: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4
>
> I am fine with tiling. I wrote earlier that in my less than scientific
> approach I typically use a 512 tile size, which is the same as Peter's
> recommendation.
> We just need to make sure we do not alter any original pixel values and
> that gdalinfo reports the same precision etc. -Mark
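One way to check this condition mechanically is to compare `gdalinfo -checksum` output before and after rewriting a file: the per-band data type (the "precision") and pixel checksum should be identical, while the block layout is expected to change. A minimal sketch that parses that text; the regular expressions assume gdalinfo's usual `Band N Block=... Type=...` and `Checksum=...` lines, and `band_signature`/`same_pixels` are hypothetical helpers.

```python
import re


def band_signature(gdalinfo_text):
    """Extract (data type, checksum) per band from `gdalinfo -checksum` output."""
    # Assumes lines like "Band 1 Block=512x512 Type=UInt16, ColorInterp=Gray".
    types = re.findall(r"^Band \d+ .*?Type=(\w+)", gdalinfo_text, re.MULTILINE)
    # Assumes lines like "  Checksum=12345" following each band line.
    sums = [int(s) for s in re.findall(r"Checksum=(\d+)", gdalinfo_text)]
    return list(zip(types, sums))


def same_pixels(before_text, after_text):
    """True if both reports list the same non-empty (type, checksum) sequence."""
    before = band_signature(before_text)
    return bool(before) and before == band_signature(after_text)
```

GDAL's checksum is a simple checksum rather than a cryptographic hash, so matching values are strong evidence rather than proof; a pixel-by-pixel comparison would be stricter if the extra read time is acceptable.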
>
>
> -----Original Message-----
> From: Sundwall, Jed
> Sent: Wednesday, February 25, 2015 2:45 PM
> To: Peter Becker
> Cc: Korver, Mark; landsat-pds at lists.osgeo.org
> Subject: Re: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4
>
> I’m +1 for adding tiling to scenes, and I’d submit a pull request to add
> them, but I don’t know how!
>
> Does anyone else want to weigh in on this? I think Peter’s made a good
> case here, and tiling appears to open up enough possibilities that outweigh
> any potential downsides.
>
> Jed.
>
> > On Feb 10, 2015, at 7:31 PM, Peter Becker <pbecker at esri.com> wrote:
> >
> > Mark
> >
> > I still maintain that tiling the imagery will have significant benefits
> for its use. One of the key aspects of the Landsat 8 processing is that
> each scene is orthorectified to a specific UTM zone. USGS also put great
> effort into ensuring high pixel accuracy. As a result it is relatively
> easy for applications to perform temporal analysis of the imagery. Much of
> such analysis is done on an area-of-interest basis, not a scene basis.
> Leaving the data untiled significantly increases the amount of data read.
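To make the area-of-interest point concrete, here is a minimal sketch that counts the blocks that must be fetched and inflated for a rectangular window, comparing one-line strips (each of which must be decoded across the full row width) with 512x512 tiles. The helper names are hypothetical, and the 7800-pixel width and window position used below are illustrative, not taken from a specific scene.

```python
def blocks_touched(start, length, block):
    """Blocks of size `block` overlapped by the pixel range [start, start + length)."""
    return (start + length - 1) // block - start // block + 1


def tiled_cost(x0, y0, w, h, tile=512):
    """(tiles touched, pixels decoded) for a window in a tiled band."""
    n = blocks_touched(x0, w, tile) * blocks_touched(y0, h, tile)
    return n, n * tile * tile


def strip_cost(y0, h, width, rows_per_strip=1):
    """(strips touched, pixels decoded) for the same window in a strip band."""
    n = blocks_touched(y0, h, rows_per_strip)
    return n, n * rows_per_strip * width
```

For a 512x512 window at (3000, 3000) in a 7800-pixel-wide band, `tiled_cost(3000, 3000, 512, 512)` gives `(4, 1048576)` while `strip_cost(3000, 512, 7800)` gives `(512, 3993600)`: about 3.8 times as many pixels decoded, spread over 128 times as many separate blocks.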
> > As a minor aside: tiling also improves the compression a bit (about
> 2%). The current method of compressing each line of the scene separately
> means that Deflate is not quite as efficient as when compressing a tile of
> imagery. Tiling also significantly reduces the number of offsets in the TIF
> files. Currently there is an offset and a Deflate header stored for every
> line (strip), i.e. about 7800 of them, vs. only about 250 with 512x512
> tiles.
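These block counts are easy to check. Assuming a band of roughly 7800 x 7800 pixels (an illustrative size; actual Landsat 8 scene dimensions vary from scene to scene), one-line strips give 7800 blocks, each with its own entry in the TIFF StripOffsets and StripByteCounts tags and its own Deflate stream, while 512x512 tiles give 16 x 16 = 256. The helper names are hypothetical.

```python
import math


def strip_count(height, rows_per_strip=1):
    """Strips in a strip-organised band (one offset/byte-count entry each)."""
    return math.ceil(height / rows_per_strip)


def tile_count(width, height, tile=512):
    """Tiles in a tiled band; partial edge tiles still count as whole tiles."""
    return math.ceil(width / tile) * math.ceil(height / tile)
```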
> >
> > Frank did a great comparison on the compression
> > (https://github.com/landsat-pds/landsat_ingestor/blob/master/Compression.md).
> > He shows the slight compression benefit of tiling.
> > I would like to expand on the one part in which he states:
> > =============="
> > Using either bigger strips or tiling has compression benefits, and
> > will be supported by clients, but it does have performance implications
> for different data processing flows. Simple viewers will often be able to
> show downsampled images quickly when the strip size is one line because
> they just pull the subset of lines they need. Big strips or tiles means
> they have to essentially scan all the data. Tiling is great for subregion
> requests (perhaps from a mapping server) but less ideal for serial
> processing software that is going line by line. Overall, the organization
> options didn't give that much difference in size, so we should likely
> consider the performance of utilization issues before finalizing a decision
> "===============
> > The downsampling being referred to here is reading, say, every Xth line
> to provide such a downsampled image, e.g. for a 1/10 scale, read every 10th
> line. With strip-based compression, this still requires reading 1/10 of the
> data and requires a lot of data requests (approx. 780). What I am
> recommending (in a separate thread) is that we include overviews (.OVR). In
> this case the application could read the nearest appropriate overview and
> display that instead. In this specific example the data volume to read an
> image at 1/10 the resolution would be 1/81 (1/(9x9)) and be achieved in 4
> range requests. That's a lot faster and simpler. Tiling TIF files is just as
> much part of the TIF standard as strip-wise compression.
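Under the same illustrative assumption of a ~7800 x 7800 band, the numbers in this example can be reproduced: every 10th one-line strip means 780 reads, while the factor-9 overview is 867 x 867 pixels (about 1/81 of the data) and fits in 2 x 2 tiles of 512, i.e. 4 requests if each tile is fetched separately. The helper names are hypothetical, and rounding overview dimensions up is an assumption about how they are computed.

```python
import math


def decimated_strip_reads(height, step):
    """One-line strips read when sampling every `step`-th line."""
    return math.ceil(height / step)


def overview_tile_reads(width, height, factor, tile=512):
    """Size of an overview at `factor`, and the tiles needed to read all of it."""
    ow, oh = math.ceil(width / factor), math.ceil(height / factor)
    return (ow, oh), math.ceil(ow / tile) * math.ceil(oh / tile)
```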
> >
> > Tiling the imagery will have very little if any detrimental effect on
> any application reading the whole file. If anything it will make it faster.
> In the vast majority of cases having the imagery tiled will significantly
> improve access performance, by allowing only the required part of the scene
> to be read. I therefore recommend that this is changed to tiled sooner
> rather than later.
> >
> > Concerning the latency question: I agree with you that latency is
> primarily an issue for the first request; thereafter the requests are fast.
> > I see no reason why applications should not be doing 'range gets' to S3.
> GDAL already supports this using /vsicurl/.
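For illustration, a "range get" is an HTTP GET with a `Range` header; GDAL's /vsicurl/ handler issues these itself when a file is opened as `/vsicurl/<url>`. A minimal standard-library sketch; `range_request` and `fetch_range` are hypothetical helpers and the URL in the usage note is a placeholder.

```python
import urllib.request


def range_request(url, start, length):
    """Build a GET for bytes [start, start + length) of a remote object."""
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url, start, length, timeout=30):
    """Fetch just that byte range; a server honouring it replies 206."""
    req = range_request(url, start, length)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, `fetch_range("https://example.com/scene_B4.TIF", 0, 16384)` would read only the first 16 KB. This also ties back to tiling: at 4 bytes per entry, the offset and byte-count tables of a 512x512-tiled band hold 256 entries each (about 2 KB in total), whereas a one-line-strip band carries about 7800 entries in each table, so the client has more header to fetch before it can locate any pixels.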
> >
> > _Peter
> >
> > -----Original Message-----
> > From: landsat-pds-bounces at lists.osgeo.org
> > [mailto:landsat-pds-bounces at lists.osgeo.org] On Behalf Of Korver, Mark
> > Sent: Friday, February 06, 2015 3:48 PM
> > To: landsat-pds at lists.osgeo.org
> > Subject: Re: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4
> >
> > Normally when I am working with large TIFF files I will check to see
> their structure and if not tiled I will internally tile them to the same
> 512x512 that Peter mentions. That assumes clients are typically looking at
> parts of the TIFF file, which is true if we are just trying to speed up a
> WMS server.
> > In that sense the 512 rings true to me, but having gone back and read
> Frank's words on the landsat ingestor process, I can see that the 512 is my
> typical use-case and maybe not someone else's best practice for sat scenes.
> > In fact, now that I think about it, I just saw a great demo yesterday?
> (seems like yesterday) where the whole point was interactive coding that
> was processing multiple L8 bands, and not some part of a scene, but the whole
> scene, or scenes. Not sure about this but that code is probably quite happy
> with no tiles.
> >
> > Also, I would disagree on tiling for latency reasons. I don't think we
> will be doing range gets to S3 for this data. With what I saw yesterday and
> what I do with aerial imagery, the first request for the L8 band or aerial
> image might have some latency due to having to get the file off of S3, but
> subsequent requests will be to a cached copy on SSD 'local' to the EC2
> instance.
> > Also, latency is an interesting question with S3. From the individual
> compute node perspective the first request will be slow vs a local disk.
> But if you have a large cluster of nodes requesting data at  the same time
> off of the same shared storage system, that storage starts looking very low
> latency.
> > - Mark
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Fri, 6 Feb 2015 00:41:25 +0000
> > From: Peter Becker <pbecker at esri.com>
> > To: "landsat-pds at lists.osgeo.org" <landsat-pds at lists.osgeo.org>
> > Subject: [Landsat-pds] Tiling Scenes
> > Message-ID:
> >       <A6B8217F0F87DD47AD6ACF0E268DB249B94CCEC6 at RED-INF-EXMB-P1.esri.com
> >
> > Content-Type: text/plain; charset="us-ascii"
> >
> > I noticed that the scenes are currently compressed using Deflate and a
> predictor of 2 (horizontal difference).
> > Deflate brings down the file size, but does not allow sections of
> the imagery to be read without reading all previous pixels in the same
> strip. This will significantly affect performance of any applications that
> only need to read parts of the image.
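Why partial reads are costly with this scheme: TIFF Predictor=2 stores each sample as the difference from its left neighbour within a row, so recovering sample k needs a running sum from the start of the row, and the Deflate stream itself must also be inflated from its beginning. A pure-Python sketch of that pipeline for one row of 16-bit samples; it ignores details libtiff handles (multiple samples per pixel, the file's byte order, multi-row strips), and the function names are hypothetical.

```python
import struct
import zlib


def predictor2_encode(row):
    """Keep the first uint16 sample, then store each sample minus its left
    neighbour, wrapped modulo 2**16 as TIFF horizontal differencing does."""
    if not row:
        return []
    return [row[0]] + [(cur - prev) & 0xFFFF for prev, cur in zip(row, row[1:])]


def predictor2_decode(diffs):
    """Undo the predictor: a running sum, so sample k needs samples 0..k-1."""
    row, acc = [], 0
    for i, d in enumerate(diffs):
        acc = d if i == 0 else (acc + d) & 0xFFFF
        row.append(acc)
    return row


def compress_strip(row):
    """Horizontal differencing followed by one Deflate (zlib) stream per strip."""
    diffs = predictor2_encode(row)
    return zlib.compress(struct.pack(f"<{len(diffs)}H", *diffs))


def decompress_strip(blob):
    """Inflate from the start of the stream, then undo the predictor."""
    raw = zlib.decompress(blob)
    return predictor2_decode(struct.unpack(f"<{len(raw) // 2}H", raw))
```

On smooth data such as `[1000 + i // 10 for i in range(7800)]` the differenced strip compresses far better than the raw values, which is the point of the predictor; the cost is that nothing in the middle of a strip can be decoded without everything before it.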
> > I would like to recommend that the following is added to the
> gdal_translate command:
> >     -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512
> > Tiling will improve access and is a standard part of TIF.
> > A tile size of 512 is better suited to storage that has higher
> latency.
> > In my tests I noticed a slight (1-2%) decrease in the resulting filesize
> as well.
> >
> > _Peter
> >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL:
> > <http://lists.osgeo.org/pipermail/landsat-pds/attachments/20150206/baa6bead/attachment-0001.html>
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Fri, 6 Feb 2015 00:41:25 +0000
> > From: Peter Becker <pbecker at esri.com>
> > To: "landsat-pds at lists.osgeo.org" <landsat-pds at lists.osgeo.org>
> > Subject: [Landsat-pds] Including Overviews
> > Message-ID:
> >       <A6B8217F0F87DD47AD6ACF0E268DB249B94CCECC at RED-INF-EXMB-P1.esri.com
> >
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Currently the scenes do not include any overviews. This forces
> applications to read the imagery at full resolution even if accessing at a
> small scale.
> > It's common to include overviews/pyramids/reduced resolution datasets
> with such imagery.
> > I would like to recommend that such overviews are included with the
> > imagery
> >
> > Typically this would increase the file size by approx 33%, which is
> quite considerable.
> > The size of the overviews can be reduced by skipping a level (e.g. using
> > levels 4 8 16 32 instead of 2 4 8 16 32), or alternatively by using a
> > factor of 3, i.e. 3 9 27 81. I tested this out on some images using
> > Deflate with predictor 2:
> > 2 4 8 16 32 adds about 35%
> > 4 8 16 32 adds about 9%
> > 3 9 27 81 adds about 13%
> > My recommendation is to go with the 3 9 27 81 factors. This keeps the
> factor constant between all levels.
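The measured figures line up with a simple pixel count: an overview at decimation factor f holds 1/f² of the full-resolution pixels, so the extra storage is roughly the sum of 1/f² over the chosen levels. This assumes overviews compress about as well as the full-resolution band, which will only hold approximately; `overview_overhead` is a hypothetical helper.

```python
def overview_overhead(factors):
    """Extra pixels, as a fraction of the full-resolution band, added by
    overviews at the given decimation factors (each has 1/f**2 pixels)."""
    return sum(1.0 / (f * f) for f in factors)


for levels in ([2, 4, 8, 16, 32], [4, 8, 16, 32], [3, 9, 27, 81]):
    print(levels, f"{overview_overhead(levels):.1%}")
# [2, 4, 8, 16, 32] 33.3%
# [4, 8, 16, 32] 8.3%
# [3, 9, 27, 81] 12.5%
```

These uncompressed estimates (33.3%, 8.3%, 12.5%) are close to the measured 35%, 9% and 13%, which suggests the differences come mainly from reduced-resolution data compressing slightly less well.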
> > Average resampling would be used, except for the BQA band, for which
> Nearest (or else Majority) should be used.
> >
> > Implementation would only require adding a gdaladdo call to the existing
> > script.
> >
> > We would also need to determine whether to use internal or external overviews.
> > Internal overviews would increase the size of the files, but not change
> the directory structure.
> > Using external overviews (.tif.ovr) would add files to the
> directory and is more specific to GDAL.
> >
> > As I expect many users to be using GDAL to access these files, I would
> recommend that we use the external overviews. I don't see the additional
> files as a burden.
> >
> > One other potential concern (and I'm hoping Frank can answer this) is
> the tile size of the overviews. GDAL appears to always use a tile size
> (BLOCKSIZE) of 128. It would be better if the overview tile size could be set
> to 512.
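A sketch of the gdaladdo call this proposal implies: per-band resampling (average, or nearest for the BQA band, whose bit-packed quality flags would be corrupted by averaging), external .ovr output via `-ro`, and Deflate with predictor 2 for the overviews. `GDAL_TIFF_OVR_BLOCKSIZE` is the GTiff configuration option that controls overview tile size (the 128 default Peter observes); whether a given GDAL build honours each option should be checked against its documentation. `gdaladdo_args` is a hypothetical helper, and detecting the BQA band by the `_BQA.TIF` filename suffix is an assumption about naming.

```python
LEVELS = ["3", "9", "27", "81"]  # Peter's recommended factors


def gdaladdo_args(path, external=True, block_size=512):
    """Build a gdaladdo argv list for one single-band Landsat 8 GeoTIFF."""
    # Quality flags must not be averaged, so the BQA band uses nearest.
    resampling = "nearest" if path.upper().endswith("_BQA.TIF") else "average"
    argv = [
        "gdaladdo",
        "--config", "COMPRESS_OVERVIEW", "DEFLATE",
        "--config", "PREDICTOR_OVERVIEW", "2",
        "--config", "GDAL_TIFF_OVR_BLOCKSIZE", str(block_size),
    ]
    if external:
        argv.append("-ro")  # open read-only so GDAL writes a sidecar .ovr file
    return argv + ["-r", resampling, path, *LEVELS]
```

Keeping the overviews external means the data files themselves are untouched, which matches the preference for external .ovr files expressed above.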
> >
> > _Peter
> >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL:
> > <http://lists.osgeo.org/pipermail/landsat-pds/attachments/20150206/5d1223ae/attachment-0001.html>
> >
> > ------------------------------
> >
> > _______________________________________________
> > Landsat-pds mailing list
> > Landsat-pds at lists.osgeo.org
> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/landsat-pds
> >
> >
> > End of Landsat-pds Digest, Vol 2, Issue 4
> > *****************************************
>
>
>
>
>
> --
>
>
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam,
> warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | Geospatial Software Developer
>
>
>
>
>



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/landsat-pds/attachments/20150227/6d80eb81/attachment-0001.html>

