<div dir="ltr"><div><div><div><div><div>Peter,<br><br></div>I have changed the scripts to tile and build overviews as an external .ovr file as seen at:<br><br> <a href="https://s3-us-west-2.amazonaws.com/landsat-pds/L8/066/111/LC80661112015058LGN00/index.html">https://s3-us-west-2.amazonaws.com/landsat-pds/L8/066/111/LC80661112015058LGN00/index.html</a><br><br></div>I also changed the index generator to include file sizes, and had to fix a formatting bug in the index.html file after this was generated. <br><br></div>I omitted the x2 overview in the .ovr file because it was pretty large - slightly more than 25% of the original file size. I also used averaging to build the overviews except for the BQA band. The zero "nodata" value isn't set on the files so I presume the nodata zeros are getting averaged into the edge of real data in the pyramid. I'm not sure how worried we are about that. If we are, we could also set a nodata value on the files (zero) when they are originally tiled which should fix the overview generation. But I'm not sure, offhand, if there could (in theory) be valid zeros in the Landsat data. <br><br></div>Best regards,<br></div>Frank<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 27, 2015 at 9:34 AM, Frank Warmerdam <span dir="ltr"><<a href="mailto:warmerdam@pobox.com" target="_blank">warmerdam@pobox.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Fri, Feb 27, 2015 at 9:28 AM, Peter Becker <span dir="ltr"><<a href="mailto:pbecker@esri.com" target="_blank">pbecker@esri.com</a>></span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Frank<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Please inform me once you have made the change and indicate clearly from which sceneID the newly downloaded images will be tiled.</span></p></div></div></blockquote><div><br></div><div>Peter,<br></div><div><br></div><div>I will do so. I didn't get to it yesterday as the permissions change took longer than I expected, but I'll try to address it today.<br> <br></div><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="blue" vlink="purple" lang="EN-US"><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I will look to have scripts created that convert the existing files, but will not run them until we are clear on the process and have had the tests reviewed.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I can review the list of best pre-2015/1/1 scenes for downloading to check that they exist in the USGS download. I foresee these being added in chunks
so that there are not too many at any one time. You mention that the existing script runs every two hours. How many additional scenes could be added to the “tarq” so as not to overload it?</span></p></div></div></blockquote><div><br></div></span><div>The step from tarq to unpacked is distributed over our worker system. I'm sure it could handle 10000 scenes without noticeable impact on us. The part that is prone to congestion is the pulls from USGS. <br><br></div><div>I'd encourage you to drop a couple in, we can confirm they process properly, and then you can ramp up as much as you want. <br><br></div><div>Shall I prepare AWS keys for you to use in uploading to the tarq directory? <br><br></div><div>Best regards,<br></div><div>Frank<br> <br></div><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="blue" vlink="purple" lang="EN-US"><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">_Peter<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> <a href="mailto:fwarmerdam@gmail.com" target="_blank">fwarmerdam@gmail.com</a> [mailto:<a href="mailto:fwarmerdam@gmail.com" target="_blank">fwarmerdam@gmail.com</a>]
<b>On Behalf Of </b>Frank Warmerdam<br>
<b>Sent:</b> Thursday, February 26, 2015 9:43 AM<br>
<b>To:</b> Peter Becker<br>
<b>Cc:</b> <a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a><br>
<b>Subject:</b> Re: [Landsat-pds] Tiling + additional scenes<u></u><u></u></span></p><div><div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Thu, Feb 26, 2015 at 9:30 AM, Peter Becker <<a href="mailto:pbecker@esri.com" target="_blank">pbecker@esri.com</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal">Frank<br>
Would you be able to add<br>
-co "TILED=YES" -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512"<br>
to line 19 of splitter.py, so as to enable the tiling.<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Peter,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I'm working on changes to the ingestor, so I'll try and implement this today. I am also inclined to try adding external overviews (as a .ovr file) despite them being a bit messy through /vsicurl/. They would work well when the S3 bucket
is mounted on an EC2 instance, and it wouldn't "pollute" the data file.<u></u><u></u></p>
</div>
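<div>
<p class="MsoNormal">As a minimal sketch of the two steps discussed above (tiling at 512 and building external overviews), the commands could be assembled as follows. The function names, the default overview levels and the use of Deflate for the overviews are assumptions made for illustration; this is not the actual content of splitter.py. The gdaladdo -ro switch is what makes GDAL write the overviews to an external .ovr file.</p>

```python
def tiled_translate_cmd(src, dst, block=512):
    """Build a gdal_translate command for a tiled, Deflate-compressed GeoTIFF."""
    return [
        "gdal_translate",
        "-co", "COMPRESS=DEFLATE",
        "-co", "PREDICTOR=2",
        "-co", "TILED=YES",
        "-co", "BLOCKXSIZE=%d" % block,
        "-co", "BLOCKYSIZE=%d" % block,
        src, dst,
    ]


def external_overview_cmd(tif, levels=(4, 8, 16, 32), resampling="average"):
    """Build a gdaladdo command; -ro opens the file read-only so that the
    overviews are written to an external .ovr file next to it."""
    return (
        ["gdaladdo", "-ro", "-r", resampling,
         "--config", "COMPRESS_OVERVIEW", "DEFLATE",
         tif]
        + [str(level) for level in levels]
    )


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(tiled_translate_cmd("in_B4.TIF", "out_B4.TIF")))
    print(" ".join(external_overview_cmd("out_B4.TIF")))
    # The BQA band holds quality flags, so it should not be averaged.
    print(" ".join(external_overview_cmd("out_BQA.TIF", resampling="nearest")))
```

<p class="MsoNormal">The lists can be passed to subprocess.check_call on a machine with GDAL installed. Keeping command construction separate from execution makes it easy to review what would be run before touching the archive.</p>
</div>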
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal">It would be good if a script could be run over the existing GeoTIFF files to change these to tiled as well. What is the recommended approach to get this done?<u></u><u></u></p>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">That would be nice. Unfortunately since we don't keep around the original .tar files I think we would need to write a special script to restructure the files in place. I'd like to come up with a methodology to do this sort of rework pass
over the archive reasonably smoothly but nothing like that exists yet. In principle anyone who wanted to could take this on with the appropriate access keys.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal"><br>
On a similar note, I would like to suggest that we process some of the images from before 2015/1/1. I can provide a list of the most suitable scene IDs. I'm suggesting about 5 scenes for each path/row (excluding the north and south poles). Could a script be run that takes
this list as input and applies the same process to them?<u></u><u></u></p>
</blockquote>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Five scenes per path row is a substantial addition - I'll let Jed respond on how he feels about it. I'm certainly hoping to *eventually* convince Jed and Amazon to backfill the whole L8 archive. It doesn't go back that far. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">One tricky point with ingesting old scenes, as mentioned by Charlie or Amit on the last call, is that many of them aren't available for direct download from the USGS. We would need to work out a methodology to request reprocessing and then
fetch them. Alternatively if we can drop the .tar files into the "tarq" directory from some other archive that would also do the trick, and avoid further loading the USGS link. Once the scene tarfile is in the "tarq" directory on S3, everything else should
work smoothly on the next queue run, roughly every two hours on the Planet Labs jobs system. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Best regards,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Frank<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_Peter<br>
<br>
-----Original Message-----<br>
From: Korver, Mark [mailto:<a href="mailto:mkorver@amazon.com" target="_blank">mkorver@amazon.com</a>]<br>
Sent: Thursday, February 26, 2015 8:24 AM<br>
To: Sundwall, Jed; Peter Becker<br>
Cc: <a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a><br>
Subject: RE: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4<br>
<br>
I am fine with tiling. I wrote earlier that in my less than scientific approach I typically use a 512 tile size, which is the same as Peter's recommendation.<br>
We just need to make sure we do not alter any original pixel values and that gdalinfo reports the same precision etc -Mark<br>
<br>
<br>
-----Original Message-----<br>
From: Sundwall, Jed<br>
Sent: Wednesday, February 25, 2015 2:45 PM<br>
To: Peter Becker<br>
Cc: Korver, Mark; <a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a><br>
Subject: Re: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4<br>
<br>
I’m +1 for adding tiling to scenes, and I’d submit a pull request to add them, but I don’t know how!<br>
<br>
Does anyone else want to weigh in on this? I think Peter’s made a good case here, and tiling appears to open up enough possibilities that outweigh any potential downsides.<br>
<br>
Jed.<br>
<br>
> On Feb 10, 2015, at 7:31 PM, Peter Becker <<a href="mailto:pbecker@esri.com" target="_blank">pbecker@esri.com</a>> wrote:<br>
><br>
> Mark<br>
><br>
> I still maintain that tiling the imagery will have significant benefits for its use. One of the key aspects of the Landsat 8 processing is that each scene is orthorectified to specific UTM zones. Great effort was also put in by USGS to ensure high pixel accuracy.
As a result it is relatively easy for applications to perform temporal analysis of the imagery. Much of such analysis is done on an area-of-interest basis and not on a scene basis. Having the data untiled significantly increases the amount of data read.<br>
> As a minor aside: tiling also improves the compression a bit (about 2%). The current method of compressing each line of the scene separately means that Deflate is not quite as efficient as when compressing a tile of imagery. Also the number of offsets
in the TIF files is significantly reduced. Currently there is an offset and Deflate header stored for every line (strip), i.e. about 7800 of them, vs only about 250 of them if using 512 tiles.<br>
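<br>
The block counts quoted above can be checked with a short calculation. This is only a sketch: the 7600 x 7800 band size is an assumed, approximate figure for a 30 m Landsat 8 band, and the function names are made up for the example.<br>

```python
import math


def strip_count(height, rows_per_strip=1):
    """Number of strips (and so offsets) in a strip-organized TIFF."""
    return math.ceil(height / rows_per_strip)


def tile_count(width, height, block=512):
    """Number of tiles (and so offsets) in a tiled TIFF with square blocks."""
    return math.ceil(width / block) * math.ceil(height / block)


# Assumed dimensions, roughly those of a Landsat 8 30 m band.
WIDTH, HEIGHT = 7600, 7800

print("strips:", strip_count(HEIGHT))        # one offset per image line
print("tiles: ", tile_count(WIDTH, HEIGHT))  # one offset per 512x512 tile
```

With these assumed dimensions the file goes from 7800 strips to 240 tiles, in line with the "about 250" figure.<br>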
><br>
> Frank did a great comparison on the compression<br>
> (<a href="https://github.com/landsat-pds/landsat_ingestor/blob/master/Compression.md" target="_blank">https://github.com/landsat-pds/landsat_ingestor/blob/master/Compression.md</a>).<br>
> He shows the slight compression benefit of tiling.<br>
> I would like to expand on the one part in which he states:<br>
> =============="<br>
> Using either bigger strips or tiling has compression benefits, and<br>
> will be supported by clients, but it does have performance implications for different data processing flows. Simple viewers will often be able to show downsampled images quickly when the strip size is one line because they just pull the subset of lines they
need. Big strips or tiles means they have to essentially scan all the data. Tiling is great for subregion requests (perhaps from a mapping server) but less ideal for serial processing software that is going line by line. Overall, the organization options didn't
give that much difference in size, so we should likely consider the performance of utilization issues before finalizing a decision "===============<br>
> The downsampling referred to here means reading every Xth line to produce a downsampled image, e.g. for 1/10 scale, reading every 10th line. With strip-based compression, this still requires reading 1/10 of the data and requires a lot of data requests
(approx. 780). What I am recommending (in a separate thread) is that we include overviews (.OVR). In this case the application could read the nearest appropriate overview and display that instead. In this specific example the data volume to read an image at
1/10 the resolution would be 1/81 (i.e. 1/(9x9)) and could be achieved in 4 range requests. That's a lot faster and simpler. Tiling is just as much a part of the TIF standard as strip-wise compression.<br>
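<br>
The comparison above can be written out as a small calculation. It is a sketch only: the 7800-line height is the approximate figure used earlier in this message, the function names are illustrative, and the number of range requests needed for the overview is not modelled.<br>

```python
from fractions import Fraction


def strip_decimation(height, step):
    """Fraction of the lines read, and number of strips fetched, when a
    viewer reads every step-th line of a one-line-per-strip file."""
    lines_read = -(-height // step)  # ceiling division
    return Fraction(lines_read, height), lines_read


def overview_fraction(factor):
    """Pixel count of an overview relative to the full-resolution band."""
    return Fraction(1, factor * factor)


fraction, requests = strip_decimation(7800, 10)
print("every 10th line:", fraction, "of the data in", requests, "strip reads")
print("factor-9 overview:", overview_fraction(9), "of the data")
```

The first line reports 1/10 of the data in 780 strip reads, and the second reports 1/81 of the data for the overview.<br>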
><br>
> Tiling the imagery will have very little if any detrimental effect on any application reading the whole file. If anything it will make it faster. In the vast majority of cases having the imagery tiled will significantly improve access performance, by allowing
only the required part of the scene to be read. I therefore recommend that this is changed to tiled sooner rather than later.<br>
><br>
> Concerning the latency question: I agree with you that the latency is primarily on the first request; thereafter the requests are fast.<br>
> I see no reason why applications should not be doing 'range gets' to S3. GDAL already supports this using VSICurl.<br>
><br>
> _Peter<br>
><br>
> -----Original Message-----<br>
> From: <a href="mailto:landsat-pds-bounces@lists.osgeo.org" target="_blank">landsat-pds-bounces@lists.osgeo.org</a><br>
> [mailto:<a href="mailto:landsat-pds-bounces@lists.osgeo.org" target="_blank">landsat-pds-bounces@lists.osgeo.org</a>] On Behalf Of Korver, Mark<br>
> Sent: Friday, February 06, 2015 3:48 PM<br>
> To: <a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a><br>
> Subject: Re: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4<br>
><br>
> Normally when I am working with large TIFF files I will check to see their structure and if not tiled I will internally tile them to the same 512x512 that Peter mentions. That assumes clients are typically looking at parts of the TIFF file, which is true
if we are just trying to speed up a WMS server.<br>
> In that sense the 512 rings true to me, but having gone back and read Frank's words on the landsat ingestor process, I can see that the 512 is my typical use-case and maybe not someone else's best practice for sat scenes.<br>
> In fact, now that I think about it, I just saw a great demo yesterday? (seems like yesterday) where the whole point was interactive coding that was processing multiple L8 bands, and not some part of scene, but the whole scene, or scenes. Not sure about this
but that code is probably quite happy with no tiles.<br>
><br>
> Also, I would disagree on tiling for latency reasons. I don't think we will be doing range gets to S3 for this data. With what I saw yesterday and what I do with aerial imagery, the first request for the L8 band or aerial image might have some latency due
to having to get the file off of S3, but subsequent requests will be to a cached copy on SSD 'local' to the EC2 instance.<br>
> Also, latency is an interesting question with S3. From the individual compute node perspective the first request will be slow vs a local disk. But if you have a large cluster of nodes requesting data at the same time off of the same shared storage system,
that storage starts looking very low latency.<br>
> - Mark<br>
><br>
> ----------------------------------------------------------------------<br>
><br>
> Message: 1<br>
> Date: Fri, 6 Feb 2015 00:41:25 +0000<br>
> From: Peter Becker <<a href="mailto:pbecker@esri.com" target="_blank">pbecker@esri.com</a>><br>
> To: "<a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a>" <<a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a>><br>
> Subject: [Landsat-pds] Tiling Scenes<br>
> Message-ID:<br>
> <<a href="mailto:A6B8217F0F87DD47AD6ACF0E268DB249B94CCEC6@RED-INF-EXMB-P1.esri.com" target="_blank">A6B8217F0F87DD47AD6ACF0E268DB249B94CCEC6@RED-INF-EXMB-P1.esri.com</a>><br>
> Content-Type: text/plain; charset="us-ascii"<br>
><br>
> I noticed that the scenes are currently compressed using Deflate and a predictor of 2 (horizontal difference).<br>
> The deflate brings down the file size, but does not allow sections of the imagery to be read without reading all previous pixels. This will significantly affect performance of any applications that only need to read parts of the image.<br>
> I would like to recommend that the following is added to the gdal_translate command:<br>
> -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512<br>
> Tiling will improve access and is a standard part of TIFF.<br>
> A tile size of 512 is better suited to storage that has higher latency.<br>
> In my tests I noticed a slight (1-2%) decrease in the resulting filesize as well.<br>
><br>
> _Peter<br>
><br>
><br>
> ------------------------------<br>
><br>
> Message: 2<br>
> Date: Fri, 6 Feb 2015 00:41:25 +0000<br>
> From: Peter Becker <<a href="mailto:pbecker@esri.com" target="_blank">pbecker@esri.com</a>><br>
> To: "<a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a>" <<a href="mailto:landsat-pds@lists.osgeo.org" target="_blank">landsat-pds@lists.osgeo.org</a>><br>
> Subject: [Landsat-pds] Including Overviews<br>
> Message-ID:<br>
> <<a href="mailto:A6B8217F0F87DD47AD6ACF0E268DB249B94CCECC@RED-INF-EXMB-P1.esri.com" target="_blank">A6B8217F0F87DD47AD6ACF0E268DB249B94CCECC@RED-INF-EXMB-P1.esri.com</a>><br>
> Content-Type: text/plain; charset="us-ascii"<br>
><br>
> Currently the scenes do not include any overviews. This forces applications to read the imagery at full resolution even if accessing at a small scale.<br>
> It's common to include overviews/pyramids/reduced resolution datasets with such imagery.<br>
> I would like to recommend that such overviews are included with the imagery.<br>
><br>
> Typically this would increase the file size by approx 33%, which is quite considerable.<br>
> The size of the overviews can be reduced by skipping a level (e.g. using levels 4 8 16 32 instead of 2 4 8 16 32), or alternatively by using a factor of 3, i.e. 3 9 27 81.<br>
> I tested this out on some images using Deflate with predictor 2:<br>
> 2 4 8 16 32 adds about 35%<br>
> 4 8 16 32 adds about 9%<br>
> 3 9 27 81 adds about 13%<br>
> My recommendation is to go with the 3 9 27 81 factors. This keeps the factor constant between all levels.<br>
> Average sampling would be used, except for the BQA band for which Nearest (else Majority) should be used.<br>
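<br>
The measured figures above are close to what a pixel count predicts. As a rough sketch that ignores compression and TIFF overhead, each overview at factor f holds 1/f^2 of the pixels of the full-resolution band, so the relative overhead is the sum of those terms. The function name is made up for the example.<br>

```python
def overview_overhead(factors):
    """Pixel count of all overviews relative to the full-resolution band,
    ignoring compression and file-structure overhead."""
    return sum(1.0 / (f * f) for f in factors)


for factors in ([2, 4, 8, 16, 32], [4, 8, 16, 32], [3, 9, 27, 81]):
    print(factors, "%.1f%%" % (100 * overview_overhead(factors)))
```

This gives 33.3%, 8.3% and 12.5% respectively, against the measured 35%, 9% and 13%. The small differences are consistent with the overviews compressing slightly less well than the full-resolution data, though that explanation is an assumption.<br>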
><br>
> Implementation would only require adding gdaladdo to the existing script.<br>
><br>
> We would also need to determine if we use internal or external overviews.<br>
> Internal overviews would increase the size of the files, but not change the directory structure.<br>
> Using external overviews (.tif.ovr) would add additional files to the directory and is more specific to GDAL.<br>
><br>
> As I expect many users to be using GDAL to access these files, I would recommend that we use the external overviews. I don't see the additional files as a burden.<br>
><br>
> One other potential concern (and I'm hoping Frank can answer this) is the tile size of the overviews. GDAL appears to always use a tile size (BLOCKSIZE) of 128. It would be better if the tile size could be set to 512.<br>
><br>
> _Peter<br>
><br>
><br>
> ------------------------------<br>
><br>
> _______________________________________________<br>
> Landsat-pds mailing list<br>
> <a href="mailto:Landsat-pds@lists.osgeo.org" target="_blank">Landsat-pds@lists.osgeo.org</a><br>
> <a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/landsat-pds" target="_blank">
http://lists.osgeo.org/cgi-bin/mailman/listinfo/landsat-pds</a><br>
><br>
><br>
> End of Landsat-pds Digest, Vol 2, Issue 4<br>
> *****************************************<br>
<u></u><u></u></p>
</blockquote>
</div>
<p class="MsoNormal"><br>
<br clear="all">
<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<p class="MsoNormal">-- <u></u><u></u></p>
<div>
<p class="MsoNormal">---------------------------------------+--------------------------------------<br>
I set the clouds in motion - turn up | Frank Warmerdam, <a href="mailto:warmerdam@pobox.com" target="_blank">
warmerdam@pobox.com</a><br>
light and sound - activate the windows | <a href="http://pobox.com/~warmerdam" target="_blank">
http://pobox.com/~warmerdam</a><br>
and watch the world go round - Rush | Geospatial Software Developer<u></u><u></u></p>
</div>
</div>
</div>
</div></div></div>
</div>
</blockquote></div></div></div><div><div class="h5"><br><br clear="all"><br>-- <br><div>---------------------------------------+--------------------------------------<br>I set the clouds in motion - turn up | Frank Warmerdam, <a href="mailto:warmerdam@pobox.com" target="_blank">warmerdam@pobox.com</a><br>light and sound - activate the windows | <a href="http://pobox.com/~warmerdam" target="_blank">http://pobox.com/~warmerdam</a><br>and watch the world go round - Rush | Geospatial Software Developer<br></div>
</div></div></div></div>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature">---------------------------------------+--------------------------------------<br>I set the clouds in motion - turn up | Frank Warmerdam, <a href="mailto:warmerdam@pobox.com" target="_blank">warmerdam@pobox.com</a><br>light and sound - activate the windows | <a href="http://pobox.com/~warmerdam" target="_blank">http://pobox.com/~warmerdam</a><br>and watch the world go round - Rush | Geospatial Software Developer<br></div>
</div>