[Landsat-pds] Tiling + additional scenes
Frank Warmerdam
warmerdam at pobox.com
Mon Mar 2 00:23:49 PST 2015
Folks,
Garr, I meant to point to a scene that has been through the "reprocessing":
https://s3-us-west-2.amazonaws.com/landsat-pds/L8/165/031/LC81650312015032LGN00/index.html
It is also an example of all the latest settings, so Peter, please check it
out.
Best regards,
Frank
On Mon, Mar 2, 2015 at 12:12 AM, Frank Warmerdam <warmerdam at pobox.com>
wrote:
>
>
> On Sat, Feb 28, 2015 at 3:55 PM, Peter Becker <pbecker at esri.com> wrote:
>
>> Frank,
>>
>> Thanks for setting this up.
>>
>> I downloaded and reviewed the sample scene. All appears to work, but:
>>
>>
>>
>> I would recommend using a factor of 3 (i.e. 3 9 27 81) in gdaladdo. The
>> .ovr files then add only about 13% to the size, and the overview levels
>> are evenly spaced, which gives better performance at the large scales.
>> For the BQA image it also ensures there is no pixel shift.
>>
>
> Peter,
>
> Change applied. 3/9/27/81 was the overviews I traditionally used when I
> was at PCI though more recently I've tended to powers of two for no
> compelling reason.
>
>
>>
>>
>> The current .ovr files have a tile (block) size of 128. It would be better
>> if the tile size was 512, the same as the high resolution tiles. This can
>> be achieved by setting the configuration option:
>>
>> GDAL_TIFF_OVR_BLOCKSIZE=512
>>
>> when running gdaladdo (e.g. --config GDAL_TIFF_OVR_BLOCKSIZE 512).
>>
>
> Change applied.
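
The overview settings agreed above (factors 3 9 27 81, 512 block size, external .ovr, averaging except for BQA) can be sketched as a small command builder. This is only an illustration, not the actual ingestor code: the function name is hypothetical, the "_BQA.TIF" suffix test and the Deflate compression of the overviews are assumptions, and the command is printed rather than run.

```python
import shlex

# Overview factors recommended in the thread.
OVERVIEW_LEVELS = ["3", "9", "27", "81"]


def overview_command(tif_path):
    """Build a gdaladdo command line for one band file (hypothetical helper)."""
    # Assumption: the quality band is recognised by its "_BQA.TIF" suffix.
    is_bqa = tif_path.upper().endswith("_BQA.TIF")
    resampling = "nearest" if is_bqa else "average"
    return [
        "gdaladdo",
        "-ro",  # open read-only so the overviews go to an external .ovr file
        "-r", resampling,
        "--config", "GDAL_TIFF_OVR_BLOCKSIZE", "512",
        # Assumption: compress the overviews with Deflate, like the scenes.
        "--config", "COMPRESS_OVERVIEW", "DEFLATE",
        tif_path,
        *OVERVIEW_LEVELS,
    ]


if __name__ == "__main__":
    for name in ("LC81650312015032LGN00_B4.TIF",
                 "LC81650312015032LGN00_BQA.TIF"):
        print(shlex.join(overview_command(name)))
```

Passing the resulting list to subprocess.run would execute it on a machine with GDAL installed.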
>
>
>>
>>
>> Concerning the averaging of the NoData borders: I would not bother trying
>> to define 0 as NoData. The original TIF files do not have 0 defined as
>> NoData, and the edge pixels of Landsat 8 scenes include artifacts and
>> should not be used anyway. Also, as the compression is lossless Deflate,
>> one does not get any extended artifacts (as one would with, e.g., JPEG).
>>
>
> Great, I'll stick with the current approach for now.
>
> I have also added a new for_each_scene.py script and a reprocess_scene.py
> script. The reprocess_scene.py script attempts to bring one scene up to
> date with the current approaches. Currently this means rebuilding the
> index.html, ensuring it is tiled, and building overviews if they are
> missing. It can pull from S3 and push back when complete.
>
> The for_each_scene.py is intended to be an iterator that runs other
> scripts like reprocess_scene.py on all scenes in the landsat-pds bucket,
> but that is already getting to be a substantial amount of data, so I think
> I'm going to have to come up with a more clever way to distribute
> reprocessing the scenes over a set of workers. That is pending. Anyone
> else who wants to step up and run the reprocessing is welcome to (in the
> interest of ensuring more than one person knows how to do stuff).
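
One simple way to spread a reprocessing pass over a set of workers is to deal the scene list out round-robin. The sketch below is purely hypothetical: the thread leaves the distribution scheme open, and the function name and the placeholder scene IDs are assumptions made for illustration.

```python
def partition_scenes(scene_ids, n_workers):
    """Deal scene IDs out round-robin into n_workers batches (hypothetical)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    batches = [[] for _ in range(n_workers)]
    for index, scene_id in enumerate(scene_ids):
        batches[index % n_workers].append(scene_id)
    return batches


if __name__ == "__main__":
    scenes = [
        "LC81650312015032LGN00",  # scene IDs mentioned in this thread
        "LC80661112015058LGN00",
        "placeholder-scene-3",    # made-up placeholders
        "placeholder-scene-4",
        "placeholder-scene-5",
    ]
    for worker, batch in enumerate(partition_scenes(scenes, 2)):
        print(worker, batch)
```

Each batch could then be handed to one worker that runs a per-scene script on every ID in it.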
>
> Best regards,
> Frank
>
>
>>
>> Best Regards,
>>
>> _Peter
>>
>>
>>
>>
>>
>> *From:* fwarmerdam at gmail.com [mailto:fwarmerdam at gmail.com] *On Behalf Of
>> *Frank Warmerdam
>> *Sent:* Friday, February 27, 2015 11:11 PM
>> *To:* Peter Becker
>>
>> *Cc:* landsat-pds at lists.osgeo.org
>> *Subject:* Re: [Landsat-pds] Tiling + additional scenes
>>
>>
>>
>> Peter,
>>
>> I have changed the scripts to tile and build overviews as an external
>> .ovr file as seen at:
>>
>>
>> https://s3-us-west-2.amazonaws.com/landsat-pds/L8/066/111/LC80661112015058LGN00/index.html
>>
>> I also changed the index generator to include file sizes, and had to fix
>> a formatting bug in the index.html file after this was generated.
>>
>> I omitted the x2 overview in the .ovr file because it was pretty large -
>> slightly more than 25% of the original file size. I also used averaging to
>> build the overviews except for the BQA band. The zero "nodata" value isn't
>> set on the files so I presume the nodata zeros are getting averaged in to
>> the edge of real data in the pyramid. I'm not sure how worried we are
>> about that. If we are, we could also set a nodata value on the files
>> (zero) when they are originally tiled which should fix the overview
>> generation. But I'm not sure, off hand, if there could (in theory) be
>> valid zeros in the landsat data.
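
The effect of averaging unflagged zeros into the edge of real data can be shown with plain arithmetic. This is a toy sketch, not GDAL's resampling code; the 3x3 block and the value 9000 are made-up illustration values.

```python
def block_mean(values, nodata=None):
    """Average a block of pixels, skipping the nodata value if one is given."""
    kept = [v for v in values if nodata is None or v != nodata]
    if not kept:
        return nodata  # nothing valid in the block
    return sum(kept) / len(kept)


# A 3x3 block on the scene edge: one row of fill zeros, two rows of real data.
edge_block = [0, 0, 0,
              9000, 9000, 9000,
              9000, 9000, 9000]

if __name__ == "__main__":
    print(block_mean(edge_block))            # zeros pulled into the average
    print(block_mean(edge_block, nodata=0))  # zeros ignored as nodata
```

Without a nodata value the overview pixel comes out darker than the real data next to it; with zero flagged as nodata it keeps the real value, at the cost of also discarding any genuinely valid zeros.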
>>
>> Best regards,
>>
>> Frank
>>
>>
>>
>> On Fri, Feb 27, 2015 at 9:34 AM, Frank Warmerdam <warmerdam at pobox.com>
>> wrote:
>>
>>
>>
>>
>>
>> On Fri, Feb 27, 2015 at 9:28 AM, Peter Becker <pbecker at esri.com> wrote:
>>
>> Frank
>>
>> Please inform me once you have made the change and indicate clearly from
>> which sceneID the newly downloaded images will be tiled.
>>
>>
>>
>> Peter,
>>
>>
>>
>> I will do so. I didn't get to it yesterday as the permissions change
>> took longer than I expected, but I'll try to address it today.
>>
>>
>> I will look to have scripts created that convert the existing files, but
>> will not run them until we are clear on the process and have tests
>> reviewed.
>>
>> I can review the list of best pre-2015/1/1 scenes for downloading to
>> check that they exist in the USGS download. I would foresee that these are
>> added in chunks so as to ensure that there are not too many at any one
>> time. You mention that the existing script runs every two hours. How many
>> additional scenes could be added to the “tarq” so as not to overload it?
>>
>>
>>
>> The step from tarq to unpacked is distributed over our worker system.
>> I'm sure it could handle 10000 scenes without noticeable impact on us. The
>> part that is prone to congestion is the pulls from USGS.
>>
>> I'd encourage you to drop a couple in, we can confirm they process
>> properly, and then you can ramp up as much as you want.
>>
>> Shall I prepare AWS keys for you to use in uploading the tarq directory?
>>
>> Best regards,
>>
>> Frank
>>
>>
>> _Peter
>>
>>
>>
>>
>>
>> *From:* fwarmerdam at gmail.com [mailto:fwarmerdam at gmail.com] *On Behalf Of
>> *Frank Warmerdam
>> *Sent:* Thursday, February 26, 2015 9:43 AM
>> *To:* Peter Becker
>> *Cc:* landsat-pds at lists.osgeo.org
>> *Subject:* Re: [Landsat-pds] Tiling + additional scenes
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Feb 26, 2015 at 9:30 AM, Peter Becker <pbecker at esri.com> wrote:
>>
>> Frank
>> Would you be able to add
>> -co "TILED=YES" -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512"
>> to line 19 of splitter.py, so as to enable the tiling.
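
For reference, a tiled conversion with these creation options could be assembled as below. This is a sketch only and not the contents of splitter.py: the function name and the file names are hypothetical, and the Deflate/predictor 2 options are taken from the description of the current compression elsewhere in this thread.

```python
import shlex


def tiling_command(src, dst, block=512):
    """Build a gdal_translate command that writes a tiled GeoTIFF (sketch)."""
    return [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", f"BLOCKXSIZE={block}",
        "-co", f"BLOCKYSIZE={block}",
        # Compression as described in the thread: Deflate with predictor 2.
        "-co", "COMPRESS=DEFLATE",
        "-co", "PREDICTOR=2",
        src,
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical input and output names.
    print(shlex.join(tiling_command("scene_B4_in.TIF", "scene_B4_tiled.TIF")))
```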
>>
>>
>>
>> Peter,
>>
>>
>>
>> I'm working on changes to the ingestor, so I'll try and implement this
>> today. I am also inclined to try adding external overviews (as a .ovr
>> file) despite them being a bit messy through /vsicurl/. They would work
>> well when the S3 bucket is mounted on an EC2 instance, and it wouldn't
>> "pollute" the data file.
>>
>>
>>
>> It would be good if a script can be run over the existing GeoTIFF files to
>> change these also to tiled. What is the recommended approach to get this
>> done?
>>
>>
>>
>> That would be nice. Unfortunately since we don't keep around the
>> original .tar files I think we would need to write a special script to
>> restructure the files in place. I'd like to come up with a methodology to
>> do this sort of rework pass over the archive reasonably smoothly but
>> nothing like that exists yet. In principle anyone who wanted to could take
>> this on with the appropriate access keys.
>>
>>
>>
>>
>> On a similar note. I would like to suggest that we process some of the
>> images prior to 2015/1/1. I can provide a list of the most suitable scene
>> ID. I'm suggesting about 5 scenes for each PR (excluding north and south
>> poles). Could a script be run that takes this list as input and applies the
>> same process on these.
>>
>>
>>
>> Five scenes per path row is a substantial addition - I'll let Jed respond
>> on how he feels about it. I'm certainly hoping to *eventually* convince Jed
>> and Amazon to backfill the whole L8 archive. It doesn't go back that far.
>>
>>
>>
>> One tricky point with ingesting old scenes, as mentioned by Charlie or
>> Amit on the last call, is that many of them aren't available for direct
>> download from the USGS. We would need to work out a methodology to request
>> reprocessing and then fetch them. Alternatively if we can drop the .tar
>> files into the "tarq" directory from some other archive that would also do
>> the trick, and avoid further loading the USGS link. Once the scene tarfile
>> is in the "tarq" directory on S3, everything else should work smoothly on
>> the next queue run, roughly every two hours on the Planet Labs jobs system.
>>
>>
>>
>> Best regards,
>>
>> Frank
>>
>>
>>
>>
>> _Peter
>>
>> -----Original Message-----
>> From: Korver, Mark [mailto:mkorver at amazon.com]
>> Sent: Thursday, February 26, 2015 8:24 AM
>> To: Sundwall, Jed; Peter Becker
>> Cc: landsat-pds at lists.osgeo.org
>> Subject: RE: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4
>>
>> I am fine with tiling. I wrote earlier that in my less than scientific
>> approach I typically use 512 tiles size which is the same as Peter's
>> recommendation.
>> We just need to make sure we do not alter any original pixel values and
>> that gdalinfo reports the same precision etc -Mark
>>
>>
>> -----Original Message-----
>> From: Sundwall, Jed
>> Sent: Wednesday, February 25, 2015 2:45 PM
>> To: Peter Becker
>> Cc: Korver, Mark; landsat-pds at lists.osgeo.org
>> Subject: Re: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4
>>
>> I’m +1 for adding tiling to scenes, and I’d submit a pull request to add
>> them, but I don’t know how!
>>
>> Does anyone else want to weigh in on this? I think Peter’s made a good
>> case here, and tiling appears to open up enough possibilities that outweigh
>> any potential downsides.
>>
>> Jed.
>>
>> > On Feb 10, 2015, at 7:31 PM, Peter Becker <pbecker at esri.com> wrote:
>> >
>> > Mark
>> >
>> > I still maintain that tiling the imagery will have significant benefits
>> > for its use. One of the key aspects of the Landsat 8 processing is that
>> > each scene is orthorectified to specific UTM zones. Great effort was also
>> > put in by USGS to ensure high pixel accuracy. As a result it is relatively
>> > easy for applications to perform temporal analysis of the imagery. Much of
>> > such analysis is done on an area-of-interest basis and not a scene basis.
>> > Leaving the data untiled significantly increases the amount of data read.
>> > As a minor aside: tiling also improves the compression a bit (about
>> > 2%). The current method of compressing each line of the scene separately
>> > means that Deflate is not quite as efficient as when compressing a tile of
>> > imagery. The number of offsets in the TIF files is also significantly
>> > reduced. Currently there is an offset and Deflate header stored for every
>> > line (strip), i.e. about 7800 of them, vs only about 250 when using
>> > 512x512 tiles.
>> >
>> > Frank did a great comparison of the compression options
>> > (https://github.com/landsat-pds/landsat_ingestor/blob/master/Compression.md).
>> > He shows the slight compression benefit of tiling.
>> > I would like to expand on the one part in which he states:
>> > =============="
>> > Using either bigger strips or tiling has compression benefits, and
>> > will be supported by clients, but it does have performance implications
>> > for different data processing flows. Simple viewers will often be able to
>> > show downsampled images quickly when the strip size is one line because
>> > they just pull the subset of lines they need. Big strips or tiles means
>> > they have to essentially scan all the data. Tiling is great for subregion
>> > requests (perhaps from a mapping server) but less ideal for serial
>> > processing software that is going line by line. Overall, the organization
>> > options didn't give that much difference in size, so we should likely
>> > consider the performance of utilization issues before finalizing a
>> > decision
>> > "===============
>> > The downsampling being referred to here is to read every Xth line to
>> > provide such a downsampled image, e.g. for 1/10 scale read every 10th
>> > line. With strip based compression, this still requires reading 1/10 of
>> > the data and requires a lot of data requests (approx 780). What I am
>> > recommending (in a separate thread) is that we include overviews (.ovr).
>> > In this case the application could read the nearest appropriate overview
>> > and display that instead. In this specific example the data volume to
>> > read an image at 1/10 the resolution would be 1/81 (1/(9x9)) and could be
>> > achieved in 4 range requests. That's a lot faster and simpler. Tiling
>> > is just as much a part of the TIF standard as strip-wise compression.
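
The numbers in this example can be reproduced with a short calculation. The sketch assumes a 7800 x 7800 pixel scene, overview factors 3 9 27 81 stored in 512 tiles, and one range request per tile or strip; the helper names are hypothetical.

```python
import math

OVERVIEW_FACTORS = (3, 9, 27, 81)


def pick_overview(decimation, factors=OVERVIEW_FACTORS):
    """Coarsest overview factor that is still at least as fine as requested."""
    usable = [f for f in factors if f <= decimation]
    return max(usable) if usable else 1  # 1 means full resolution


def tiles_at_factor(width, height, factor, tile=512):
    """Number of tiles needed to cover the whole scene at an overview level."""
    ov_width = math.ceil(width / factor)
    ov_height = math.ceil(height / factor)
    return math.ceil(ov_width / tile) * math.ceil(ov_height / tile)


if __name__ == "__main__":
    size = 7800                            # assumed scene width and height
    factor = pick_overview(10)             # overview used for a 1/10 view
    print(factor, factor * factor)         # level 9, i.e. 1/81 of the pixels
    print(tiles_at_factor(size, size, factor))  # requests via the overview
    print(math.ceil(size / 10))            # strips read without overviews
```

Under these assumptions the overview route needs 4 tile reads, against roughly 780 strip reads when sampling every 10th line of the full-resolution file.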
>> >
>> > Tiling the imagery will have very little if any detrimental effect on
>> > any application reading the whole file. If anything it will make it
>> > faster. In the vast majority of cases having the imagery tiled will
>> > significantly improve access performance, by allowing only the required
>> > part of the scene to be read. I therefore recommend that this is changed
>> > to tiled sooner rather than later.
>> >
>> > Concerning the latency question: I agree with you that the latency is
>> > primarily on the first request; thereafter the requests are fast.
>> > I see no reason why applications should not be doing 'range gets' to
>> > S3. GDAL already supports this using /vsicurl/.
>> >
>> > _Peter
>> >
>> > -----Original Message-----
>> > From: landsat-pds-bounces at lists.osgeo.org
>> > [mailto:landsat-pds-bounces at lists.osgeo.org] On Behalf Of Korver, Mark
>> > Sent: Friday, February 06, 2015 3:48 PM
>> > To: landsat-pds at lists.osgeo.org
>> > Subject: Re: [Landsat-pds] Landsat-pds Digest, Vol 2, Issue 4
>> >
>> > Normally when I am working with large TIFF files I will check to see
>> their structure and if not tiled I will internally tile them to the same
>> 512x512 that Peter mentions. That assumes clients are typically looking at
>> parts of the TIFF file, which is true if we are just trying to speed up a
>> WMS server.
>> > In that sense the 512 rings true to me, but having gone back and read
>> Frank's words on the landsat ingestor process, I can see that the 512 is my
>> typical use-case and maybe not someone else's best practice for sat scenes.
>> > In fact, now that I think about it, I just saw a great demo yesterday?
>> (seems like yesterday) where the whole point was interactive coding that
>> was processing multiple L8 bands, and not some part of scene, but the whole
>> scene, or scenes. Not sure about this but that code is probably quite happy
>> with no tiles.
>> >
>> > Also, I would disagree on tiling for latency reasons. I don't think we
>> will be doing range gets to S3 for this data. With what I saw yesterday and
>> what I do with aerial imagery, the first request for the L8 band or aerial
>> image might have some latency due to having to get the file off of S3, but
>> subsequent requests will be to a cached copy on SSD 'local' to the EC2
>> instance.
>> > Also, latency is an interesting question with S3. From the individual
>> compute node perspective the first request will be slow vs a local disk.
>> But if you have a large cluster of nodes requesting data at the same time
>> off of the same shared storage system, that storage starts looking very low
>> latency.
>> > - Mark
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Fri, 6 Feb 2015 00:41:25 +0000
>> > From: Peter Becker <pbecker at esri.com>
>> > To: "landsat-pds at lists.osgeo.org" <landsat-pds at lists.osgeo.org>
>> > Subject: [Landsat-pds] Tiling Scenes
>> > Message-ID:
>> > <
>> A6B8217F0F87DD47AD6ACF0E268DB249B94CCEC6 at RED-INF-EXMB-P1.esri.com>
>> > Content-Type: text/plain; charset="us-ascii"
>> >
>> > I noticed that the scenes are currently compressed using Deflate and a
>> predictor of 2 (horizontal difference).
>> > The deflate brings down the file size, but does not allow sections of
>> the imagery to be read without reading all previous pixels. This will
>> significantly affect performance of any applications that only need to read
>> parts of the image.
>> > I would like to recommend that the following is added to the
>> > gdal_translate command:
>> > -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512
>> > Tiling will improve access and is a standard part of TIF.
>> > Using a tile size of 512 is better suited to storage that has higher
>> > latency.
>> > In my tests I noticed a slight (1-2%) decrease in the resulting
>> > file size as well.
>> >
>> > _Peter
>> >
>> > ------------------------------
>> >
>> > Message: 2
>> > Date: Fri, 6 Feb 2015 00:41:25 +0000
>> > From: Peter Becker <pbecker at esri.com>
>> > To: "landsat-pds at lists.osgeo.org" <landsat-pds at lists.osgeo.org>
>> > Subject: [Landsat-pds] Including Overviews
>> > Message-ID:
>> > <
>> A6B8217F0F87DD47AD6ACF0E268DB249B94CCECC at RED-INF-EXMB-P1.esri.com>
>> > Content-Type: text/plain; charset="us-ascii"
>> >
>> > Currently the scenes do not include any overviews. This forces
>> applications to read the imagery at full resolution even if accessing at a
>> small scale.
>> > It's common to include overviews/pyramids/reduced resolution datasets
>> with such imagery.
>> > I would like to recommend that such overviews are included with the
>> > imagery
>> >
>> > Typically this would increase the file size by approx 33%, which is
>> > quite considerable.
>> > The size of the overviews can be reduced by skipping a level (e.g. use
>> > levels 4 8 16 32 instead of 2 4 8 16 32), or alternatively by using a
>> > factor of 3, i.e. 3 9 27 81. I tested this out on some images using
>> > Deflate with predictor 2:
>> > 2 4 8 16 32 adds about 35%
>> > 4 8 16 32 adds about 9%
>> > 3 9 27 81 adds about 13%
>> > My recommendation is to go with the 3 9 27 81 factors. This keeps the
>> > factor constant between all levels.
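
As a rough cross-check of the measured figures above, the uncompressed pixel overhead of a set of overview factors is the sum of 1/f^2 over the factors. The sketch ignores compression and TIFF structure overhead, so it only approximates the measured percentages.

```python
def overview_overhead(factors):
    """Extra pixels stored by the overviews, as a fraction of the base image."""
    return sum(1.0 / (f * f) for f in factors)


if __name__ == "__main__":
    for factors in ([2, 4, 8, 16, 32], [4, 8, 16, 32], [3, 9, 27, 81]):
        percent = round(overview_overhead(factors) * 100, 1)
        print(factors, percent)
```

This gives about 33.3%, 8.3% and 12.5% respectively, close to the measured 35%, 9% and 13%.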
>> > Average sampling would be used, except for the BQA band for which
>> Nearest (else Majority) should be used.
>> >
>> > Implementation would only require adding GDALADDO to the existing
>> > script
>> >
>> > We would also need to determine if we use internal or external
>> overviews.
>> > Internal overviews would increase the size of the files, but not change
>> the directory structure.
>> > Using external overviews (.tif.ovr) would add additional files to the
>> > directory and is more specific to GDAL.
>> >
>> > As I expect many users to be using GDAL to access these files, I would
>> recommend that we use the external overviews. I don't see the additional
>> files as a burden.
>> >
>> > One other potential concern (and I'm hoping Frank can answer this) is
>> > the tile size of the overviews. GDAL appears to always use a tile size
>> > (BLOCKSIZE) of 128. It would be better if the tile size could be set
>> > to 512.
>> >
>> > _Peter
>> >
>> > ------------------------------
>> >
>> > _______________________________________________
>> > Landsat-pds mailing list
>> > Landsat-pds at lists.osgeo.org
>> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/landsat-pds
>> >
>> >
>> > End of Landsat-pds Digest, Vol 2, Issue 4
>> > *****************************************
>>
>>
>>
>>
>>
>> --
>>
>>
>> ---------------------------------------+--------------------------------------
>> I set the clouds in motion - turn up | Frank Warmerdam,
>> warmerdam at pobox.com
>> light and sound - activate the windows | http://pobox.com/~warmerdam
>> and watch the world go round - Rush | Geospatial Software Developer
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam,
warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | Geospatial Software Developer