[Landsat-pds] Upcoming updates to Landsat on AWS data
Amit Kapadia
amit at planet.com
Mon Oct 26 06:04:53 PDT 2015
Good news. We've finished ingesting the ~53,000 reprocessed scenes.
Jed - you can follow up with USGS to revoke the extra account.
Cheers,
Amit
On Fri, Oct 2, 2015 at 4:46 PM, Amit Kapadia <amit at planet.com> wrote:
> Jed - I can't give a definitive answer, but I suspect we'll start to fall
> behind. I just checked our ingestion from September, and we're doing well.
> All images released in September were uploaded to S3 by Oct 1. To keep this
> pace, we do have a machine running all the time. Our ingestion job has
> started to fail about 1/3 of the time due to the new rate limiting. It
> would be nice to understand the full scope of these constraints. Ideally
> we'd be able to talk to one of the developers to better understand how best
> to operate.
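A rough sketch of how an ingestion job could ride out intermittent rate-limit failures like these: retry with exponential backoff and jitter. `RateLimitError` and `with_backoff` are hypothetical names for illustration, not part of any USGS client, and the delay values are assumptions rather than anything USGS documents.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit error raised by a download client."""


def with_backoff(task, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return task()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The base delay doubles on each attempt; jitter keeps parallel
            # workers from retrying in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.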
>
> On Wed, Sep 30, 2015 at 12:50 PM, Sundwall, Jed <jed at amazon.com> wrote:
>
>> Thanks for the update, Amit. Is it possible that this new limit could
>> cause us to fall behind in acquiring all new scenes as they’re produced
>> each day?
>>
>> On Sep 30, 2015, at 12:00 PM, Amit Kapadia <amit at planet.com> wrote:
>>
>> Hey Jed,
>>
>> Thanks for reaching out to them. Looks like we have another rate-limiting
>> error to handle:
>>
>> usgs.USGSError: RATE_LIMIT: Rate limit exceeded - cannot support
>> simultaneous requests.
>>
>> According to the changelog of the USGS inventory service:
>>
>> August 2015
>>
>> * Implemented single-stream rate limiting
>> * Added FGDC Metadata URL to search and metadata responses
>> * API Key is now required for all requests
>>
>> Despite the change being made in August, we're only now starting to see
>> this error. Previously, we were allowed 2 simultaneous downloads per
>> machine. This has been cut in half. To keep up with the flow of Landsat
>> scenes, we need simultaneous requests. This error is cropping up
>> periodically in our re-ingestion of the ~53,000 scenes, as well as our
>> daily ingestion.
>>
>> Enforcing single-stream per machine is a terrible waste of computing
>> resources.
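For illustration only, one way a worker process might live with a single-stream limit is to serialize just the API request behind a lock while the rest of the per-scene work stays parallel. This assumes the limit applies to requests from one process; the real scope of the USGS limit is not known from this thread. `single_stream` and `run_all` are hypothetical helpers, and `request` is whatever callable talks to the service.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed limit: one in-flight API request at a time for this process.
_api_lock = threading.Lock()


def single_stream(request, *args):
    """Call request(*args) while holding the lock, so calls never overlap."""
    with _api_lock:
        return request(*args)


def run_all(request, scene_ids, workers=4):
    """Fan out over scene_ids; only the request itself is serialized."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: single_stream(request, s), scene_ids))
```

This does nothing to raise throughput past the limit; it only turns simultaneous-request errors into queueing.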
>>
>> Also note the need for an API key on all requests. Previously, anyone
>> was able to programmatically access metadata. This is no longer possible.
>>
>> Any help would be appreciated.
>>
>> Cheers,
>> Amit
>>
>> On Mon, Sep 28, 2015 at 3:38 PM, Sundwall, Jed <jed at amazon.com> wrote:
>>
>>> I’ve reached out to USGS to ask if we can increase the limit.
>>>
>>> Thanks for the update, Amit!
>>>
>>>
>>> On Sep 28, 2015, at 12:16 PM, Amit Kapadia <amit at planet.com> wrote:
>>>
>>> Another update on the reingestion of these ~53,000 scenes. We're moving
>>> along faster than in the initial few weeks. Currently we have ~28,500 scenes
>>> left to reprocess. This is taking a bit of time, mostly because USGS rate
>>> limits the number of scenes that can be simultaneously downloaded.
>>>
>>> Jed - we often hit an error of this sort:
>>>
>>> DOWNLOAD_RATE_LIMIT: User currently has more than 10 downloads that have
>>> not been attempted in the past 10 minutes.
>>>
>>> If there's a way we can work with USGS on getting this type of
>>> rate-limiting lifted, I'll be able to spin up additional workers, breaking
>>> through this 10 scene limit. No big deal if that's not possible.
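A minimal sketch of how a client might stay under a cap like this, reading the error as a limit on downloads that have been requested but not yet started (that reading is an assumption). `PendingDownloadGate` is a hypothetical helper; the default of 10 comes from the error text above.

```python
import threading


class PendingDownloadGate:
    """Caps how many downloads are requested but not yet started."""

    def __init__(self, limit=10):
        self._slots = threading.BoundedSemaphore(limit)

    def request(self, timeout=None):
        """Reserve a slot before requesting a download; False if none frees up in time."""
        return self._slots.acquire(timeout=timeout)

    def started(self):
        """Free the slot once the download has actually begun."""
        self._slots.release()
```

Each worker would call `request()` before asking for a scene and `started()` as soon as bytes begin to flow, so extra workers wait instead of tripping the limit.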
>>>
>>> Cheers,
>>> Amit
>>>
>>> On Tue, Sep 15, 2015 at 10:58 AM, Amit Kapadia <amit at planet.com> wrote:
>>>
>>>> We're ingesting about 1.35 scenes per minute (~2000 scenes per day).
>>>> With 44,200 scenes remaining, this work should be complete in 22 - 23 days.
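The estimate above can be checked with a few lines of arithmetic. The figures are the ones quoted in this message; `days_remaining` is just an illustrative helper name.

```python
def days_remaining(scenes_left, scenes_per_minute):
    """Days needed to ingest scenes_left at a constant per-minute rate."""
    scenes_per_day = scenes_per_minute * 60 * 24
    return scenes_left / scenes_per_day


print(round(1.35 * 60 * 24))                  # 1944 scenes per day
print(round(days_remaining(44200, 1.35), 1))  # 22.7 days
```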
>>>>
>>>> The additional worker has kicked up the rate. I'm learning more about
>>>> the rate-limiting that USGS imposes, and it seems that a single machine is
>>>> limited to 2 concurrent downloads (we already knew this). However, we have
>>>> 3 machines running, so the rate-limiting appears to be keyed on a
>>>> combination of IP address and EROS account.
>>>>
>>>> Cheers,
>>>> Amit
>>>>
>>>>
>>>> On Mon, Sep 14, 2015 at 3:58 PM, Sundwall, Jed <jed at amazon.com> wrote:
>>>>
>>>>> Thanks for the update, Amit. Could you please let us know if you see
>>>>> that the extra workers have upped our rate? Also, can you estimate when
>>>>> this will be done?
>>>>>
>>>>> Thank you very much for your work on this!
>>>>>
>>>>> Jed.
>>>>>
>>>>> On Sep 14, 2015, at 2:16 PM, Amit Kapadia <amit at planet.com> wrote:
>>>>>
>>>>> Hi all - an update to the ingestion of these reprocessed Landsat
>>>>> scenes. Using the additional bandwidth that Jed locked down, we've ingested
>>>>> ~8,000 of the 52,877 scenes. This has been moving a little slow, so I've
>>>>> bumped up the number of workers.
>>>>>
>>>>> In the past we've been restricted to 2 concurrent downloads from USGS
>>>>> servers, but it now seems that we're able to get 4 concurrent downloads.
>>>>> I'll try our luck with one more worker (2 more downloads) to see if we're
>>>>> allowed this luxury.
>>>>>
>>>>> Ingestion of new Landsat scenes continues as normal.
>>>>>
>>>>> Cheers,
>>>>> Amit
>>>>>
>>>>>
>>>>> On Wed, Aug 26, 2015 at 12:08 PM, Sundwall, Jed <jed at amazon.com>
>>>>> wrote:
>>>>>
>>>>>> Quick update:
>>>>>>
>>>>>> We have been granted additional bandwidth to acquire Landsat data
>>>>>> from EROS and will use it to reacquire 53,206 scenes that have been
>>>>>> reprocessed with updated TIRS data as described at
>>>>>> http://landsat.usgs.gov/calibration_notices.php
>>>>>>
>>>>>> We will also use this opportunity to check for any scenes that we may
>>>>>> have failed to acquire throughout 2015. Another user of the data recently
>>>>>> pointed out that "USGS states that there are 149307 scenes so far in 2015,
>>>>>> but AWS claims to host only 145746 of them. As a percentage, that is 97.6%
>>>>>> - IOW 2.4% are missing." These scenes may be missing from the bucket or
>>>>>> they may merely be missing from the scene_list.gz file.
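A sketch of how the gap could be checked: load the scene IDs listed in scene_list.gz and diff them against the IDs USGS reports. This assumes scene_list.gz is a gzipped CSV with a header row and an ID column named `entityId`; the column name and both helper functions are assumptions for illustration.

```python
import csv
import gzip


def hosted_scene_ids(scene_list, id_column="entityId"):
    """Read scene IDs from a gzipped CSV (path or binary file object)."""
    with gzip.open(scene_list, "rt", newline="") as f:
        return {row[id_column] for row in csv.DictReader(f)}


def coverage(expected_ids, hosted_ids):
    """Return (missing IDs, percent of expected IDs that are hosted)."""
    expected = set(expected_ids)
    missing = expected - set(hosted_ids)
    pct_hosted = 100.0 * (len(expected) - len(missing)) / len(expected)
    return missing, pct_hosted


# The quoted figures: 145746 of 149307 scenes hosted.
print(round(100.0 * 145746 / 149307, 1))  # 97.6
```

A scene that is in the bucket but absent from scene_list.gz would still show up as missing here, so a follow-up check against the bucket itself would be needed to tell the two cases apart.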
>>>>>>
>>>>>> I’ll update the list once the reacquisition is complete.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jed Sundwall – Open Data – Amazon Web Services
>>>>>>
>>>>>> cell: 801-949-1482
>>>>>> office: 206-435-3104
>>>>>>
>>>>>> https://aws.amazon.com/opendata/
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Landsat-pds mailing list
>>>>>> Landsat-pds at lists.osgeo.org
>>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/landsat-pds