[Landsat-pds] usgs service issues?

Frank Warmerdam warmerdam at pobox.com
Mon Jan 12 18:07:35 PST 2015


Amit,

OK, thanks for the feedback. I guess I will segment my ingestion into two
steps: one, run on a single VM, that pulls files from USGS and pushes them
into a working location in S3, and another that unpacks and repacks them
and can be run massively in parallel. It is disappointing, but I suppose
not surprising, that the USGS servers aren't well set up to handle large
numbers of simultaneous connections.
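A minimal sketch of the two-step split described above. It assumes hypothetical `fetch` and `repack` callables that you would supply yourself (neither is part of any USGS or AWS API). Stage 1 runs serially, so USGS only ever sees one connection from this pipeline. Stage 2 reads only from the staging area, so it can fan out. Threads stand in here for what would more likely be many VMs or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_two_stage(
    scene_ids: Iterable[str],
    fetch: Callable[[str], str],   # hypothetical: download one scene from USGS, return its staging key
    repack: Callable[[str], str],  # hypothetical: unpack/repack one staged tarball, return output key
    repack_workers: int = 32,
) -> List[str]:
    # Stage 1: a single worker pulls scenes one at a time into staging
    # (e.g. an S3 working prefix), keeping load on USGS to one connection.
    staged = [fetch(scene_id) for scene_id in scene_ids]

    # Stage 2: unpacking and repacking touches only the staging area,
    # so it can run widely in parallel. pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=repack_workers) as pool:
        return list(pool.map(repack, staged))
```

Keeping the USGS-facing stage separate means its concurrency can be tuned (or left at one) independently of how wide the repacking stage runs.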

Best regards,
Frank


On Sat, Jan 10, 2015 at 11:02 AM, Amit Kapadia <amit at mapbox.com> wrote:

> Hi Frank,
>
> We pass multiple scene ids in each download request. This minimizes the
> number of requests made to USGS's servers, keeping us on friendly terms
> with them. We've found that one download request to get multiple download
> URLs, followed by parallel downloads, is effective in avoiding 500s.
> Parallel downloads should be throttled; we happen to limit at 4 concurrent
> downloads.
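The pattern described above (one request that resolves many scene ids to download URLs, then parallel downloads capped at 4) might look roughly like this. `get_download_urls` and `download` are hypothetical placeholders for whatever client code talks to the USGS service; they are not real USGS API names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_CONCURRENT_DOWNLOADS = 4  # the cap mentioned in the thread; tune as needed


def fetch_scenes(
    scene_ids: List[str],
    get_download_urls: Callable[[List[str]], Dict[str, str]],  # hypothetical: ONE request for many ids
    download: Callable[[str], str],                            # hypothetical: fetch one URL
) -> Dict[str, str]:
    # A single request maps every scene id to its download URL.
    urls = get_download_urls(scene_ids)
    # Downloads run in parallel, but the pool never runs more than
    # MAX_CONCURRENT_DOWNLOADS of them at the same time.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_DOWNLOADS) as pool:
        results = pool.map(download, urls.values())
        # keys() and values() iterate in the same order, so zip pairs them correctly.
        return dict(zip(urls.keys(), results))
```

The fixed-size pool is what enforces the throttle: extra downloads simply wait in the executor's queue until a worker frees up.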
>
> Despite having built redundancy for connection errors (e.g. 503s), we
> still miss up to 4% of the scenes each night.
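One common way to build that kind of redundancy is retrying with exponential backoff and jitter. This is a generic sketch, not the code Mapbox uses; the attempt count and base delay are arbitrary assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-transient errors, or the error from the final attempt.
            if not is_retryable(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay * 2**attempt seconds plus up to base_delay of
            # random jitter, so parallel workers do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")
```

With `requests`, `fn` could call `requests.get(url, timeout=60)` followed by `raise_for_status()`, and `is_retryable` could check `isinstance(e, requests.exceptions.HTTPError) and e.response is not None and e.response.status_code in (500, 502, 503, 504)`. As the missed-scene figure above suggests, retries reduce failures but may not eliminate them, so a later pass over whatever is still missing remains useful.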
>
> I'm wondering if it would be easier to use the code that we've already
> written to get this task done. It would be minimal effort to point the
> tarballs to the public bucket. We would have to add the additional
> functionality that you've built (e.g. splitting into individual scenes,
> preview images, index page).
>
> Cheers,
> Amit
>
>
> On Sat, Jan 10, 2015 at 11:07 AM, Frank Warmerdam <warmerdam at pobox.com>
> wrote:
>
>> Amit,
>>
>> I was only downloading one scene on that connection, but other processes
>> were potentially processing others at the same time.
>>
>> Currently I only ever pass one scene id in the request to get download
>> urls, though the failure happened while using the download url, not
>> while requesting it.
>>
>> Best regards,
>> Frank
>>
>> On Sat, Jan 10, 2015 at 7:35 AM, Amit Kapadia <amit at mapbox.com> wrote:
>>
>>> Hi Frank,
>>>
>>> Documentation says each download request takes up to 50,000 scene ids.
>>> In practice the limit is lower; the limiting factor is the size of
>>> the request. How many scenes were you requesting?
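Since the practical limit is request size rather than the documented 50,000-id cap, a client might batch ids against both bounds. This sketch assumes the ids are sent as a JSON list and uses a hypothetical 64 KiB size cap; the real limit is not stated in the thread.

```python
import json
from typing import Iterator, List

MAX_IDS_PER_REQUEST = 50_000   # documented limit quoted in the thread
MAX_REQUEST_BYTES = 64 * 1024  # hypothetical: the real size limit is not known


def batch_scene_ids(
    scene_ids: List[str],
    max_ids: int = MAX_IDS_PER_REQUEST,
    max_bytes: int = MAX_REQUEST_BYTES,
) -> Iterator[List[str]]:
    """Yield batches whose JSON encoding stays within max_bytes and max_ids.

    Assumes the request body is json.dumps(batch); with the default
    ensure_ascii=True the output is ASCII, so characters equal bytes.
    A single id longer than max_bytes still gets its own batch.
    """
    batch: List[str] = []
    size = 2  # the enclosing "[" and "]"
    for scene_id in scene_ids:
        # Each id costs its quoted length, plus the ", " separator
        # json.dumps inserts before every element after the first.
        cost = len(json.dumps(scene_id)) + (2 if batch else 0)
        if batch and (len(batch) >= max_ids or size + cost > max_bytes):
            yield batch
            batch, size = [], 2
            cost = len(json.dumps(scene_id))
        batch.append(scene_id)
        size += cost
    if batch:
        yield batch
```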
>>>
>>> Cheers,
>>> Amit
>>>
>>>
>>> On Sat, Jan 10, 2015 at 12:16 AM, Frank Warmerdam <warmerdam at pobox.com>
>>> wrote:
>>>
>>>> Amit,
>>>>
>>>> I ran into this once when using a download url for the usgs service:
>>>>
>>>> requests.exceptions.HTTPError: 503 Server Error: Service Temporarily Unavailable
>>>>
>>>> Are there limits on the number of download urls I should be fetching at
>>>> once?  Is this a common problem?
>>>>
>>>> I haven't run enough at once to know whether this is going to be a broader
>>>> problem, but if so I will need to reconsider my approach, which can
>>>> result in quite a few parallel downloads.
>>>>
>>>> Best regards,
>>>> --
>>>>
>>>> ---------------------------------------+--------------------------------------
>>>> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
>>>> light and sound - activate the windows | http://pobox.com/~warmerdam
>>>> and watch the world go round - Rush    | Geospatial Software Developer
>>>>
>>>
>>>
>>
>>
>>
>
>

