[Landsat-pds] Status - Caught up with USGS!

Frank Warmerdam warmerdam at pobox.com
Thu Jan 29 17:22:28 PST 2015


On Thu, Jan 29, 2015 at 3:05 PM, Sundwall, Jed <jsundwal at amazon.com> wrote:

>  Celebration!
> <http://38.media.tumblr.com/84eb17389e8be791d0c3de2d606dc838/tumblr_mxx0yrW3Rl1qa5znqo1_400.gif>
>
>  We’ve updated the existing JavaScript S3 Explorer
> <http://landsat-pds.s3.amazonaws.com/index.html> to use the right
> endpoints to download files, but we need to resolve a few things before
> we’re happy with it.
>
>  1. It doesn’t update the URL in the browser as you navigate. For example, if I
> click into row 100 and path 050 and try to copy and paste the URL from my
> browser, it will just send you back to where you started.
> 2. For the basic directory navigation, it’s way too crufty. We don’t need
> tables for that. We need lists!
> 3. I don’t want to use this approach for the individual scene index.html
> files. Scene index.html should be more or less like Frank’s and should be
> easy for search engines to index.
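A list-based directory page of the kind item 2 asks for can be generated statically. This is a minimal sketch, not the explorer's actual code: `render_directory_index` is a hypothetical name, and it assumes entries are relative names with a trailing slash marking subdirectories.

```python
from html import escape


def render_directory_index(title: str, entries: list[str]) -> str:
    """Render a plain index.html that lists child entries as a <ul> of links.

    Hypothetical helper: `entries` are relative names such as "100/", where a
    trailing slash marks a subdirectory. Entries are sorted so the output is
    stable between runs.
    """
    items = "\n".join(
        f'      <li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in sorted(entries)
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )
```

For example, `render_directory_index("L8/050/", ["100/", "099/"])` returns a page with two list items, `099/` first, and no tables.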
>
>  A few questions for Frank:
>
>  1. What’s expensive about the current approach? Is it the fact that all
> of the path/row index.html files need to be re-written every time a new
> scene is added?
>

Yes. It is writing 12000 index files to S3, which is pretty fast but still
takes a couple of minutes and causes a bunch of mostly zero-value S3 churn. I
could get smarter and only update the index files that will change, if
we wanted to keep doing it this way.
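Rewriting only the index files that change could look roughly like the sketch below. It assumes a hypothetical `L8/<path>/<row>/<scene_id>/` key layout in which each level's index.html lists only its immediate children, and it handles additions only (not removals). The helper `dirty_index_dirs` is illustrative, not part of the existing script.

```python
def dirty_index_dirs(old_scenes, new_scenes):
    """Return the directory prefixes whose index.html listing would change.

    Scenes are (path, row, scene_id) tuples. Assumes a hypothetical
    L8/<path>/<row>/<scene_id>/ layout where each level's index.html lists
    only its immediate children. Removed scenes are not handled.
    """
    old = set(old_scenes)
    old_paths = {p for p, _, _ in old}
    old_rows = {(p, r) for p, r, _ in old}
    dirty = set()
    for path, row, _scene_id in set(new_scenes) - old:
        # A new scene always changes its row-level listing.
        dirty.add(f"L8/{path}/{row}/")
        # A row seen for the first time also changes the path-level listing.
        if (path, row) not in old_rows:
            dirty.add(f"L8/{path}/")
        # A path seen for the first time changes the top-level listing.
        if path not in old_paths:
            dirty.add("L8/")
    return dirty
```

Because only prefixes touched by new scenes are returned, the number of S3 writes scales with the number of new scenes rather than with the size of the whole tree.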

> 2. What are the pain points for creating the scene-specific index.html
> files? As far as search engine indexing goes, we wouldn’t need much info in
> plain text other than the scene name and some boilerplate language.
>

There isn't really any pain here, though I look forward to suggestions (or,
much better, pull requests) to improve the formatting and the useful
information in these files.
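For reference, a scene page along the lines Jed describes (scene name in plain text plus some boilerplate) might be generated as below. This is a hedged sketch: `render_scene_index` is a hypothetical name, the boilerplate sentence is a placeholder, and the real scene pages may carry more information (such as preview images) than this shows.

```python
from html import escape


def render_scene_index(scene_id: str, files: list[str]) -> str:
    """Render a minimal scene-level index.html.

    The scene ID appears in plain text (title, meta description, heading) so
    search engines can index it; the <p> sentence is placeholder boilerplate.
    """
    sid = escape(scene_id)
    links = "\n".join(
        f'      <li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in files
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <head>\n"
        f"    <title>Landsat scene {sid}</title>\n"
        f'    <meta name="description" content="Files for Landsat scene {sid}.">\n'
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{sid}</h1>\n"
        f"    <p>This page lists the files available for Landsat scene {sid}.</p>\n"
        "    <ul>\n"
        f"{links}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )
```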



> 3. What do you think about creating a site map and updating it with URLs
> for individual scenes as they’re created? That way we don’t have to worry
> about crawlers not knowing how to navigate the JavaScript tree browser but
> can just get a list of every scene’s URL?
>

The URLs are all in scene_list.gz, so it would be wonderful if you could
scan that periodically and write back a site map. I wasn't only thinking
about search engines when I talked about walking the tree, though. It is
also not uncommon for me to write a crawler for subareas of geodata
download sites to find everything I can download.
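A periodic sitemap job over scene_list.gz could be as small as the sketch below. It assumes scene_list.gz is a gzipped CSV whose header row includes a `download_url` column holding each scene's index.html URL; check the real column names before relying on it. The 50,000-URL cap per sitemap file comes from the sitemaps.org protocol, so a full archive would need several files tied together by a sitemap index.

```python
import csv
import gzip
import io
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def scene_urls(scene_list_gz: bytes, url_column: str = "download_url"):
    """Yield scene page URLs from a gzipped CSV scene list.

    Assumes the first row is a header and that `url_column` holds the URL of
    each scene's index.html; adjust to the real scene_list.gz layout.
    """
    with gzip.open(io.BytesIO(scene_list_gz), mode="rt", newline="") as fh:
        for row in csv.DictReader(fh):
            yield row[url_column]


def _render(urls):
    """Render one <urlset> document, escaping &, < and > in each URL."""
    body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{body}\n</urlset>\n'
    )


def build_sitemaps(urls):
    """Group URLs into sitemap XML documents of at most 50,000 URLs each."""
    sitemaps, chunk = [], []
    for url in urls:
        chunk.append(url)
        if len(chunk) == MAX_URLS_PER_SITEMAP:
            sitemaps.append(_render(chunk))
            chunk = []
    if chunk:
        sitemaps.append(_render(chunk))
    return sitemaps
```

The resulting files can then be advertised to crawlers with a `Sitemap:` line in robots.txt.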

Best regards,
Frank


>
>  I’ll keep noodling on the JavaScript explorer.
>
>  Again, Celebration! Thanks to everyone who worked on this – but mostly
> Frank ;) We got a lot done in a very short amount of time.
>
>  Jed.
>
>
>   On Jan 29, 2015, at 2:35 PM, Frank Warmerdam <warmerdam at pobox.com>
> wrote:
>
>  Folks,
>
>  Good news: we are now caught up with the USGS feed, and, modulo occasional
> wedges in the pipeline and 503-related delays from USGS, we should be
> up to date within a couple of hours of USGS offering scenes.
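One common way to absorb 503-related delays is to retry with exponential backoff. The sketch below is generic and is not the pipeline's actual retry logic; `fetch_with_backoff`, the five-attempt limit, and the 2-second base delay are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def _urlopen_bytes(url: str) -> bytes:
    """Default fetcher: GET the URL and return the response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_with_backoff(url, fetch=_urlopen_bytes, attempts=5, base_delay=2.0,
                       sleep=time.sleep):
    """Fetch `url`, retrying only on HTTP 503 with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries
    and re-raises the last 503 once `attempts` is exhausted. Any other HTTP
    error is raised immediately.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `fetch` and `sleep` as parameters keeps the retry policy separate from the HTTP client, which also makes it easy to exercise without a network.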
>
>  There is an open ticket, which I plan to work on soon, about semi-broken
> index.html files for scenes without the RGB bands, but I don't think that
> is too significant.
>
>  I mentioned the script that creates the index files at higher levels in
> the tree. It is running, but I'd like to drop it if Jed can offer a more
> dynamic (and less expensive to update) tree browser. That said, the
> benefit of plain index.html files is that even very ordinary web crawlers
> can walk them.
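As an illustration of why plain index.html files are crawler-friendly, the sketch below walks them breadth-first using only the standard library. It assumes directory links end either in a slash or in `index.html` (a trailing slash is treated as pointing at that directory's index.html) and stays under the starting page's directory; `crawl` and its `fetch_html` parameter are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def crawl(start_url, fetch_html, max_pages=10_000):
    """Breadth-first walk of plain index.html pages below start_url's directory.

    `fetch_html(url) -> str` is supplied by the caller. Returns every URL found
    under that prefix, both index pages and downloadable files.
    """
    prefix = start_url.rsplit("/", 1)[0] + "/"
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page_url = queue.popleft()
        fetched += 1
        parser = _LinkParser()
        parser.feed(fetch_html(page_url))
        parser.close()
        for href in parser.hrefs:
            link, _fragment = urldefrag(urljoin(page_url, href))
            if link.endswith("/"):
                link += "index.html"  # assume each directory serves index.html
            if not link.startswith(prefix) or link in seen:
                continue  # outside the subarea, or already recorded
            seen.add(link)
            if link.endswith("index.html"):
                queue.append(link)  # another listing to walk
    return seen
```

Because `../` links resolve outside the starting prefix, the walk stays inside the chosen subarea (for example, a single path directory) instead of wandering over the whole bucket.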
>
>  Best regards,
> --
>
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | Geospatial Software Developer
>   _______________________________________________
> Landsat-pds mailing list
> Landsat-pds at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/landsat-pds
>
>
>

