[Live-demo] OSGeo-Live 7.0 - big data version?

Hamish hamish_b at yahoo.com
Fri Jul 5 03:14:58 PDT 2013


Cameron:
>> On IRC right now we are discussing the possibility of creating a
>> "Big Data" version of OSGeo-Live 7.0.

(+ we are not winning the war on keeping within disc space on the mini)

>> This will likely be the standard OSGeo-Live (which will still need
>> to work stand-alone), plus an extra data directory which could include
>> big data, such as netCDF datasets. This could be distributed as a VM
>> or on an 8Gig USB.

or with both :)

perhaps:  /usr/local/share/data/extra

but we face the same trouble as with the Mac & Windows installers:
detecting whether the dir is there at boot time or not. (Currently we
semi-solve that one with some tests in /etc/rc.local.)
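
A rough sketch of that kind of presence check, written in Python for
illustration only (it is not the actual rc.local test, and the helper
name is made up). It assumes the directory suggested above and treats
an empty or unreadable directory the same as a missing one:

```python
import os

# Directory proposed above; whether this path is finally used is still open.
EXTRA_DATA_DIR = "/usr/local/share/data/extra"


def extra_data_available(path=EXTRA_DATA_DIR):
    """Return True if `path` is a directory holding at least one entry."""
    try:
        return os.path.isdir(path) and bool(os.listdir(path))
    except OSError:
        # Unreadable or vanished mid-check: treat as "not there".
        return False


if __name__ == "__main__":
    if extra_data_available():
        print("extra data found, menu entries can be enabled")
    else:
        print("no extra data, falling back to the standard disc contents")
```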


Ian:
> +1
>
> We could potentially trial some dual layer DVDs at FOSS4G

Hmm, an interesting idea ... but given the choice, and considering the
historic premium on dual-layer DVDs, unless they're very near the price
of a regular DVD (doubt it) I'd rather go for 8 GB USB drives. They'd be
faster, reusable/reinstallable, and more and more laptops (especially the
light ones people like to travel long distances to conferences with)
don't have built-in DVD drives, but every laptop has a USB port or 3.

> (I assume 9 GB DVDs are bootable?)

Could be, but I don't know.

James:
> I think a distribution with more data (or a link where you could
> download it and load it into the VM)

from the live dvd's desktop -> "Workshop Installation" -> 
 https://trac.osgeo.org/osgeo/wiki/Live_GIS_Workshop_Install

> where it is setup to work with the included applications would be
> useful.  I am not sure it makes sense to limit this to a 8GB USB/DVD.

The above infrastructure means we don't strictly have to, but the
3.2 GB mini, 3.8 GB FAT partition for live-bootable USBs, 4.2-4.3 GB
ISO DVD limitation, and 8 GB USB limitation are pretty hard limits.
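
To make those limits concrete, here is a small Python sketch that
reports which media a given total payload would still fit on. The
numbers are the ones listed above (taking the low end, 4.2 GB, for the
DVD); the function and dictionary names are hypothetical:

```python
# Size limits in GB, as listed above. 4.2 is the low end of the DVD range.
MEDIA_LIMITS_GB = {
    "mini": 3.2,
    "live USB FAT partition": 3.8,
    "ISO DVD": 4.2,
    "8 GB USB": 8.0,
}


def media_that_fit(total_gb, limits=MEDIA_LIMITS_GB):
    """Return the names of the media whose limit is at least `total_gb`."""
    return [name for name, limit in limits.items() if total_gb <= limit]


if __name__ == "__main__":
    # A 4.0 GB build is already too big for the mini and the FAT partition.
    print(media_that_fit(4.0))  # ['ISO DVD', '8 GB USB']
```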

 
> I would guess that working in a VM off of a real drive rather than a
> slow thumb drive or slower DVD would make more sense when working
> with larger data sets.

USB2 can be tolerable (~30 MB/sec vs 50 MB/sec for a slowish laptop
5400rpm drive). DVDs are kinda painful to wait for these days.
Note that data I/O on a VM is often pretty slow.
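
As a back-of-the-envelope illustration in Python, assuming sustained
rates of roughly 30 MB/s for USB2 and 50 MB/s for a 5400rpm laptop
drive, and a hypothetical 4000 MB of extra data read end to end (real
throughput will vary with the hardware and access pattern):

```python
def read_time_seconds(size_mb, rate_mb_per_s):
    """Time to read `size_mb` megabytes sequentially at a steady rate."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    extra_data_mb = 4000  # hypothetical payload, not a measured figure
    for label, rate in (("USB2", 30), ("5400rpm drive", 50)):
        secs = read_time_seconds(extra_data_mb, rate)
        print(f"{label}: about {secs:.0f} s")
```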

> I am not sure I would call 4GB of extra data "Big Data".  To me,
> "Big Data" implies something bigger than fits easily in RAM on
> one node... For imagery, something closer to 100 TB+ on the low end.

The sort of sample data we're trying to put together isn't "big" in
terms of bytes on a disk, what we're thinking about is hyperspectral
imagery (single point in time, small spatial coverage, circa 100 
imagery bands), and netCDF multi-dimensional datasets (limited
spatial coverage, dozens or hundreds of steps in the time series,
and a number of different variables).
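
To give a feel for the sizes involved, here is a Python sketch that
estimates the uncompressed size of such a multi-dimensional array. All
of the dimensions and sample widths below are made-up illustrations,
not figures for any actual dataset:

```python
def raw_size_bytes(dims, bytes_per_value):
    """Uncompressed size of a dense array with the given dimension lengths."""
    count = 1
    for length in dims:
        count *= length
    return count * bytes_per_value


if __name__ == "__main__":
    # Hypothetical hyperspectral scene: 100 bands, 2000 x 500 pixels, 16-bit.
    hyper = raw_size_bytes((100, 2000, 500), 2)
    # Hypothetical netCDF set: 5 variables, 100 time steps, 200 x 200 grid,
    # 32-bit floats.
    nc = raw_size_bytes((5, 100, 200, 200), 4)
    print(f"hyperspectral: {hyper / 1e6:.0f} MB, netCDF: {nc / 1e6:.0f} MB")
```

Both come out at tens to hundreds of MB, which is small in bytes but
rich in dimensions.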

WRT hyperspectral, some months ago I was talking with one of the
scientists behind HICO about getting some of the sensor data on
to the live disc; they have a test patch just north of the North
Carolina dataset which may be of interest. Another near Venice
looked like it might be interesting. As is usual, getting
permission to personally use the data and getting written permission
+ e.g. CC-By 3.0 redistribution terms from the full steering
committee needs some letters sent back and forth.
    http://hico.coas.oregonstate.edu

For netCDF we've got a small 65 MB subset prepared:
   http://download.osgeo.org/livedvd/data/netcdf/

but there's a worry that it doesn't cover enough of a time series
to be a useful demo for software which shines with temporal data
models.

And finally, if there was the room, I'd really love to see some
LANDSAT-8 imagery get on the disc and into everyone's hands, since
Again, coverage over one of the existing datasets (e.g. the North
Carolina traditional GIS data or the Nottingham OpenStreetMap
extract) would be a good place to start. It's been online long
enough now that one of the 2-week passes should have found a clear
day. :)


Final-final thought: It's a shame that geo-torrents dot org was lost
when ER Mapper was sold; I've always wanted to revive the idea.
(see osgeo trac ticket #458)


regards,
Hamish


