[Live-demo] OSGeo-Live 7.0 - big data version?

Cameron Shorter cameron.shorter at gmail.com
Wed Jul 10 16:12:40 PDT 2013


I agree with Alex. For the moment, we still want our core OSGeo-Live 
distribution to be limited to 4 GB. 95% of OSGeo-Live use cases can be 
demonstrated without big data, and we can distribute orders of magnitude 
more copies if we keep the download size / cost down to 4 GB.

For the big data use cases, we can easily provide an alternative 
OSGeo-Live-Huge distribution (either as an accompanying download, or as 
OSGeo-Live + Big Data).

So the outstanding questions I'm hoping Peter and others can answer are:
1. Which dataset(s) should we put into the accompanying data directory?
2. Where should we store the data directory?
3. Then we update the quickstarts to describe how to use this data directory.
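
Whichever datasets we pick, a quick check along these lines could flag 
files that won't fit on a FAT32-formatted stick, since FAT32 caps each 
file at just under 4 GiB (as Alex notes below). This is only a sketch in 
Python: the function name oversized_files and the data_dir argument are 
made up for illustration and are not part of any OSGeo-Live build script.

```python
import os
import sys

# FAT32 stores each file's size in a 32-bit field, so the largest
# possible file is 2**32 - 1 bytes (just under 4 GiB).
FAT32_MAX_FILE_BYTES = 2**32 - 1


def oversized_files(data_dir, limit=FAT32_MAX_FILE_BYTES):
    """Return sorted (path, size) pairs for files under data_dir larger than limit.

    Hypothetical helper: 'data_dir' stands for wherever the extra
    data directory ends up being staged.
    """
    too_big = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip broken symlinks and other non-regular entries.
            if not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > limit:
                too_big.append((path, size))
    return sorted(too_big)


if __name__ == "__main__":
    # Directory to scan; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in oversized_files(target):
        print(f"{size / 2**30:.2f} GiB  {path}")
```

Anything it lists would need to be split, subsetted, or kept on a 
non-FAT32 volume before it goes onto the stick.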

On 10/07/2013 2:57 AM, Peter Baumann wrote:
>
> well, for sure there is room for different opinions on this - so we 
> both have highlighted our reasons & positions.
> -Peter
>
>
>
> On 07/09/2013 06:38 PM, Alex Mandel wrote:
>> I don't let that space go empty. In my workshops it's used partly for 
>> persistence-based installs, and for storing the users' results at the 
>> end of a workshop. It also leaves room for them to import their own 
>> personal datasets for use with OSGeo Live. To me that is extremely 
>> important.
>>
>> As for making the demo exciting, I agree that in any particular 
>> workshop you might want to demo more. However, from the user 
>> perspective, actually doing a more complicated analysis usually takes 
>> longer than the allotted time. So I would leave the big data for when 
>> the expert is demo-ing. I doubt that normal users who just downloaded 
>> will ever get around to looking at such complex data, and if they do 
>> want to after the teaser, they will be happy to hit a "download more 
>> data" button.
>>
>> Thanks,
>> Alex
>>
>> On 07/09/2013 09:30 AM, Peter Baumann wrote:
>>> I like jumbo :)
>>>
>>> As you say, Big Data is when download is not an option - so offering a
>>> download is not fancy.
>>> Why should we use more data? Well, a time series diagram with 3 points
>>> is not exciting - and we want to excite, we want to have sexy demos. Isn't
>>> it the whole purpose of OSGeo Live: to present OS stuff attractively?
>>>
>>> As we seem to approach a technical solution, let me ask conversely: if &
>>> when we get it to work, why miss the opportunity and let 4 GB go empty?
>>>
>>> my 2 cents,
>>> Peter
>>>
>>>
>>> On 07/09/2013 06:18 PM, Alex Mandel wrote:
>>>> Well, extra data can actually be placed directly on the flash
>>>> drive separate from the OS, the same way we put the Mac/Win
>>>> installers. That data is accessible when booted; it was just read-only
>>>> last time I tried, unless you use sudo.
>>>>
>>>> I agree about better representation of types of data, but I don't feel
>>>> the need, as I've explained previously, to enlarge the normal distro
>>>> with lots of additional files. Big Data tools should be able to work
>>>> on subsets. A 3 point Time Series over a small area is actually plenty
>>>> useful in a workshop setting (e.g. Current, 50 year, 100 year - Avg
>>>> Temp in a single NetCDF). This demonstrates all the concepts required
>>>> to know how to handle 100 time segments later over the globe when an
>>>> adequate computer is given the task.
>>>>
>>>> If you want to offer such a large dataset I suggest it's a separate
>>>> download (or user initiated script) to pull the data to the extra
>>>> space on their stick or an easily re-packed variant of our setup
>>>> called osgeo-live-jumbo.iso
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On 07/09/2013 09:09 AM, Peter Baumann wrote:
>>>>> responding inline:
>>>>>
>>>>> On 07/09/2013 06:05 PM, Alex Mandel wrote:
>>>>>> It's not a partition limit, it's a per-file limit.
>>>>>> FAT32 can create up to 2 TB partitions.
>>>>>>
>>>>>> Since the entirety of the live boot is actually run from 1 file, it is
>>>>>> subject to that limit. So yes, if you want to try to split the OS into
>>>>>> multiple files for different mount points (one for /, /home, /var,
>>>>>> etc.), that is feasible as long as the first one contains the /boot.
>>>>>>
>>>>>> Or if you don't care if a user can access files on the stick from
>>>>>> their native OS, you could just format the whole thing differently
>>>>>> (not sure what BIOS booting needs).
>>>>>
>>>>> I don't see that this would be bleedingly necessary for OSGeo Live, so
>>>>> we might simply switch to ext3/4 - Cameron?
>>>>>
>>>>>>
>>>>>> But I'm still missing the incentive to increase the size of the distro.
>>>>>
>>>>> Big Data :)
>>>>> ...which is not only 1 PB onwards (as XLDB defines it), but also V for
>>>>> variety - and here we have 3D image timeseries, 4D climate data, and the
>>>>> like. A 10 MB climate data set is no fun to play with.
>>>>>
>>>>> See http://standards.rasdaman.org for some concrete examples (which
>>>>> still are not "big", BTW).
>>>>>
>>>>> cheers,
>>>>> Peter
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Alex
>>>>>>
>>>>>> On 07/09/2013 08:59 AM, Peter Baumann wrote:
>>>>>>> ah, I see - thanks for pointing us to that, Alex!
>>>>>>> Looks difficult indeed for OSGeo Live.
>>>>>>>
>>>>>>> Another idea, maybe wildly impossible: what about allocating 2x 4 GB
>>>>>>> partitions, and mounting them after boot?
>>>>>>>
>>>>>>> -Peter
>>>>>>>
>>>>>>>
>>>>>>> On 07/09/2013 05:55 AM, Alex Mandel wrote:
>>>>>>>> You probably hit the file size limit of FAT32
>>>>>>>> https://en.wikipedia.org/wiki/File_Allocation_Table
>>>>>>>> "The maximum possible size for a file on a FAT32 volume is 4 GiB"
>>>>>>>>
>>>>>>>> Sadly there is no universal successor to FAT32 that is easily
>>>>>>>> accessible from all OSes, so most external media is still FAT32.
>>>>>>>>
>>>>>>>> NTFS is quite good but doesn't ship with Mac by default
>>>>>>>> exFAT, an attempt to bridge the gap, requires work to be used on
>>>>>>>> Linux or Mac
>>>>>>>> EXT2/3/4 requires drivers to work on Windows
>>>>>>>>
>>>>>>>> More info
>>>>>>>> http://arstechnica.com/information-technology/2013/06/review-is-microsofts-new-data-sharing-system-a-cross-platform-savior/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On 07/08/2013 04:32 PM, Baumann, Peter wrote:
>>>>>>>>> FWIW, we also just hit the wall when trying to get 7 GB on a stick;
>>>>>>>>> somehow 4 GB seems to be a hard limit. We have built a rasdaman demo
>>>>>>>>> which is a subset of standards.rasdaman.org - what's loaded runs
>>>>>>>>> in-situ, everything else will redirect to that site (pending
>>>>>>>>> Internet connection).
>>>>>>>>> -Peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Dr. Peter Baumann
>>>>>>>>> - Professor of Computer Science, Jacobs University Bremen
>>>>>>>>>    http://www.faculty.jacobs-university.de/pbaumann
>>>>>>>>>    mail: p.baumann at jacobs-university.de
>>>>>>>>>    tel: +49-421-200-3178, fax: +49-421-200-493178
>>>>>>>>> - Executive Director, rasdaman GmbH Bremen (HRB 26793)
>>>>>>>>>    http://www.rasdaman.com, mail: baumann at rasdaman.com
>>>>>>>>>    tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
>>>>>>>>> "Si forte in alienas manus oberraverit hec peregrina epistola
>>>>>>>>> incertis ventis dimissa, sed Deo commendata, precamur ut ei reddatur
>>>>>>>>> cui soli destinata, nec preripiat quisquam non sibi parata."
>>>>>>>>> (mail disclaimer, AD 1083)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________________
>>>>>>>>> From: live-demo-bounces at lists.osgeo.org
>>>>>>>>> [live-demo-bounces at lists.osgeo.org] on behalf of Cameron Shorter
>>>>>>>>> [cameron.shorter at gmail.com]
>>>>>>>>> Sent: Thursday, July 04, 2013 11:00 PM
>>>>>>>>> To: live-demo at lists.osgeo.org
>>>>>>>>> Subject: [Live-demo] OSGeo-Live 7.0 - big data version?
>>>>>>>>>
>>>>>>>>> On IRC right now we are discussing the possibility of creating a
>>>>>>>>> "Big Data" version of OSGeo-Live 7.0.
>>>>>>>>>
>>>>>>>>> This will likely be the standard OSGeo-Live (which will still need to
>>>>>>>>> work stand-alone), plus an extra data directory which could include
>>>>>>>>> big data, such as netCDF datasets. This could be distributed as a VM
>>>>>>>>> or on an 8 GB USB.
>>>>>>>>>
>>>>>>>>> I'm interested to hear thoughts on whether this will work for those
>>>>>>>>> interested in showing big data on OSGeo-Live.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
Cameron Shorter
Software and Data Solutions Manager
Tel: +61 (0)2 8570 5050
Mob: +61 (0)419 142 254

Think Globally, Fix Locally
Geospatial & Data Solutions enhanced with Open Standards and Open Source
http://www.lisasoft.com


