[Live-demo] OSGeo-Live 7.0 - big data version?
Peter Baumann
p.baumann at jacobs-university.de
Wed Jul 10 16:33:45 PDT 2013
Here are my 2 cents:
On 07/11/2013 01:12 AM, Cameron Shorter wrote:
> I agree with Alex. For the moment, we still want our core OSGeo-Live
> distribution to be limited to 4G. 95% of OSGeo-Live use cases can be
> demonstrated without big data, and we can distribute orders of magnitude more
> if we can minimise download size / cost to 4G.
>
> For the big data use cases, we can easily provide an alternative
> OSGeo-Live-Huge distribution (either as an accompanying download, or as
> OSGeo-Live + Big Data).
>
> So the outstanding question I'm hoping Peter and others can answer is:
> 1. Which dataset(s) to put into the accompanying data directory
up to each project
> 2. Where should we store the data directory?
An extra file system (ext3/4 or so), mounted on /data at boot; each project gets a
subdirectory, such as /data/rasdaman/, for its own use, plus a quota which can be
negotiated, including data sharing by symlink.
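
A minimal sketch of that layout, in Python, under some loudly stated assumptions: the helper names (create_project_dirs, share_dataset), the second project name "otherproject" and the file name "climate.nc" are all hypothetical, a temporary directory stands in for the real /data mount point, and quotas are not handled at all (they would have to be set up on the file system itself).

```python
import os
import tempfile
from pathlib import Path


def create_project_dirs(data_root, projects):
    """Create one subdirectory per project under data_root."""
    root = Path(data_root)
    created = {}
    for name in projects:
        project_dir = root / name
        project_dir.mkdir(parents=True, exist_ok=True)
        created[name] = project_dir
    return created


def share_dataset(data_root, owner, dataset, consumer):
    """Expose the owner's dataset inside the consumer's directory via a symlink."""
    root = Path(data_root)
    target = root / owner / dataset
    link = root / consumer / dataset
    if not target.exists():
        raise FileNotFoundError(target)
    if not link.is_symlink():
        # Relative link, so it still resolves if the file system is mounted elsewhere.
        link.symlink_to(os.path.relpath(target, link.parent))
    return link


if __name__ == "__main__":
    # A temporary directory stands in for the /data mount point (assumption).
    with tempfile.TemporaryDirectory() as demo_root:
        dirs = create_project_dirs(demo_root, ["rasdaman", "otherproject"])
        (dirs["rasdaman"] / "climate.nc").write_bytes(b"placeholder")
        shared = share_dataset(demo_root, "rasdaman", "climate.nc", "otherproject")
        print(shared, "->", os.readlink(shared))
```

The relative symlink is a design choice for this sketch only: it keeps the shared data readable even if the extra file system ends up mounted somewhere other than /data.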
-Peter
> 3. Then update quickstarts to describe how to use this data directory.
>
> On 10/07/2013 2:57 AM, Peter Baumann wrote:
>>
>> well, for sure there is room for different opinions on this - so we both have
>> highlighted our reasons & positions.
>> -Peter
>>
>>
>>
>> On 07/09/2013 06:38 PM, Alex Mandel wrote:
>>> I don't let that space go empty, in my workshops it's used partly for
>>> persistence based installs, and for storing the results of the users at the
>>> end of a workshop. It also leaves room for them to import their own personal
>>> data-sets for use with OSGeo Live. To me that is extremely important.
>>>
>>> As for making the demo exciting, I agree that in any particular workshop
>>> you might want to demo more. However, from the user perspective, actually
>>> doing a more complicated analysis usually takes longer than the allotted
>>> time. So I would leave the big data for when the expert is demo-ing, but I
>>> doubt that normal users who just downloaded will ever get around to looking
>>> at such complex data; if they do want to after the teaser, they will be
>>> happy to hit a "download more data" button.
>>>
>>> Thanks,
>>> Alex
>>>
>>> On 07/09/2013 09:30 AM, Peter Baumann wrote:
>>>> I like jumbo :)
>>>>
>>>> As you say, Big Data is when download is not an option - so offering a
>>>> download is not fancy.
>>>> Why should we use more data? Well, a time series diagram with 3 points
>>>> is not exciting - and we want to excite, we want to have sexy demos. Isn't
>>>> it the whole purpose of OSGeo Live: to present OS stuff attractively?
>>>>
>>>> As we seem to approach a technical solution, let me ask conversely: if &
>>>> when we get it to work, why miss the opportunity and let 4 GB go empty?
>>>>
>>>> my 2 cents,
>>>> Peter
>>>>
>>>>
>>>> On 07/09/2013 06:18 PM, Alex Mandel wrote:
>>>>> Well, extra data can actually be placed directly on the flash
>>>>> drive separate from the OS, the same way we put the Mac/Win
>>>>> installers. That data is accessible when booted, it's just read-only
>>>>> last time I tried unless you use sudo.
>>>>>
>>>>> I agree about better representation of types of data, but I don't feel
>>>>> the need, as I've explained previously, to enlarge the normal distro
>>>>> with lots of additional files. Big Data tools should be able to work
>>>>> on subsets. 3 point Time Series over a small area are actually plenty
>>>>> useful in a workshop setting (eg. Current , 50 year, 100 year - Avg
>>>>> Temp in a single NetCDF). This demonstrates all the concepts required
>>>>> to know how to handle a 100 time segments later over the globe when an
>>>>> adequate computer is given the task.
>>>>>
>>>>> If you want to offer such a large dataset I suggest it's a separate
>>>>> download (or user initiated script) to pull the data to the extra
>>>>> space on their stick or an easily re-packed variant of our setup
>>>>> called osgeo-live-jumbo.iso
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>> On 07/09/2013 09:09 AM, Peter Baumann wrote:
>>>>>> responding inline:
>>>>>>
>>>>>> On 07/09/2013 06:05 PM, Alex Mandel wrote:
>>>>>>> It's not a partition limit, it's a per file limit.
>>>>>>> Fat32 can create up to 2 TB partitions.
>>>>>>>
>>>>>>> Since the entirety of the live boot is actually run from 1 file it is
>>>>>>> subject to that limit. So, yes if you want to try to split the OS into
>>>>>>> multiple files for different mount points (one for /, /home, /var,
>>>>>>> etc) that is feasible as long as the first one contains the /boot.
>>>>>>>
>>>>>>> Or if you don't care if a user can access files on the stick from
>>>>>>> their native OS, you could just format the whole thing differently
>>>>>>> (not sure what bios booting needs).
>>>>>>
>>>>>> I don't see that this would be bleedingly necessary for OSGeo Live, so
>>>>>> we might simply switch to ext3/4 - Cameron?
>>>>>>
>>>>>>>
>>>>>>> But I'm still missing the incentive to increase the size of the distro.
>>>>>>
>>>>>> Big Data :)
>>>>>> ...which is not only 1 PB onwards (as XLDB defines it), but also V for
>>>>>> variety - and here we have 3D image timeseries, 4D climate data, and the
>>>>>> like. A 10 MB climate data set is no fun to play with.
>>>>>>
>>>>>> See http://standards.rasdaman.org for some concrete examples (which
>>>>>> still are not "big", BTW).
>>>>>>
>>>>>> cheers,
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>>
>>>>>>> On 07/09/2013 08:59 AM, Peter Baumann wrote:
>>>>>>>> ah, I see - thanks for pointing us to that, Alex!
>>>>>>>> Looks difficult indeed for OSGeo Live.
>>>>>>>>
>>>>>>>> Another idea, maybe wildly impossible: what about allocating 2x 4 GB
>>>>>>>> partitions, and mounting them after boot?
>>>>>>>>
>>>>>>>> -Peter
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/09/2013 05:55 AM, Alex Mandel wrote:
>>>>>>>>> You probably hit the file size limit of fat32
>>>>>>>>> https://en.wikipedia.org/wiki/File_Allocation_Table
>>>>>>>>> "The maximum possible size for a file on a FAT32 volume is 4 GiB"
>>>>>>>>>
>>>>>>>>> Sadly there is no universal successor to FAT32 that is easily accessible
>>>>>>>>> from all OS, so most external media is still FAT32.
>>>>>>>>>
>>>>>>>>> NTFS is quite good but doesn't ship with Mac by default
>>>>>>>>> exFAT, an attempt to bridge the gap, requires work to be used on Linux or Mac
>>>>>>>>> EXT2/3/4 requires drivers to work on Windows
>>>>>>>>>
>>>>>>>>> More info
>>>>>>>>> http://arstechnica.com/information-technology/2013/06/review-is-microsofts-new-data-sharing-system-a-cross-platform-savior/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>> On 07/08/2013 04:32 PM, Baumann, Peter wrote:
>>>>>>>>>> FWIW, we also just hit the wall when trying to get 7 GB on a stick,
>>>>>>>>>> somehow 4GB seems to be a hard limit. We have built a rasdaman demo
>>>>>>>>>> which is a subset of standards.rasdaman.org - what's loaded runs
>>>>>>>>>> in-situ, everything else will redirect to that site (pending
>>>>>>>>>> Internet connection).
>>>>>>>>>> -Peter
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Dr. Peter Baumann
>>>>>>>>>> - Professor of Computer Science, Jacobs University Bremen
>>>>>>>>>> http://www.faculty.jacobs-university.de/pbaumann
>>>>>>>>>> mail: p.baumann at jacobs-university.de
>>>>>>>>>> tel: +49-421-200-3178, fax: +49-421-200-493178
>>>>>>>>>> - Executive Director, rasdaman GmbH Bremen (HRB 26793)
>>>>>>>>>> http://www.rasdaman.com, mail: baumann at rasdaman.com
>>>>>>>>>> tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
>>>>>>>>>> "Si forte in alienas manus oberraverit hec peregrina epistola
>>>>>>>>>> incertis ventis dimissa, sed Deo commendata, precamur ut ei reddatur
>>>>>>>>>> cui soli destinata, nec preripiat quisquam non sibi parata." (mail
>>>>>>>>>> disclaimer, AD 1083)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: live-demo-bounces at lists.osgeo.org
>>>>>>>>>> [live-demo-bounces at lists.osgeo.org] on behalf of Cameron Shorter
>>>>>>>>>> [cameron.shorter at gmail.com]
>>>>>>>>>> Sent: Thursday, July 04, 2013 11:00 PM
>>>>>>>>>> To: live-demo at lists.osgeo.org
>>>>>>>>>> Subject: [Live-demo] OSGeo-Live 7.0 - big data version?
>>>>>>>>>>
>>>>>>>>>> On IRC right now we are discussing the possibility of creating a "Big
>>>>>>>>>> Data" version of OSGeo-Live 7.0.
>>>>>>>>>>
>>>>>>>>>> This will likely be the standard OSGeo-Live (which will still need to
>>>>>>>>>> work stand-alone), plus an extra data directory which could include big
>>>>>>>>>> data, such as netCDF datasets. This could be distributed as a VM or on
>>>>>>>>>> an 8 GB USB.
>>>>>>>>>>
>>>>>>>>>> I'm interested to hear thoughts on whether this will work for those
>>>>>>>>>> interested in showing big data on OSGeo-Live.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
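
A footnote on the FAT32 per-file limit quoted above: a single file on FAT32 can hold at most 2^32 - 1 bytes (4 GiB minus one byte), so anything larger has to be split into several files, as Alex describes. The Python sketch below only does the arithmetic; the names fits_on_fat32 and chunk_sizes are hypothetical, and reading "7 GB" as 7 * 10^9 bytes is an assumption.

```python
# Largest file FAT32 can store: 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def fits_on_fat32(size_bytes):
    """Return True if a single file of this size can live on a FAT32 volume."""
    return 0 <= size_bytes <= FAT32_MAX_FILE_BYTES


def chunk_sizes(total_bytes, limit=FAT32_MAX_FILE_BYTES):
    """Return the sizes of the pieces needed to store total_bytes, each at most limit."""
    if total_bytes < 0 or limit <= 0:
        raise ValueError("total_bytes must be >= 0 and limit must be > 0")
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        piece = min(remaining, limit)
        sizes.append(piece)
        remaining -= piece
    return sizes


if __name__ == "__main__":
    image_bytes = 7 * 10**9  # assumption: "7 GB" taken as decimal gigabytes
    print("fits in one file:", fits_on_fat32(image_bytes))
    print("pieces needed:", len(chunk_sizes(image_bytes)))
```

This says nothing about whether the live boot can actually be assembled from several such files; it only shows why a 7 GB image cannot be a single file on a FAT32 stick.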
--
Dr. Peter Baumann
- Professor of Computer Science, Jacobs University Bremen
www.faculty.jacobs-university.de/pbaumann
mail: p.baumann at jacobs-university.de
tel: +49-421-200-3178, fax: +49-421-200-493178
- Executive Director, rasdaman GmbH Bremen (HRB 26793)
www.rasdaman.com, mail: baumann at rasdaman.com
tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
"Si forte in alienas manus oberraverit hec peregrina epistola incertis ventis dimissa, sed Deo commendata, precamur ut ei reddatur cui soli destinata, nec preripiat quisquam non sibi parata." (mail disclaimer, AD 1083)