[Ica-osgeo-labs] Website feedback
Andy Anderson
aanderson at amherst.edu
Sat Aug 29 11:41:06 PDT 2015
I don’t agree with the statement “In scientific researches black boxes (commercial software) are not so useful”, and I doubt that most people would accept the principle that “One have to know all the details of the used algorithms (source code) to make the right conclusions”. These claims fly in the face of the reality of scientific research, which is built on the efforts of many people whom we trust for various reasons, in particular because their methods have become standard approaches in this or that subfield of science. Most people try to build on that work without reinventing the wheel.
“Black box” equipment is endemic in science precisely because it is useful. Many people here probably use GPS devices in their research. Does anyone have one that they built themselves? Possibly a few, but how many actually built the GPS chip and wrote the signal-processing software? We trust these devices, even though we know there can be issues. Part of being a scientist is watching for inconsistencies and working around them if possible. Most of the time they will be due to user error, but occasionally they will require starting over with new equipment, perhaps even from a different company if the current one delivers too many lemons.
It’s also idealistic to expect that everyone know “the details of the used algorithms”. Most scientists will know only what an algorithm is used for, and perhaps the basic principle behind it, but most won’t know or care about its details. It is generally accepted that one can trust the implementation of software that has had a lot of eyes on it, commercial or open source, at least until it fails in an obvious way (and user error is always the first place to look).
Commercial vendors will generally tell you which algorithm their software uses, and sometimes even which package, and it often comes down to open-source software anyway, e.g. “Many software are built on top of BLAS-compatible libraries, including Armadillo, LAPACK, LINPACK, GNU Octave, Mathematica, MATLAB, NumPy, and R.” ( https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms ). These packages have a long history and extensive testing (by NIST, no less — http://math.nist.gov/mcsd/Reports/95/yearly/node59.html ). Nevertheless there may still be errors that creep in, e.g. with new hardware. Their implementations, whether commercial or open source, can also have errors. And there can also be errors in how they are used. Even so, they are generally trusted across the board.
Certainly every scientific paper should describe the tools used, whether Matlab or R or custom software, and provide references to the algorithms applied, so that other scientists have the opportunity to critique the results. Good reviewers will make sure this is true before a paper is published, and will call out well-known issues (I would certainly look askance at the use of Excel for nonlinear data fitting — http://www.pages.drexel.edu/~bdm25/excel2007.pdf ). But the probability is very small that a reviewer will demand that the work be redone with open-source software rather than commercial software, or that the author explain standard algorithms.
The more general observation that I’ve heard in this area is that one’s calculations should be performed with multiple pieces of software (commercial or otherwise) to ensure consistency, a form of the repeatability that scientific acceptance requires. In the absence of obvious errors or inconsistencies, I doubt many people do that themselves, let alone go digging into open-source code to review what it’s doing. And only computational scientists are likely to do the latter, not research-focused scientists. There’s too much of a rush to get results and publish them, so they’ll just look for another black box.
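To make the cross-checking idea concrete, here is a minimal sketch (my own illustration, not from the original discussion) of verifying one computation by two independent routes. It fits a straight line to synthetic data first with NumPy’s least-squares solver and then by solving the normal equations directly; agreement between the two is exactly the kind of consistency check described above. The data and tolerances are assumptions chosen for the example.

```python
import numpy as np

# Synthetic data for an assumed model y = 2x + 1 plus small noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Design matrix for the linear model [slope, intercept].
A = np.column_stack([x, np.ones_like(x)])

# Route 1: NumPy's least-squares solver (LAPACK under the hood).
coef_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# Route 2: an independent path via the normal equations A^T A b = A^T y.
coef_normal = np.linalg.solve(A.T @ A, A.T @ y)

# On well-conditioned data the two routes should agree closely;
# a large discrepancy would flag a problem in one of the tools,
# or in how we called it.
assert np.allclose(coef_lstsq, coef_normal, atol=1e-8)
print(coef_lstsq)
```

The same pattern scales up: run the analysis in the commercial package, rerun it in an open-source one, and investigate only when the answers diverge beyond what conditioning and floating-point arithmetic explain.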
There is an opportunity here to focus on reproducibility, which is often overlooked in science (see, for example, the very recent news here: http://www.theguardian.com/science/2015/aug/27/study-delivers-bleak-verdict-on-validity-of-psychology-experiment-results ). While confirming others’ results is generally considered boring, there’s always the opportunity to contradict them, which is very exciting (see, for example, this famous case: http://blogs.umass.edu/econnews/2013/09/24/media-buzz-over-reinhart-rogoff-critique-continues/ ).
So I would suggest the following statement instead:
Reason #5: In scientific research the reproducibility of results is essential, and that includes data produced by analytical software. Open-source software can provide a low-cost way to verify calculations made by commercial applications.
— Andy
On Aug 29, 2015, at 3:00 AM, Siki Zoltan <siki at agt.bme.hu> wrote:
> Dear Charles,
>
> I would add a 5th reason.
>
> Reason #5: In scientific researches black boxes (commercial software) are not so useful. One have to know all the details of the used algorithms (source code) to make the right conclusions, change an algoritm (source code) to get new experiences. it can be done only with open source software.
>
> Universities and Geo4All labs are research centers, too.
> I hope you understand my point, may be my text have to be edited.
>
> Regards,
> Zoltan
>
> On Fri, 28 Aug 2015, Charles Schweik wrote:
>
>> Hi Suchith, all
>>
>> Patrick Hogan made some helpful edits to the text I have on my lab's site
>> (thanks Patrick!), and I then edited it a little more toward some possible
>> useful language for the 'Why universities should join' text. I'm sure there
>> are other points to be made, but its a start. The text is attached.
>>
>> Suchith, not sure who is leading the update to the GeoForAll page on this
>> topic...
>>
>> Cheers
>> Charlie
>>
>> On Thu, Aug 27, 2015 at 9:29 AM, Jeff McKenna <jmckenna at gatewaygeomatics.com
>>> wrote:
>>
>>> On twitter just now a community leader made a comment that our website (
>>> http://www.geoforall.org) doesn't clearly point out the benefits for a
>>> university. We outline "How to Join" (
>>> http://www.geoforall.org/how_to_join/), but not really "Why to Join".
>>>
>>> I thought this was a good point, and now that we all understand it more,
>>> it might be good to highlight this on the site.
>>>
>>> -jeff