[GRASS-PSC] Introducing DOI for software, documentation and data in the GRASS project

Michael Barton Michael.Barton at asu.edu
Fri Nov 18 10:04:34 PST 2016

Markus and Co.

This is something CoMSES Net (Network for Computational Modeling in Social and Ecological Sciences: http://www.comses.net) has been working with for some years now. We maintain a software code library, where researchers can publish model code. We also provide the option of code peer review, which can happen when code is submitted to the library for review along with a paper sent to a journal, or independently of any paper review. Code that has passed peer review is currently assigned a “handle” from handle.net. Handle.net is the organization that oversees the digital identifier ecosystem. DOIs are commercial instances and handles are open source instances, but both are ultimately under the purview of handle.net. With a new grant from NSF, CoMSES Net is now part of a new national data infrastructure network in the US. One of our plans is to transition from handles to DOIs because these are more widely recognized.

Given all this, we’ve had to think quite a bit about how to ‘publish’ model code and assign identifiers. As Vaclav points out, there are significant issues with versioning. What happens with a new version? We’ve adopted the conceptual position that we are not primarily a versioning repository, but a place where authors can publish ‘finished’ code used in a research project or product. We are trying to treat this like a library and journal environment in that sense. We allow for minor revisions to correct errors (including as a response to reviews). But if a new product (e.g., a research paper) uses a new version of model code, we consider that a newly published digital object, which could get a new handle/DOI distinct from that of the version used for an earlier product. This remains complicated to implement in practice. But the concept follows from the reason for giving out the handle/DOI in the first place.

Currently, only about 10% of published model-based science makes code available for review or reuse. We think it is increasingly important that researchers share the code that is an important component of scientific practice, in the same way they share research protocols and results and are increasingly encouraged to share data. But sharing code takes effort, and even researchers with the best intentions may find it difficult to find the time or energy to make code available. So we are trying to create incentives that will have some value in the academic/research world, including citable products. All models published in the CoMSES Net library have automatically generated citations. Those that have passed peer review, verifying some degree of software quality, are also given permanent identifiers (handles/DOIs), with the idea that researchers can put them on their CVs, where they at least have the possibility of gaining some recognition for the work carried out. That is, we consider a DOI an incentive for sharing code and a bit of a lever to get others to cite that code if they use it.

We are still trying to work out how best to handle improvements (bug fixes) to a model vs. new models. We are moving our library to a Git environment, but are still working out how to map our concept of “published” snapshots of code in a library/journal onto versions and releases in Git. We do have a roadmap and are working on it, but we don’t yet have a solution in place.

Where is all this leading? We need to ask what the value of assigning DOIs to GRASS code is, how they might benefit GRASS developers, and how they might be used by GRASS software users. I don’t see that they provide the kind of incentives that CoMSES Net is envisioning for computational model developers. Most DOIs are assigned to finished products as digital objects. From that perspective, GRASS could get a DOI, but not its component modules. But what about each version of GRASS? GRASS has formal releases, but its components do not. Some code is in the released code base and other code is in addons. There is ongoing development in the SVN. GRASS is a digital object of course, as are its component code modules, but it is a dynamic, living one and not a static one. Perhaps there are other benefits to working out the complications of where and when to assign DOIs in the GRASS ecosystem. But it will be good to start with a discussion of why and for whom we would do it.

(I’m copying Allen Lee from the CoMSES Net leadership team as he has thought a lot about this and might have other things to add.)


C. Michael Barton
Director, Center for Social Dynamics & Complexity
Professor of Anthropology, School of Human Evolution & Social Change
Head, Graduate Faculty in Complex Adaptive Systems Science
Arizona State University

voice:  480-965-6262 (SHESC), 480-965-8130/727-9746 (CSDC)
fax: 480-965-7671 (SHESC),  480-727-0709 (CSDC)
www: http://www.public.asu.edu/~cmbarton, http://csdc.asu.edu

On Nov 17, 2016, at 8:19 AM, Vaclav Petras <wenzeslaus at gmail.com> wrote:

On Thu, Nov 17, 2016 at 6:04 AM, Markus Neteler <neteler at osgeo.org> wrote:
The question for me is: what do we need to do for that? Since a DOI
refers to a state in time, GRASS GIS module DOIs might be attached to
releases since they evolve over time. How would that practically work?

There are some services which can give you a DOI for software. From what I understood, a DOI for a specific version in version control is often encouraged, but that wouldn't work if you want to add the DOI back to the source or documentation, because doing so advances the version in version control. Associating a DOI with a release makes more sense, but then what to do with addons, which (at least usually) don't have releases? Also, some modules don't change between releases of GRASS GIS. Does this mean the same DOI or a different one?

Also, one module would accumulate different DOIs over time. Can we then accumulate the number of citations for one module?

Another idea (for core modules) is assigning a DOI to a GRASS GIS release and then using the DOI with a hash (...doi.../123456#r.slope.aspect), similarly to how the TIB AV-Portal references an exact time in a video. I hope Peter has a better idea of whether this makes at least some sense or is completely off.
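
The release-plus-hash idea can be illustrated with a minimal Python sketch. The DOI used here is a made-up placeholder, not a registered identifier, and the helper names are hypothetical; the only assumption about the scheme is that the module name is carried as the URL fragment.

```python
from urllib.parse import urldefrag

# Hypothetical placeholder for a GRASS GIS release DOI; NOT a registered identifier.
RELEASE_DOI_URL = "https://doi.org/10.0000/example.123456"


def module_reference(release_doi_url: str, module: str) -> str:
    """Append the module name as a URL fragment to the release DOI URL."""
    return f"{release_doi_url}#{module}"


def split_reference(reference: str) -> tuple[str, str]:
    """Split a reference back into (release DOI URL, module name)."""
    url, fragment = urldefrag(reference)
    return url, fragment


ref = module_reference(RELEASE_DOI_URL, "r.slope.aspect")
print(ref)
print(split_reference(ref))
```

Note that a URL fragment is not sent to the server, so a resolver would only ever see the release DOI. Under this scheme any per-module citation counting would have to be done by whoever parses the full reference.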

Anyway, my current vision, partially related and partially unrelated, is to have a `Cite as` section in the manual, unique for each module, which would be generated from info stored in the parser information. Using the parser is more flexible than putting it into the HTML directly, and `Cite as` is clearer than including the info in the `References` section. The info for `Cite as` can be a DOI for the module or a paper associated with the module.
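
As a rough sketch of how a `Cite as` section might be generated, the Python function below renders an HTML fragment from a plain dictionary. The dictionary and its `citation` and `doi` keys are assumptions made for illustration only; they are not part of the existing GRASS parser interface, and the DOI in the usage line is a placeholder.

```python
from html import escape


def cite_as_section(info: dict) -> str:
    """Render a 'Cite as' HTML fragment from hypothetical module metadata.

    `info` may hold a "citation" string (an associated paper) and/or a
    "doi" string; both keys are illustrative assumptions.
    """
    entries = []
    if info.get("citation"):
        entries.append(escape(info["citation"]))
    if info.get("doi"):
        doi = escape(info["doi"], quote=True)
        entries.append(f'<a href="https://doi.org/{doi}">doi:{doi}</a>')
    if not entries:
        return ""  # nothing to cite, so emit no section at all
    items = "\n".join(f"<li>{entry}</li>" for entry in entries)
    return f"<h2>Cite as</h2>\n<ul>\n{items}\n</ul>\n"


# Placeholder DOI, used only to show the output shape.
print(cite_as_section({"doi": "10.0000/example"}))
```

Keeping the citation data outside the HTML means the same metadata could also feed other outputs, which is the flexibility argued for above.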