[GRASS-dev] what if: a new GRASS directory layout?

Mon Apr 7 00:45:53 EDT 2008

>>>>> Glynn Clements <glynn at gclements.plus.com> writes:

[...]

 >>> To have concurrent access working properly, we would need to add
 >>> explicit locking to all modules. Actually, we might be able to do
 >>> it in g.parser, at least for modules whose options contain complete
 >>> information (i.e. not modules which derive the names of output maps
 >>> from the names of the input maps).

 >> So that, e. g., renaming would be impossible until all the
 >> processing of the raster has finished?  It doesn't seem to be a
 >> particularly bright solution.  At least, it may bring deadlocks to
 >> GRASS, and it seems to complicate parallel execution in general as
 >> well.

 > To make concurrent access work, updating a map would need to be an
 > atomic operation, so that any module which reads the map sees either
 > the "before" version or the "after" version, and never sees an
 > "in-progress" version.

	And that's simple.  It's only the current layout of the GRASS
	database directory that makes it hard.  A new layout could look
	like the following:

	objects/<id> -- the data file;

	rasters/<raster> -- the <raster>'s ``inventory'' file.

	The inventory file contains the <id>s of the objects comprising
	a raster, and any other information deemed useful, e. g. (in RFC
	822-like form):

Raster-Title: The Raster's Title Here
Raster-Data: BQCbkqHHIVoMq6OCG
Raster-Region: FpRZwnY7HgiaNPru0
Raster-Colormap: FVHD1acX1QQ4E6mgU
Raster-Category-Labels: 2RHL1nOsync1KQLbh
X-Raster-r-my-module-private-object: TzN7tvDGYsHeRIw9k
...
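
	To illustrate, here is a minimal Python sketch (not existing
	GRASS code) that reads such an inventory with the standard
	`email' parser, which accepts RFC 822-like headers.  The
	function names, and the convention that every field except
	Raster-Title holds an object <id>, are assumptions made for the
	example only.

```python
from email.parser import Parser

SAMPLE = """\
Raster-Title: The Raster's Title Here
Raster-Data: BQCbkqHHIVoMq6OCG
Raster-Region: FpRZwnY7HgiaNPru0
Raster-Colormap: FVHD1acX1QQ4E6mgU
Raster-Category-Labels: 2RHL1nOsync1KQLbh
X-Raster-r-my-module-private-object: TzN7tvDGYsHeRIw9k
"""


def parse_inventory(text):
    """Return the inventory fields as a dict (field name -> value)."""
    msg = Parser().parsestr(text, headersonly=True)
    return dict(msg.items())


def object_ids(fields):
    """Assumed convention: every field except Raster-Title names an object."""
    return {value for key, value in fields.items() if key != "Raster-Title"}


if __name__ == "__main__":
    fields = parse_inventory(SAMPLE)
    print(fields["Raster-Data"])
    print(sorted(object_ids(fields)))
```

	A module would then open objects/<id> for each field it
	understands and ignore the rest, which is what lets other
	modules keep private fields in the same file.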

	When the raster is created, the objects can be created in-place,
	while the inventory is created in a temporary location and then
	atomically moved to rasters/.  When the raster is renamed, the
	only thing that needs to be renamed is the inventory file.  When
	the raster is overwritten, the new data is stored under
	different object <id>s, and the inventory file is replaced
	atomically upon completion.
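
	The create, overwrite and rename steps above could be sketched
	as follows (Python, for illustration only).  The tmp/ directory
	and both function names are assumptions; the one real
	requirement is that the temporary location is on the same
	filesystem as rasters/, since otherwise the move is not a plain
	rename.

```python
import os
import tempfile


def publish_inventory(db_dir, name, text):
    """Create or replace rasters/<name>; readers see the old or the new file."""
    rasters_dir = os.path.join(db_dir, "rasters")
    tmp_dir = os.path.join(db_dir, "tmp")  # assumed name; same filesystem as rasters/
    os.makedirs(rasters_dir, exist_ok=True)
    os.makedirs(tmp_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # contents reach the disk before the move
        # os.replace() is a rename, which is atomic on POSIX systems.
        os.replace(tmp_path, os.path.join(rasters_dir, name))
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def rename_raster(db_dir, old, new):
    """Rename a raster by renaming its inventory; the objects are untouched."""
    rasters_dir = os.path.join(db_dir, "rasters")
    # Note: this silently replaces an existing raster called <new>.
    os.replace(os.path.join(rasters_dir, old), os.path.join(rasters_dir, new))
```

	A reader that opened the old inventory before the replacement
	keeps seeing the old object <id>s, so the old objects have to
	stay around until nobody uses them.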

	The benefits of this scheme are:

	* the scheme allows better transaction-like semantics to be
	  implemented; in particular, a map could be replaced
	  atomically at any time;

	* the scheme allows the set of objects comprising a raster to
	  be extended easily; each module could have its own
	  ``namespace'' within the inventory file;

	* the `r.reclass' concept of /not/ generating any raster data
	  could be generalized with this scheme; e. g., different
	  color maps, or different regions, could easily be used over
	  the same data; one may compare `r.reclass' with creating
	  symbolic links, while the new scheme makes ``hard links''
	  possible;

	* a ``trash can'' could be implemented with less effort with
	  this scheme;

	* the objects/ and rasters/ directories could be easily
	  `rsync'ed; currently, `rsync' incurs a transfer size penalty
	  in the presence of renames;

	* the scheme seems consistent with the ``rasterset'' ideas
	  [1].

[1] http://lists.osgeo.org/pipermail/grass-dev/2008-January/034772.html
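
	The ``hard link'' point can be shown in a few lines of Python.
	This is only a sketch: the helper names are made up, and the
	second color map <id> is a placeholder rather than a real
	object.

```python
def derive_inventory(fields, overrides):
    """Return a new inventory that shares every object not overridden."""
    derived = dict(fields)
    derived.update(overrides)
    return derived


def format_inventory(fields):
    """Serialize the fields back into the RFC 822-like form."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


base = {
    "Raster-Title": "Elevation",
    "Raster-Data": "BQCbkqHHIVoMq6OCG",
    "Raster-Colormap": "FVHD1acX1QQ4E6mgU",
}
# Same data object, different color map; no raster data is copied.
grey = derive_inventory(base, {
    "Raster-Title": "Elevation, grey scale",
    "Raster-Colormap": "PlaceholderGreyMapId",  # hypothetical <id>
})

if __name__ == "__main__":
    print(format_inventory(grey), end="")
```

	Both inventories now point to the same Raster-Data object,
	which is exactly why the reference counting discussed below
	becomes necessary.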

	The apparent drawbacks of it are:

	* some reference count should be kept in order to delete the
	  objects that are no longer needed; as on some (or?)
	  platforms it may be impossible to unlink () an open object,
	  there should be a pure GRASS-library implementation of it
	  (otherwise, it would be possible to just open all the files
	  comprising the raster to keep it from vanishing meanwhile);

	* as some dangling objects may remain, a dedicated `g.fsck'
	  module would be necessary;

	* the change will be quite disruptive to the existing code base.
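
	To give an idea of what the reference counting and a `g.fsck'
	pass might amount to, here is a rough Python sketch.  No such
	module exists; the function names are hypothetical, and the
	code assumes that every inventory field except Raster-Title
	holds an object <id>.

```python
import os


def reference_counts(rasters_dir):
    """Count how many inventory fields refer to each object <id>."""
    counts = {}
    for name in os.listdir(rasters_dir):
        with open(os.path.join(rasters_dir, name)) as f:
            for line in f:
                key, sep, value = line.partition(":")
                # Skip malformed lines and the title, which is not an <id>.
                if sep and key.strip() != "Raster-Title":
                    oid = value.strip()
                    counts[oid] = counts.get(oid, 0) + 1
    return counts


def dangling_objects(db_dir):
    """List the objects that no inventory in rasters/ refers to."""
    counts = reference_counts(os.path.join(db_dir, "rasters"))
    objects = os.listdir(os.path.join(db_dir, "objects"))
    return sorted(oid for oid in objects if oid not in counts)
```

	Note that a scan like this would also report the objects of a
	raster that is still being written, since its inventory has not
	been moved to rasters/ yet.  A real implementation would need
	some locking, or would have to skip recently created objects.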

[...]