[GRASS-dev] what if: a new GRASS directory layout?
Ivan Shmakov
ivan at theory.asu.ru
Mon Apr 7 00:45:53 EDT 2008
>>>>> Glynn Clements <glynn at gclements.plus.com> writes:
[...]
>>> To have concurrent access working properly, we would need to add
>>> explicit locking to all modules. Actually, we might be able to do
>>> it in g.parser, at least for modules whose options contain complete
>>> information (i.e. not modules which derive the names of output maps
>>> from the names of the input maps).
>> So that, e. g., renaming would be impossible without all the
>> processing of the raster finished? It doesn't seem to be a
>> particularly bright solution. At least, it may bring deadlocks to
>> GRASS, and it seems to complicate parallel execution in general as
>> well.
> To make concurrent access work, updating a map would need to be an
> atomic operation, so that any module which reads the map sees either
> the "before" version or the "after" version, and never sees an
> "in-progress" version.
And that's simple. It's only the current layout of the GRASS
database directory that makes it hard. The new layout may be
like the following:
objects/<id> -- the data file;
rasters/<raster> -- the <raster>'s ``inventory'' file.
The inventory file contains the <id>s of the objects comprising
a raster, and any other information deemed useful, e. g. (in RFC
822-like form):
Raster-Title: The Raster's Title Here
Raster-Data: BQCbkqHHIVoMq6OCG
Raster-Region: FpRZwnY7HgiaNPru0
Raster-Colormap: FVHD1acX1QQ4E6mgU
Raster-Category-Labels: 2RHL1nOsync1KQLbh
X-Raster-r-my-module-private-object: TzN7tvDGYsHeRIw9k
...
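As a minimal sketch of how such an inventory could be read, the
RFC 822-like form can be handled by a stock header parser. The
helper names (parse_inventory, object_ids) are hypothetical, and
the rule that every field other than Raster-Title holds an object
<id> is an assumption made only for this illustration:

```python
from email.parser import HeaderParser

# The sample inventory from the text above.
SAMPLE_INVENTORY = """\
Raster-Title: The Raster's Title Here
Raster-Data: BQCbkqHHIVoMq6OCG
Raster-Region: FpRZwnY7HgiaNPru0
Raster-Colormap: FVHD1acX1QQ4E6mgU
Raster-Category-Labels: 2RHL1nOsync1KQLbh
X-Raster-r-my-module-private-object: TzN7tvDGYsHeRIw9k
"""


def parse_inventory(text):
    """Return the inventory fields as a dict (field name -> value)."""
    message = HeaderParser().parsestr(text)
    return dict(message.items())


def object_ids(fields):
    """Return the set of object <id>s that an inventory refers to.

    Assumption: every field except Raster-Title holds an object <id>.
    """
    return {value for name, value in fields.items() if name != "Raster-Title"}


fields = parse_inventory(SAMPLE_INVENTORY)
print(fields["Raster-Data"])
print(sorted(object_ids(fields)))
```

A module-private field such as X-Raster-r-my-module-private-object
is picked up without the parser knowing about it in advance.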
When the raster is created, the objects can be created in-place,
while the inventory is created in a temporary location and then
atomically moved to rasters/. When the raster is renamed, the
only thing that needs to be renamed is the inventory file. When
the raster is being overwritten, the new data is stored under
different object <id>s, and the inventory file is replaced
atomically upon completion.
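The create, overwrite and rename steps could be sketched as
follows. This is an illustration only, not GRASS code: the tmp/
directory, the function names and the database root argument db
are hypothetical, and the sketch assumes that tmp/ and rasters/
reside on the same file system, so that the final rename is
atomic:

```python
import os
import tempfile


def publish_inventory(db, name, text):
    """Create or overwrite rasters/<name> atomically.

    The inventory is written to a temporary file under db/tmp
    (a hypothetical location) and then moved into db/rasters.
    """
    tmp_dir = os.path.join(db, "tmp")
    rasters_dir = os.path.join(db, "rasters")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(rasters_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # have the data on disk before publishing
        # Readers see either the old inventory or the new one, never a mix.
        os.replace(tmp_path, os.path.join(rasters_dir, name))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def rename_raster(db, old, new):
    """Rename a raster; only the inventory file is touched."""
    rasters_dir = os.path.join(db, "rasters")
    os.rename(os.path.join(rasters_dir, old), os.path.join(rasters_dir, new))
```

When a raster is overwritten, the objects named by the old
inventory are left in place; removing them is a separate matter.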
The benefits of this scheme are:
* the scheme allows for better transaction-like semantics to
be implemented; in particular, the map could be replaced
atomically at any time;
* the scheme allows for the set of objects comprising a
raster to be extended easily; each module could have its own
``namespace'' within the inventory file;
* the `r.reclass' concept of /not/ generating any raster data
could be generalized with this scheme; e. g., it may be
allowed to use different color maps, or different regions,
over the same data easily; one may compare `r.reclass' with
creating symbolic links, while with the new scheme the ``hard
links'' become possible;
* the concept of ``trash can'' could be implemented with less
effort with this scheme;
* the objects/ and rasters/ directories could be easily
`rsync'ed; currently, the invocation of `rsync' implies a
transfer size penalty in presence of renamings;
* the scheme feels consistent with the ``rasterset'' ideas
[1].
[1] http://lists.osgeo.org/pipermail/grass-dev/2008-January/034772.html
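To illustrate the ``hard links'' point from the list above: a
second raster that shares the data object but uses another color
map is just a second inventory. The sketch below is hypothetical
throughout; derive_inventory and format_inventory are made-up
names, and the new color map <id> is a placeholder rather than a
real object:

```python
def derive_inventory(base, overrides):
    """Return a copy of the base inventory with some fields replaced."""
    derived = dict(base)
    derived.update(overrides)
    return derived


def format_inventory(fields):
    """Serialize the fields back into the RFC 822-like form."""
    return "".join(f"{name}: {value}\n" for name, value in fields.items())


base = {
    "Raster-Title": "The Raster's Title Here",
    "Raster-Data": "BQCbkqHHIVoMq6OCG",
    "Raster-Colormap": "FVHD1acX1QQ4E6mgU",
}

# Placeholder <id>; a real one would name an object under objects/.
recolored = derive_inventory(base, {"Raster-Colormap": "PLACEHOLDER-COLORMAP-ID"})

print(format_inventory(recolored))
```

Both inventories name the same Raster-Data object, so no raster
data is copied.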
The apparent drawbacks of it are:
* a reference count should be kept in order to delete the
objects that are no longer needed; as on some (or?)
platforms it may be impossible to unlink () an opened object,
there should be a pure GRASS-library implementation of it
(otherwise, it would be possible to just open all the files
comprising the raster to keep it from vanishing in the meantime);
* as some dangling objects may remain, a dedicated
`g.fsck' module would be necessary;
* the change will be quite disturbing to the existing code base.
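What the check in such a `g.fsck' might look like, as a rough
sketch: collect every <id> named by any inventory, then report
the objects that none of them refers to. The function names and
the db argument are hypothetical, and, as an assumption for this
illustration, every inventory field except Raster-Title is taken
to hold an object <id>:

```python
import os
from email.parser import HeaderParser


def referenced_ids(rasters_dir):
    """Collect the object <id>s named by all inventories in rasters_dir."""
    refs = set()
    for name in os.listdir(rasters_dir):
        with open(os.path.join(rasters_dir, name)) as inventory:
            fields = HeaderParser().parse(inventory)
        refs.update(v for k, v in fields.items() if k != "Raster-Title")
    return refs


def dangling_objects(db):
    """Return the sorted <id>s under db/objects that no inventory names."""
    refs = referenced_ids(os.path.join(db, "rasters"))
    objects = set(os.listdir(os.path.join(db, "objects")))
    return sorted(objects - refs)
```

Note that an object still being written for a raster whose
inventory has not been moved into rasters/ yet would also be
reported here, so a real tool would need some way to tell the
two cases apart.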
[...]