[GRASS-dev] Re: what if: a new GRASS directory layout?

Glynn Clements glynn at gclements.plus.com
Tue Apr 8 17:39:00 EDT 2008


Ivan Shmakov wrote:

>  > I initially thought that you would need some form of locking, to
>  > prevent the map from being replaced in the middle of the sequence.
>  > However, you could achieve the same result by caching the inventory
>  > within the module, but the code which garbage-collects unreferenced
>  > elements would need to allow for this.
> 
> 	Yes.  The need for GC seems to be the weakest point of this
> 	scheme.
> 
> 	On Unix, as a first approximation, I'd just open () every binary
> 	object that's referenced by the inventory being processed.  This
> 	way, even if the file loses its name, it would be available to
> 	the program.

However: while updates to the cell/fcell file are atomic (new data is
written to a temporary file which is rename()d upon closing), support
files are typically updated by simply opening the output file for
write. If the output file already exists, it is overwritten in place.
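
To illustrate the difference, here is a minimal Python sketch of the
write-to-temporary-then-rename() pattern applied to a support file. It is
not GRASS code: atomic_write(), demo() and the file contents are
hypothetical, and it assumes POSIX semantics, where a reader that already
has the old file open keeps seeing the old data after the rename.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` without exposing a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # rename(); atomic on POSIX
    except BaseException:
        os.unlink(tmp)
        raise


def demo():
    """Show that a reader holding the old file open is unaffected."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "cellhd")  # hypothetical support file
        atomic_write(path, b"rows: 10\n")
        reader = open(path, "rb")           # one module opens the old version
        atomic_write(path, b"rows: 20\n")   # another module replaces it
        old = reader.read()                 # still reads the old contents
        reader.close()
        with open(path, "rb") as f:
            new = f.read()                  # a fresh open sees the new version
        return old, new
```

Opening the file for write in place, by contrast, truncates the very
inode the reader is using, so the reader can see an empty or half-written
file.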

Obviously, that would need to change in order for "transactional" I/O
to work. The fact that individual modules can create their own private
files in cell_misc complicates matters.

BTW, the definition of "support file" includes the cellhd file, which
is rather fundamental, as the layout of the cell/fcell file (rows,
columns, format, compression) is stored in the cellhd file. If the
contents of the cellhd file don't match the cell/fcell file, libgis
will probably just crash.

> 	In general, every binary object would need a list of references.
> 	Maintaining a list of names of referencing rasters shouldn't be
> 	too hard to implement.  On the contrary, a list of PIDs (to
> 	allow for a raster to be referenced by a process) looks a bit
> 	fragile.

In-process references could be maintained by making a copy (or hard
link) to the inventory, so that the GC treats it as "live". You would
need some kind of clean-up mechanism to handle any copies which are
left behind if a module crashes.
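
A rough Python sketch of that idea follows, under assumptions that are
not part of any existing GRASS design: each process pins an inventory by
hard-linking it to a hypothetical name "<inventory>.pin.<pid>", the GC
would treat every pin as a live root, and a clean-up pass removes pins
whose process no longer exists. It assumes Unix, and it ignores PID reuse.

```python
import os
import subprocess
import sys
import tempfile

PIN_MARK = ".pin."  # hypothetical naming convention: <inventory>.pin.<pid>


def pin(inventory, pid=None):
    """Hard-link the inventory under a per-process name; return that name."""
    pid = os.getpid() if pid is None else pid
    pinned = "%s%s%d" % (inventory, PIN_MARK, pid)
    os.link(inventory, pinned)
    return pinned


def process_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def remove_stale_pins(directory):
    """Delete pins left behind by processes that no longer exist."""
    removed = []
    for name in sorted(os.listdir(directory)):
        base, mark, pid = name.rpartition(PIN_MARK)
        if mark and pid.isdigit() and not process_exists(int(pid)):
            os.unlink(os.path.join(directory, name))
            removed.append(name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as d:
        inventory = os.path.join(d, "inventory")
        with open(inventory, "w") as f:
            f.write("cell: abc123\n")
        mine = pin(inventory)
        # Simulate a crashed module: a child process that has already exited.
        child = subprocess.Popen([sys.executable, "-c", "pass"])
        child.wait()
        stale = pin(inventory, child.pid)
        removed = remove_stale_pins(d)
        return (removed == [os.path.basename(stale)], os.path.exists(mine))
```

Because a pin is a hard link, it keeps the old inventory's contents
reachable even after the original name has been renamed over.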

>  > Even there, you could run into problems where a module invokes
>  > another module; the child would need to use the same version of the
>  > map as the parent.
> 
> 	Agreed.
> 
> 	Actually, the only proper solution to this problem that I know
> 	is moving the whole computation chain into a ``parallel
> 	existence'' -- forking a separate copy-on-write location at the
> 	beginning of a ``transaction block'', and merging it back when
> 	it's done.  And while I hope that something like this will
> 	eventually be available in GRASS, I probably wouldn't say that
> 	the current code base is anywhere near that.

Yep. For the time being, I'd settle for simply re-arranging the
database layout to have one directory per map.

[BTW, it has been pointed out that this can reduce the maximum number
of maps per mapset, as the limit on an inode's hard link count limits
the maximum number of subdirectories, while there is usually no fixed
limit on the number of files. E.g. on Linux' ext2fs, the maximum hard
link count is 65535, so you can't have more than 65533 subdirectories.]
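
The arithmetic behind that figure, as a small Python sketch. It assumes a
filesystem that counts links in the traditional Unix way: a directory
starts with two links (its entry in the parent and its own "."), and each
subdirectory adds one more through its "..". The function names are
illustrative only.

```python
import os


def max_subdirectories(link_max):
    """Subdirectories one directory can hold, given the link-count limit."""
    # Two links are already used by the parent's entry and by ".".
    return link_max - 2


def link_max_for(path):
    """Ask the OS for the link-count limit on `path`, or None if unavailable."""
    try:
        return os.pathconf(path, "PC_LINK_MAX")
    except (AttributeError, OSError, ValueError):
        return None
```

With a limit of 65535, max_subdirectories() gives 65533. Calling
link_max_for() on a mapset directory reports whatever limit the
underlying filesystem advertises, which need not match that figure.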

-- 
Glynn Clements <glynn at gclements.plus.com>

