[GRASS-dev] Re: what if: a new GRASS directory layout?

Ivan Shmakov ivan at theory.asu.ru
Wed Apr 9 00:25:04 EDT 2008


>>>>> Glynn Clements <glynn at gclements.plus.com> writes:

 >>> I initially thought that you would need some form of locking, to
 >>> prevent the map from being replaced in the middle of the sequence.
 >>> However, you could achieve the same result by caching the inventory
 >>> within the module, but the code which garbage-collects unreferenced
 >>> elements would need to allow for this.

 >> Yes.  The need for GC seems to be the weakest point of this scheme.

 >> On Unix, as a first approximation, I'd just open () every binary
 >> object that's referenced by the inventory being processed.  This
 >> way, even if the file loses its name, it would be available to the
 >> program.

 > However: while updates to the cell/fcell file are atomic (new data is
 > written to a temporary file which is rename()d upon closing), support
 > files are typically updated by simply opening the output file for
 > write. If the output file already exists, it is overwritten in place.

 > Obviously, that would need to change in order for "transactional" I/O
 > to work.

	Yes.  With the new scheme, files under objects/ may be created
	in place, but mustn't ever be overwritten.
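
	A minimal sketch of the "create, but never overwrite" rule, in
	Python for brevity.  The function name create_object () and the
	flat objects/ layout are my assumptions for illustration, not
	existing GRASS code; the only real mechanism used is the
	O_EXCL flag of open ().

```python
import os

def create_object(objects_dir, name, data):
    """Create objects_dir/name exactly once; never overwrite it.

    Hypothetical helper: the name and layout are illustrative only.
    """
    path = os.path.join(objects_dir, name)
    # O_CREAT | O_EXCL makes the open fail with FileExistsError if the
    # name is already taken, so an existing object is never truncated.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

	Note that an object created in place this way may be left
	incomplete if the writer crashes, so an inventory should only
	reference it once the write has finished.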

 > The fact that individual modules can create their own private files
 > in cell_misc complicates matters.

 > BTW, the definition of "support file" includes the cellhd file, which
 > is rather fundamental, as the layout of the cell/fcell file (rows,
 > columns, format, compression) is stored in the cellhd file. If the
 > contents of the cellhd file don't match the cell/fcell file, libgis
 > will probably just crash.

	Agreed.

 >> In general, every binary object would need a list of references.
 >> Maintaining a list of names of referencing rasters shouldn't be
 >> too hard to implement.  On the contrary, a list of PIDs (to
 >> allow for a raster to be referenced by a process) looks a bit
 >> fragile.

 > In-process references could be maintained by making a copy (or hard
 > link) to the inventory, so that the GC treats it as "live". You would
 > need some kind of clean-up mechanism to handle any copies which are
 > left behind if a module crashes.

	However, having the GC scan all the inventories won't be
	efficient (unless these are stored in a database table with
	appropriate indices.)  So, I had in mind keeping a references
	file along with each object file.

	Still, creating a temporary inventory that references all the
	objects a process is using may help, e. g.:

$ cat tmp/refs/r.mapcalc-26528.1
References: sO7dZ3p0hlA6iQGMN, EwqVK4sVoFq1bFK7y, KkUET1RdWlwXQxosV,
 ajK3kbLfQu3a4Osuq, 8isdA0FB3GCmP15JV, qJaJuz2k7hJKMvRIK
$ cat objects/sO7dZ3p0hlA6iQGMN.refs
2005-04-25T19+0000-mod09-reflectance-250m-band-1
tmp/refs/r.mapcalc-26528.1
tmp/refs/r.univar-8974.1
$ 
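
	The bookkeeping behind such a `.refs' file could be sketched
	roughly as follows.  The helper names add_ref () and drop_ref ()
	and the one-referrer-per-line format are assumptions taken from
	the example above; concurrent updates are deliberately not
	handled here.

```python
import os

def add_ref(objects_dir, obj, referrer):
    """Record that `referrer` (a raster or a temporary inventory) uses `obj`."""
    with open(os.path.join(objects_dir, obj + ".refs"), "a") as f:
        f.write(referrer + "\n")

def drop_ref(objects_dir, obj, referrer):
    """Remove `referrer` from obj's list; return True if none are left.

    A True result means the object is a candidate for garbage collection.
    """
    path = os.path.join(objects_dir, obj + ".refs")
    with open(path) as f:
        refs = [line.strip() for line in f if line.strip()]
    refs = [r for r in refs if r != referrer]
    # Write the new list to a scratch file and rename it over the old
    # one, so a reader never sees a half-written list.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.writelines(r + "\n" for r in refs)
    os.replace(tmp, path)
    return not refs
```

	A module would call add_ref () for every object listed in its
	temporary inventory at start-up, and drop_ref () for each of
	them on exit.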

	There's still a locking issue regarding the `.refs' files -- it
	needs to be handled if multiple processes try to update the
	same file concurrently.  Any ideas on how to implement it
	portably?  (Perhaps it will be worth looking into, e. g., the
	SQLite source.)
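
	One candidate, sketched below under stated assumptions, is a
	plain lock file created with O_EXCL next to the `.refs' file.
	The class name RefsLock, the `.lock' suffix and the timeout
	values are all made up for illustration.  As far as I know,
	exclusive creation is available on both Unix and Windows, but
	it is not reliable on some older NFS setups, and a crashed
	process leaves a stale lock behind that something must clean
	up.

```python
import os
import time

class RefsLock:
    """Advisory lock: whoever creates `<refs_path>.lock` first holds it."""

    def __init__(self, refs_path, timeout=10.0, poll=0.05):
        self.lock_path = refs_path + ".lock"
        self.timeout = timeout
        self.poll = poll

    def __enter__(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # Exclusive creation fails if the lock file already exists.
                fd = os.open(self.lock_path,
                             os.O_WRONLY | os.O_CREAT | os.O_EXCL)
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError("could not lock " + self.lock_path)
                time.sleep(self.poll)
            else:
                # Store the owner's PID to help diagnose stale locks.
                os.write(fd, str(os.getpid()).encode())
                os.close(fd)
                return self

    def __exit__(self, *exc):
        os.unlink(self.lock_path)
        return False
```

	Any read-modify-write of a `.refs' file would then be wrapped
	in `with RefsLock (path):'.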

 >>> Even there, you could run into problems where a module invokes
 >>> another module; the child would need to use the same version of the
 >>> map as the parent.

 >> Agreed.

 >> Actually, the only proper solution to this problem that I know
 >> is moving the whole computation chain into a ``parallel
 >> existence'' -- forking a separate copy-on-write location at the
 >> beginning of a ``transaction block'', and merging it back when
 >> it's done.  And while I hope that something like this will
 >> eventually be available in GRASS, I probably wouldn't say that
 >> the current code base is anywhere near that.

 > Yep. For the time being, I'd settle for simply re-arranging the
 > database layout to have one directory per map.

 > [BTW, it has been pointed out that this can reduce the maximum number
 > of maps per mapset, as the limit on an inode's hard link count limits
 > the maximum number of subdirectories, while there is usually no fixed
 > limit on the number of files. E.g. on Linux' ext2fs, the maximum hard
 > link count is 65535, so you can't have more than 65533 subdirectories.]

	The inventory scheme, by contrast, doesn't hit this limit.


