[GRASS5] GRASS 5.1 and new raster directory structure proposal

Mon Jan 20 05:05:05 EST 2003

Markus Neteler wrote:

> My proposal is to change the 5.0 raster file structure for 5.1.
> 
> A raster file organization similar to above structure by:
>  maptype/mapname/files
> 
> offers following advantages:

I agree with the concept. However, the key issue is the programming
interface. All access to files within the GRASS database should
ultimately go through a few core functions, e.g. G__find_file(); in
that situation, the actual directory layout should be irrelevant to
anything other than those core functions.

AFAICT, the lowest level function should probably look like:

	G__file_name(gisdbase, location, mapset, type, name, element);

Any higher-level interfaces should ultimately go through here.
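
As a minimal sketch of what such a lowest-level function might look like, assuming the proposed maptype/mapname/files layout: the name sketch_file_name(), the caller-supplied buffer, and the return convention are all hypothetical and are not the existing libgis API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed G__file_name(); not libgis code.
 * Writes <gisdbase>/<location>/<mapset>/<type>/<name>/<element> into buf.
 * Returns 0 on success, -1 if the path does not fit into len bytes. */
int sketch_file_name(char *buf, size_t len,
                     const char *gisdbase, const char *location,
                     const char *mapset, const char *type,
                     const char *name, const char *element)
{
    /* The directory layout is encoded in this one format string only. */
    int n = snprintf(buf, len, "%s/%s/%s/%s/%s/%s",
                     gisdbase, location, mapset, type, name, element);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

If the layout changed again, only the format string in this one function would need to change.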

The most obvious higher-level interface would be one which accepts a
combined mapset/name; this would allow e.g. changing the syntax of
qualified names from "map@mapset" to "mapset/map", or eliminating
mapsets altogether. Certainly, the logic of handling qualified names
should be in one place rather than dotted around the code.
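
A rough illustration of keeping that logic in one place, assuming the qualifier separator is the single character '@'. The function sketch_split_name() is hypothetical; it is only meant to show that the separator need not appear anywhere else.

```c
#include <string.h>

/* Hypothetical helper, not part of libgis.
 * Splits a possibly qualified map name into its name and mapset parts.
 * mapset is set to "" when the input is unqualified.
 * Returns 1 if the input was qualified, 0 if it was not,
 * and -1 if either output buffer is too small. */
int sketch_split_name(const char *qualified,
                      char *name, size_t nlen,
                      char *mapset, size_t mlen)
{
    const char *sep = strchr(qualified, '@');   /* assumed separator */
    size_t n = sep ? (size_t)(sep - qualified) : strlen(qualified);
    const char *ms = sep ? sep + 1 : "";

    if (n >= nlen || strlen(ms) >= mlen)
        return -1;

    memcpy(name, qualified, n);
    name[n] = '\0';
    strcpy(mapset, ms);         /* safe: length checked above */
    return sep != NULL;
}
```

Switching to a "mapset/map" syntax would then mean rewriting this function rather than every module.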

Closely related to this is the way that modules currently handle
qualified names. At present, modules use G_find_file() to split a
(possibly qualified) map name into separate mapset/name components,
then pass the components separately.

This should be changed, IMHO; a module should treat a map name as an
abstract identifier, and shouldn't have to even know about mapsets
(apart from the obvious exceptions, e.g. g.mapsets).

The main requirement here is for specific functions to generate map
names based upon an existing name, coupled with some context. At
present, individual modules basically perform string manipulation
operations (concatenation, parsing) upon the strings which represent
maps and mapsets.

To give some concrete examples:

1. If a module requires several output maps, it may wish to allow the
user to just specify a "base" name; e.g. d.rgb might want to allow the
user to enter:

	d.rgb input=foo

instead of (at present):

	d.rgb r=foo.r g=foo.g b=foo.b

However, for a qualified map name, entering:

	d.rgb input=foo@bar

would need to be treated as:

	d.rgb r=foo.r@bar g=foo.g@bar b=foo.b@bar

and *not* as:

	d.rgb r=foo@bar.r g=foo@bar.g b=foo@bar.b

It would need to be able to do this without hard-coding the
"map@mapset" convention into the module itself.

2. Similarly, if a module generates multiple output maps from a single
input map, it may wish to (by default) derive the names of all of the
output maps from the name of the input map. In this case, the output
names would need to be unqualified even if the input name was
qualified. So, e.g. r.slope.aspect might wish to treat:

	r.slope.aspect elevation=foo@bar

as equivalent to:

	r.slope.aspect elevation=foo@bar slope=foo.sl aspect=foo.as

Again, one would wish to avoid hard-coding the "map@mapset" convention
into the module itself.
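
Both examples could be served by a single library routine along the following lines. This is only a sketch: sketch_derive_name(), its keep_mapset flag, and the SKETCH_SEP macro are hypothetical names, and '@' is assumed to be the qualifier separator.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_SEP '@'  /* assumed separator; the only place it is spelled out */

/* Hypothetical helper, not part of libgis.
 * Appends suffix to the map part of a possibly qualified name.
 * keep_mapset != 0 keeps the qualifier (example 1 above);
 * keep_mapset == 0 drops it (example 2 above).
 * Returns 0 on success, -1 if the result does not fit into len bytes. */
int sketch_derive_name(char *out, size_t len, const char *name,
                       const char *suffix, int keep_mapset)
{
    const char *sep = strchr(name, SKETCH_SEP);
    size_t maplen = sep ? (size_t)(sep - name) : strlen(name);
    int n;

    if (sep && keep_mapset)     /* sep points at the separator itself */
        n = snprintf(out, len, "%.*s%s%s", (int)maplen, name, suffix, sep);
    else
        n = snprintf(out, len, "%.*s%s", (int)maplen, name, suffix);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this, a module would ask for the ".r" variant of its input with the qualifier kept, or for the ".sl" variant with the qualifier dropped, and would never inspect the string itself.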

One problem with the general concept of channeling file access through
a few key functions is the issue of scripts. Typically, these end up
re-implementing the libgis logic; moreover, each individual script
ends up with its own clone of the code.

Witness the effort involved in replacing references to $LOCATION with
g.gisenv. A similar effort may be required to handle any changes to
the layout below the level of the mapset directory.

For this reason, we should also consider providing standard Bourne
shell and/or Tcl equivalents of the libgis functionality. This could
be a set of standard "include" scripts, which would be accessed by
e.g.

	. "$GISBASE/scripts/library.sh"
or:
	source $env(GISBASE)/scripts/library.tcl

possibly in combination with some standard utilities (e.g. 
g.file.name) which would "export" the core functions in a way that can
be used with scripts (although, for Tcl, it may be preferable to
provide either a customised tclsh or a loadable module).

To deal with a couple of specific points from the original message:

> - raster/vector/G3D maps with same names are still possible

I'd suggest that it's worth some thought as to whether this is a good
or a bad thing. The main argument against is that it could result in
confusion. I can recall one bug report which hinted at the potential
scenario of:

a) executing "g.remove foo", then
b) thinking "Oh, F***! I meant the vector map, not the raster map".

[A tenuously-related aside: The MS-Windows file browser has an option
to "Hide extensions for known file types". I'm sure that whoever
designed this option either never had to provide technical support by
telephone, or was actively seeking revenge against those who do. 
Instead of simply saying "Now, click on the file named foo.txt", you
end up with "no, not *that* file named foo, but the one named foo with
a picture that looks like ...".]

> - during this change/cleanup the 'white space" issues could be fixed
>   (especially for MS-Windows users) as relevant functions are touched

AFAICT, this problem is quite separate to the layout below the
database directory.

Most of the actual problems which have been reported to date come down
to whitespace in directory names outside of the database directory. 
The most common issue is $HOME; after that comes the issue of
importing files ([rsv].in.*) from directories which contain
whitespace.

However, we should also try to allow for the possibility of both
$GISBASE and the path to the database directory containing whitespace. 
IMHO, it ought to be possible to install GRASS itself, into e.g. 
"C:\Program Files", and to store databases wherever is deemed
appropriate (which may be in a directory containing spaces).

The vast majority of problems are due to the use of a shell, whether
due to:

a) scripts, or
b) the system() function.

The first one is quite hard to solve completely, although using quotes
helps. The second one is straightforward to solve: don't use system().
Instead, G_system() should be modified to take individual arguments,
i.e. from:

	int G_system (char *command)
to:
	int G_system (char *program, ...)

and to execute the program directly instead of via /bin/sh. So,
instead of:

	G_system("r.foo arg1=foo arg2=bar");

one would use:

	G_system("r.foo", "arg1=foo", "arg2=bar", NULL);
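
A minimal sketch of such a function on a POSIX system, using fork(), execvp() and waitpid(). The name sketch_system(), the fixed argument limit, and the return convention are assumptions for illustration; this is not the existing G_system().

```c
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_MAX_ARGS 64      /* arbitrary limit for this sketch */

/* Hypothetical variadic replacement; not the existing G_system().
 * Runs program with a NULL-terminated list of arguments, without a shell.
 * Returns the program's exit status, or -1 on failure. */
int sketch_system(const char *program, ...)
{
    char *argv[SKETCH_MAX_ARGS + 2];    /* program + args + NULL */
    int argc = 0, status;
    va_list ap;
    char *arg;
    pid_t pid;

    argv[argc++] = (char *)program;
    va_start(ap, program);
    while ((arg = va_arg(ap, char *)) != NULL) {
        if (argc > SKETCH_MAX_ARGS) {   /* too many arguments */
            va_end(ap);
            return -1;
        }
        argv[argc++] = arg;
    }
    va_end(ap);
    argv[argc] = NULL;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(program, argv);  /* each argument is passed through verbatim */
        _exit(127);             /* reached only if execvp() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because no shell is involved, an argument such as "input=my map" reaches the program as a single argument, whitespace included. In portable code the terminating NULL is best written as (char *)NULL.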

> Potential disadvantages:
> 
> - at least some raster modules have to be modified which directly access
>   file in the user's (current) mapset
>   Comment: with exceptions such modules *should* use library functions to
>            access files and should be cleaned anyway

AFAICT, this is the *only* disadvantage. IOW, the changes are
absolutely desirable; they just involve work. Most significantly, they
require that we think about the design before re-writing a large
number of modules, so that we don't discover a fundamental problem at
a late stage and end up having to re-re-write the modules.

> - handling of 'colr2/' directory (user applies color table to map which
>   is stored in another mapset) [1] and 'reclassed_to' file handling must be
>   modified
>   Comment: at least the reclassed_to' file handling was discussed earlier to
>            have some disadvantages in the current implementation and might
>            be updated/modified anyway

Well, the reclassed_to concept has fundamental problems in any case. 
The colr2 issue isn't that significant; IMHO, it wouldn't be an
overwhelming problem if that functionality was simply discarded until
we figured out how to do it right.

> [1] A 'colr2/' directory related suggestion from Glynn Clements:
> 
> > Rather than having a special-case mechanism which allows an alternate
> > colour table to be "overlaid" onto an existing map (possibly in a
> > different mapset), it would be preferable, IMHO, to create a
> > "recolour" map. This would work like a reclass map; the "recolour" map
> > would exist as an actual map as far as the user is concerned, but all
> > of the data (except for the colour table) would be taken from the base
> > map.
> > 
> > There would probably be other uses for such a mechanism (e.g. category
> > labels, horizontally or vertically rescaled maps etc).

I'll elaborate slightly on this idea.

Basically, you would be able to create a map which was essentially a
"link" to an existing map. The actual data would initially be a map
directory containing a single file, which contained the (possibly
qualified) name of another map.

Any access to a file within this directory (e.g. via G__find_file())
would result first in an attempt to locate the file within the
directory itself; if the file didn't exist there, the function would
then attempt to locate the file in the original directory. Creating or
modifying a file would always write the file into the new directory.

[Aside: this implies that we need separate "find file" functions for
read and write.]
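
The read/write asymmetry might be sketched as follows. All names here are hypothetical, and the step of resolving the map name stored in the "link" file to a directory is assumed to have happened already; base_dir is that resolved directory, or NULL for an ordinary map.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical read-side lookup; not part of libgis.
 * Tries the map's own directory first, then the linked map's directory.
 * Returns 0 with the path in buf, or -1 if the element is not found
 * in either place or the path does not fit. */
int sketch_find_for_read(char *buf, size_t len, const char *map_dir,
                         const char *base_dir, const char *element)
{
    int n = snprintf(buf, len, "%s/%s", map_dir, element);

    if (n < 0 || (size_t)n >= len)
        return -1;
    if (access(buf, R_OK) == 0)
        return 0;               /* the link map overrides this element */
    if (base_dir == NULL)
        return -1;              /* ordinary map: nothing to fall back to */

    n = snprintf(buf, len, "%s/%s", base_dir, element);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return access(buf, R_OK) == 0 ? 0 : -1;
}

/* Hypothetical write-side lookup: always the map's own directory. */
int sketch_find_for_write(char *buf, size_t len, const char *map_dir,
                          const char *element)
{
    int n = snprintf(buf, len, "%s/%s", map_dir, element);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A real implementation would presumably also have to follow chains of links and guard against cycles, which this sketch ignores.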

Reclass maps would then just be a specific case of the general
mechanism. Any map could contain a reclass table. A typical reclass
map would just consist of a link to the original map and a reclass
table. The reclass table would be a separate file rather than a
special type of cell_hd file; it wouldn't need the name/mapset header,
as this would be in the "link" file.

A map could be "recoloured" by creating a linked map with only a
"colr" file. Similarly, a map could be "relabelled" by creating a
linked map with only a "cats" file.

> Hoping for a fruitful discussion,

I hope so. IMHO, GRASS' future viability entirely depends upon the
amount of effort people are willing to expend upon core design issues. 
That will determine whether GRASS is a coherent package or a few
hundred disparate programs dumped into a directory.

-- 
Glynn Clements <glynn.clements at virgin.net>