[GRASS-dev] [RFC] Glossary: GISDATABASE -> DATASTORE

tlaronde at polynum.com tlaronde at polynum.com
Sun Mar 4 08:41:32 EST 2007


Hello,

I plan to make the following changes to the variable names in KerGIS.
Since we share the same origin, and since it would be suboptimal to
have distinct naming schemes, I'd like to hear your reactions to the
following choices.

Thanks in advance.


Glossary proposal for gis "database" naming scheme
--------------------------------------------------

A database is an organized set of logically related data. A table is a
set of instances of elements composed of a fixed number of sub-elements
(fields).

Historically, in CERL GRASS derived systems, GISDATABASE has been the
name of the variable holding the value of the pathname to a directory
where GRASS data is stored.

There is no question that some parts of the stored gis data are
databases: they are. But this is not what GISDATABASE designates. The
real database level is the LOCATION, where all the data is logically
related at least by the REGION. The MAPSETs are the tables.

Hence the use of GISDATABASE is misleading in several ways:

1) The directory is not a database (databases are sub-directories);

2) It is not _the_ GISDATABASE, since there may be many of them
(contrary to _the_ GISBASE, where _this_ version of the gis system is
installed);

3) Since other kinds of RDBMS are used for non-geometrical attributes,
there is some confusion, or at least some implicit assumption, about
what a "database" is.

Hence I propose to replace GISDATABASE by: DATASTORE.

Not GISDATASTORE, to avoid suggesting uniqueness: there is _the_
GISBASE, but a DATASTORE is one of several. We are in the gis, so the
GIS prefix can be taken for granted. Furthermore, GISDATABASE is a gis
environment variable, not a host system one (contrary to GISBASE), so
dropping the prefix makes sense.

Regarding the SQL terminology (the CERL GRASS databases [LOCATIONs]
have nothing to do with SQL, but they _are_ databases, so I looked for
"prior art"): "catalog" is not widely used and, to me, does not carry
the correct meaning: a catalog is the listing of what is in a store,
not the store itself.

A comparison with the SQL terminology and with an RDBMS (PostgreSQL)
can shed some additional light (this is only a naming comparison; it
should not be pushed too far):

SQL        PostgreSQL               GRASS/KerGIS
======================================================
cluster    cluster                  cluster
catalog    database cluster         DATASTORE
schema     database                 LOCATION
object     tables/views/routines    MAPSET
======================================================
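
To make the analogy concrete, here is a minimal sketch in Python that
walks a DATASTORE directory the way one would list the databases and
tables of a cluster. The function names are hypothetical (they are not
part of GRASS or KerGIS), and the sketch makes a simplifying
assumption: every sub-directory of the DATASTORE is treated as a
LOCATION and every sub-directory of a LOCATION as a MAPSET.

```python
import os


def list_children(path):
    """Return the sorted names of the sub-directories of `path`."""
    return sorted(entry.name for entry in os.scandir(path) if entry.is_dir())


def describe_datastore(datastore):
    """Map each LOCATION name in `datastore` to the list of its MAPSET names.

    Simplifying assumption: any sub-directory of the DATASTORE is a
    LOCATION, and any sub-directory of a LOCATION is a MAPSET.
    """
    return {
        location: list_children(os.path.join(datastore, location))
        for location in list_children(datastore)
    }
```

Several DATASTOREs can be described by calling describe_datastore()
once per directory, which matches the point that there is no single
"the" datastore.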

Note 1: someone wrote that one of the "out of fashion" aspects of CERL
GRASS-derived systems, along with the use of unfashionable programming
languages, is the fact that the data is stored in a file hierarchy.
Well, that is exactly how PostgreSQL, for example, does it, and I do
not see why this should be condemned as a bad choice (it allows, for
example, dedicating a chunk of disk whose size matches the backup
capabilities, and using Unix access permissions or, where available,
ACLs to manage access).

Note 2: in CERL GRASS, the gis environment variable LOCATION_NAME was
used for the name of the "location", while LOCATION was set to the full
path. I have found that, for users, LOCATION should indeed be set to
the location name, symmetric with the use of MAPSET. The full pathname
is only used in scripts, since the gis has dedicated functions to find
a name in its databases or tables (locations and mapsets). Any comments?
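
As an illustration of Note 2, here is a minimal sketch in Python of how
the full pathname can be derived when LOCATION and MAPSET hold names
only. The mapset_path() helper and the sample values are hypothetical;
only the variable names DATASTORE, LOCATION and MAPSET come from the
proposal above.

```python
from pathlib import PurePosixPath


def mapset_path(env):
    """Build the full pathname of the current mapset from name-only variables.

    `env` is a mapping holding the proposed gis environment variables:
    DATASTORE (a directory pathname), LOCATION and MAPSET (plain names).
    """
    return PurePosixPath(env["DATASTORE"]) / env["LOCATION"] / env["MAPSET"]


# Hypothetical values, for illustration only.
gisenv = {
    "DATASTORE": "/home/gis/data",
    "LOCATION": "spearfish",
    "MAPSET": "PERMANENT",
}
print(mapset_path(gisenv))  # prints /home/gis/data/spearfish/PERMANENT
```

With this scheme users only ever handle the two names, and a script
that needs the full pathname composes it in one place.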
-- 
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
                 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



