[GRASS-dev] what if: Anything series?

Fri Jan 11 15:02:09 EST 2008

			It seems impossible to doubt that everything in
			the universe can be represented by numbers [...]
				-- N. I. Lobachevsky

	Reading the ``Time series in GRASS'' page [1], as well as [2, 3],
	made me wonder: is time the only parameter along which one may
	need to lay out the data sets?  Arguably, it's not.

	Consider, for example, one having to compare the behaviour of
	MM5 modelling results with different models or parameters used.
	There, related rasters are laid along the model index or model
	parameter's value.

	Another example is a set of rasters comprising the values of a
	meteorological variable for certain (often non-uniformly spaced)
	values of pressure.  These sets of raster data sets shouldn't be
	turned into 3D rasters, since the pressure to height
	correspondence varies over space and time.

	The above makes me believe that a generic facility to keep the
	relations between the rasters is necessary.  Besides,
	implementing this facility allows for several other problems to
	be addressed within its framework, as I'll try to show below.

* Several related rasters: a rasterset?

	Both of the examples above suggest using numeric values to
	represent the relationship between the rasters.  These values
	can include:

	* timestamp (in seconds since epoch), allowing for time series
	  [1];

	* layer's pressure;

	* model index or model parameter;

	* were quality flags applied to the raster (1) or not (0)?

	Let me define a rasterset as a named collection of related
	rasters, each unambiguously identified by an arbitrary number of
	arbitrary numeric values.

	Below, I assume using 2D rasters at the lower level of the
	rasterset implementation, since 3D rasters could easily be
	simulated by a rasterset with a `z' as the parameter.
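
	The definition above can be sketched in a few lines of Python.
	This is only an illustration under my own assumptions: the
	RasterSet class, its add and select methods, and the parameter
	names are hypothetical and not part of any GRASS API, and a
	raster is stood in for by a plain list of rows.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RasterSet:
    """Hypothetical rasterset: a named collection of 2D rasters, each
    identified by a set of numeric parameters (not a GRASS API)."""
    name: str
    # Maps a frozen set of (parameter, value) pairs to a raster;
    # a raster is just a list of rows here.
    members: dict = field(default_factory=dict)

    def add(self, raster, **params):
        key = frozenset(params.items())
        if key in self.members:
            raise KeyError(f"parameters {params} already identify a raster")
        self.members[key] = raster

    def select(self, **params):
        """Return the rasters whose parameters include all those given."""
        wanted = set(params.items())
        return [r for k, r in self.members.items() if wanted <= k]


# Timestamps are seconds since the epoch; qaflags is 1 (applied) or 0.
t0 = int(datetime(2007, 5, 31, 21, 35, 24, tzinfo=timezone.utc).timestamp())
t1 = t0 + 86400

ozone = RasterSet("airs-total-ozone")
ozone.add([[301.0, 302.5]], timestamp=t0, qaflags=1)
ozone.add([[300.0, 303.0]], timestamp=t0, qaflags=0)
ozone.add([[298.0, 299.5]], timestamp=t1, qaflags=1)

print(len(ozone.select(qaflags=1)))          # rasters with flags applied
print(ozone.select(timestamp=t0, qaflags=0)) # one fully identified raster
```

	Keying the members by the whole set of parameters is what makes
	each raster unambiguously identified, while a partial set of
	parameters selects a group of related rasters.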

* Tiled raster storage

	The simplest case of using the rasterset facility is to
	implement tiled raster storage [3].

	Indeed, a tiled raster could be implemented with each tile
	becoming a raster within a single rasterset, and then being
	assigned a pair of numeric parameters -- the indices of the
	tile.

	Since the spatial resolution of the tiles may differ (the rasters
	comprising the rasterset are almost as independent as the
	individual rasters in GRASS currently are), this allows for both
	whole-NULL tiles (no raster for the given tile indices) and
	same-value tiles (a 1x1 raster covering the tile's whole region.)

	Since the usage of this feature is expected to be quite common, I
	believe it needs to be implemented at the ``core'' of the
	rasterset implementation, with the appropriate optimizations
	applied for some common cases.
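
	To illustrate the whole-NULL and same-value tiles, here is a
	minimal Python sketch of reading one cell from such a tiled
	raster.  The read_cell function and the dictionary layout are
	assumptions made for the example only; a missing tile reads as
	None, which stands in for NULL.

```python
def read_cell(tiles, tile_rows, tile_cols, row, col):
    """Read one cell of a tiled raster.

    tiles maps (tile_row, tile_col) to a 2D list.  A missing key is a
    whole-NULL tile; a 1x1 list is a same-value tile.  Each tile nominally
    covers tile_rows x tile_cols cells of the full raster.
    """
    tile = tiles.get((row // tile_rows, col // tile_cols))
    if tile is None:
        return None                      # whole-NULL tile: nothing stored
    # Scale the in-tile offset to the tile's own resolution, so that a
    # coarser tile (down to 1x1) still covers its whole extent.
    r = (row % tile_rows) * len(tile) // tile_rows
    c = (col % tile_cols) * len(tile[0]) // tile_cols
    return tile[r][c]


tiles = {
    (0, 0): [[1, 2], [3, 4]],   # full-resolution 2x2 tile
    (0, 1): [[7]],              # same-value tile stored as 1x1
    # (1, 0) and (1, 1) are absent: whole-NULL tiles
}
print(read_cell(tiles, 2, 2, 1, 0))   # cell in the full-resolution tile
print(read_cell(tiles, 2, 2, 1, 3))   # cell in the same-value tile
print(read_cell(tiles, 2, 2, 3, 3))   # cell in a NULL tile
```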

* Metadata

	Since the rasters comprising the rasterset are allowed to carry
	an arbitrary number of additional numeric parameters, this
	facility could take over the handling of certain (though not
	arbitrary) metadata, even in cases where these additional
	parameters aren't strictly necessary for identification.

	However, with each raster being assigned a category, it's
	possible to associate arbitrary information with it using a
	database connected to the rasterset.

* Color maps

	Color maps are currently tied rather closely to the rasters they
	are used for, making it hardly practical to use different color
	maps for the same rasters.  Such a choice could be used, for
	example, to apply different color maps when displaying the data
	and when producing the printed output.

	Were the color maps detached from the rasters, it might become
	feasible to allow for a color map to be shared among several
	rasters.

	I've already mentioned a raster's parameter as a possible
	substitute for `z' (both for simulating `z' for ordinary 3D
	rasters, and for storing layers of data for which layer index to
	`z' mapping varies over space and time.)  Moreover, for digital
	elevation models the `z' value is actually the value stored in
	the raster.  It may be worth investigating whether this relation
	could be turned inside out, to allow for arbitrary value to
	arbitrary value mappings to be stored as 2D (or 1D) rasters
	within a rasterset.

	There may be demand for storing quite arbitrary arrays in the
	future as well.
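
	As a rough illustration of a color map kept apart from the
	rasters and stored as a 1D value-to-value mapping, consider the
	Python sketch below.  The breaks and colors arrays, the lookup
	function, and the choice of linear interpolation are all my
	assumptions; this is not how GRASS stores its color tables.

```python
from bisect import bisect_right

# A hypothetical color map kept apart from any raster: sorted break
# values and the RGB color at each break (a 1D value-to-value mapping).
breaks = [0.0, 100.0, 200.0]
colors = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]


def lookup(value):
    """Linearly interpolate the color for value, clamping at the ends."""
    if value <= breaks[0]:
        return colors[0]
    if value >= breaks[-1]:
        return colors[-1]
    i = bisect_right(breaks, value) - 1
    t = (value - breaks[i]) / (breaks[i + 1] - breaks[i])
    return tuple(round(a + t * (b - a))
                 for a, b in zip(colors[i], colors[i + 1]))


# The same map serves any number of rasters.
for raster in ([[0.0, 100.0]], [[100.0, 200.0]]):
    print([[lookup(v) for v in row] for row in raster])
```

	A second map for printed output would simply be another pair of
	such arrays, selected at display time instead of being tied to
	the raster.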

* Scanning radiometers & Time

	Due to the curvature of the Earth's surface, a satellite scanning
	radiometer such as MODIS sees certain places on Earth multiple
	times in a short period of time (about 1.48 s for MODIS.)

	These places appear on consecutive scans in the L2 data.  The
	most common practices to deal with this effect are either to
	average the values obtained for the same place, or to take the
	one value that is, by some criterion, superior to the others.

	However, allowing for the scans to be stored independently along
	with a ``time'' value associated with each would allow one to
	analyze these very short-term changes (if any.)
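
	The difference between the three approaches can be shown on a
	single ground location seen on two scans.  The values below are
	made up, and ``latest scan wins'' is merely a stand-in for
	whatever selection criterion one would actually use.

```python
from statistics import mean

# Hypothetical observations of one ground location seen on two
# consecutive scans: (scan time in seconds, measured value).
observations = [(0.0, 291.4), (1.48, 291.9)]

# Common practice 1: average the values for the same place.
averaged = mean(v for _, v in observations)

# Common practice 2: keep the value that some criterion prefers; here the
# criterion is simply "latest scan", chosen only for illustration.
preferred = max(observations)[1]

# Rasterset approach: keep both, keyed by time, so that the short-term
# change remains available for analysis.
by_time = dict(observations)
times = sorted(by_time)
change = by_time[times[-1]] - by_time[times[0]]

print(averaged, preferred, change)
```

	Only the last form retains enough information to compute the
	change at all; the first two discard it at import time.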

* RDBMS as the backend

	Probably the most appealing feature of the rasterset model is
	its supposed flexibility.  As mentioned above, the color maps
	could be represented as rasters in their very own coordinate
	space, and so could the ground control points (*).

	With the number of separate data structures needed to form a
	raster reduced, it could become feasible to put these structures
	into a general-purpose RDBMS, thus partially addressing the
	concerns about both the disk space and the large number of files
	in a directory [4].

	(*) It's very common for the satellite Level 2 data to specify
	the latitudes and longitudes of the pixel centres as separate
	rasters.  These could be mapped directly to specific rasters
	within the rasterset.  See [5] for a related
	feature in GDAL.
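
	A possible shape for such a backend is sketched below using
	SQLite through Python's sqlite3 module.  The table and column
	names, the add_raster helper, and the packing of cells as
	little-endian doubles are all assumptions made for the sake of
	the example; nothing here reflects an existing GRASS schema.

```python
import sqlite3
import struct

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE rasterset (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE raster (
        id INTEGER PRIMARY KEY,
        rasterset_id INTEGER NOT NULL REFERENCES rasterset(id),
        nrows INTEGER NOT NULL, ncols INTEGER NOT NULL,
        cells BLOB NOT NULL              -- packed cell values
    );
    CREATE TABLE parameter (
        raster_id INTEGER NOT NULL REFERENCES raster(id),
        name TEXT NOT NULL, value REAL NOT NULL,
        PRIMARY KEY (raster_id, name)
    );
""")


def add_raster(con, set_id, rows, **params):
    """Store one 2D raster and its identifying numeric parameters."""
    flat = [v for row in rows for v in row]
    blob = struct.pack(f"<{len(flat)}d", *flat)
    rid = con.execute(
        "INSERT INTO raster (rasterset_id, nrows, ncols, cells)"
        " VALUES (?, ?, ?, ?)",
        (set_id, len(rows), len(rows[0]), blob)).lastrowid
    con.executemany("INSERT INTO parameter VALUES (?, ?, ?)",
                    [(rid, k, v) for k, v in params.items()])
    return rid


set_id = con.execute("INSERT INTO rasterset (name) VALUES (?)",
                     ("airs-total-ozone",)).lastrowid
a = add_raster(con, set_id, [[301.0, 302.5]], timestamp=1180647324, qaflags=1)
b = add_raster(con, set_id, [[300.0, 303.0]], timestamp=1180647324, qaflags=0)

# Find the rasters that had the quality flags applied.
found = [r[0] for r in con.execute(
    "SELECT raster_id FROM parameter WHERE name = 'qaflags' AND value = 1")]
print(found == [a])
```

	All rasters of all rastersets end up in three tables, so the
	number of files no longer grows with the number of rasters.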

* Views

	Plain names aren't a convenient way to identify rasters.  For
	example, I have a location full of rasters with names like:

2007-05-31-grans-std-qual-o3
2007-05-31-grans-std-toto3std
2007-05-31-grans-std-toto3std.qa
2007-05-31-grans-std-toto3stderr
...

	The total number of the 2D data sets for each day is over 70,
	most of which come in both the ``no quality flags applied'' form
	(without the `.qa' suffix) and the ``standard quality flags
	applied'' one (with it.)  And the source data include even
	more data sets.

	In order to handle this amount of data efficiently, the system
	should allow one to limit the namespace to the data sets
	matching arbitrary criteria.  I don't mean the GUI
	specifically, since it may become rather tedious to filter the
	g.mlist(1) output with grep(1) in scripts as well.

	The rasterset model seems to be a more appropriate solution.
	And, as suggested in the next section, there could be a way to
	name a specific raster within the rasterset.  With this
	functionality available from scripts, one could easily apply
	arbitrary schemes for naming the data sets.

* User interface

	With the rastersets implemented, the GRASS database begins to
	look much more like a relational one.  Since the individual
	rasters are no longer named individually (rather, they share the
	common rasterset name and are identified by the associated
	values of the arbitrary parameters), to access a specific raster
	one would need to issue a query.  (Much like accessing a table's
	row with SQL queries.)

	Certainly, to expose the very exciting new features the
	rasterset model could offer, the UI (both the command line and
	the graphical parts) would require a major overhaul.  However,
	for the compatibility's sake, it's reasonable to implement the
	current raster accessing interface on top of the rasterset
	facility, thus allowing for the existing code (and therefore
	interface) to be retained.

	Then, there would have to be a mapping of the compatibility
	raster names to the (rasterset name, parameters) pairs, and the
	corresponding utilities to manage it, both in the library API
	and the UI, like:

GRASS> r.bind \
           raster=compat-airs-2007-05-31-total-ozone.qa \
           rasterset=airs-total-ozone \
           parameter="timestamp=2007-05-31 21:35:24 +0000" \
           parameter="qaflags_p=true"

	Parameters not specified are to match all of the rasters at
	once, rather than any single one of them.  Thus, one won't need
	to specify the individual tile indices for a tiled raster to
	mean the whole spatial extent of the rasterset.  If several
	rasters match the specification, but do not complement each
	other spatially, an error is signalled, like:

GRASS> r.bind \
           raster=dummy \
           rasterset=airs-total-ozone \
           parameter="qaflags_p=true"
r.bind: several rasters match the specification
GRASS> 
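
	The matching rule could be sketched in Python as follows.  The
	resolve and overlaps functions are hypothetical, extents are
	given as (west, south, east, north), and I take ``do not
	complement each other spatially'' to mean that the extents of
	the matching rasters overlap, which is an assumption on my part.

```python
def overlaps(a, b):
    """True if two extents (west, south, east, north) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def resolve(rasters, **spec):
    """Return the rasters matching spec; unspecified parameters match all.

    rasters is a list of (params, extent) pairs.  Several matches are
    accepted only if their extents do not overlap, i.e. they can be
    mosaicked like tiles.
    """
    hits = [(p, e) for p, e in rasters
            if all(p.get(k) == v for k, v in spec.items())]
    for i, (_, a) in enumerate(hits):
        for _, b in hits[i + 1:]:
            if overlaps(a, b):
                raise ValueError("several rasters match the specification")
    return hits


rasters = [
    ({"timestamp": 1, "qaflags": 1, "tile": 0}, (0, 0, 10, 10)),
    ({"timestamp": 1, "qaflags": 1, "tile": 1}, (10, 0, 20, 10)),
    ({"timestamp": 2, "qaflags": 1, "tile": 0}, (0, 0, 10, 10)),
]

# Tile index left out: both tiles of timestamp 1 match and are adjacent.
print(len(resolve(rasters, timestamp=1, qaflags=1)))

# Timestamp left out as well: two rasters cover the same extent.
try:
    resolve(rasters, qaflags=1)
except ValueError as err:
    print("r.bind:", err)
```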

	Importing utilities (r.in.gdal, or r.import) would need to be
	changed early to allow for both the rasterset name and the
	identifying parameters to be specified.  The other modules could
	be changed as time permits.

	The rasters imported may be automatically named according to an
	arbitrary user-specified scheme with the ``hooks'' facility
	being implemented in GRASS.  (I hope to present my ideas
	regarding such a facility in a separate posting.)

* Notes for the implementor

	The model described above could be based on the current 2D
	rasters implementation after cleaning it of the extra features
	to be provided by the rasterset model itself.

	Within the model, the 2D rasters facility is the lower level,
	and its interface would need to be changed.  For the
	compatibility's sake, the former interface would need to be
	provided by the code layered on top of the rasterset
	implementation.

	The rasterset model is to be implemented mostly from scratch.

[1] http://grass.gdf-hannover.de/wiki/Time_series_in_GRASS
[2] http://grass.gdf-hannover.de/wiki/GRASS_7_ideas_collection
[3] http://grass.gdf-hannover.de/wiki/Replacement_raster_format
[4] http://freegis.org/cgi-bin/viewcvs.cgi/grass/gips/gip-0002.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup
[5] http://trac.osgeo.org/gdal/wiki/rfc4_geolocate