[GRASS-dev] GRASS raster and G3d libraries
glynn at gclements.plus.com
Thu Aug 10 21:34:54 EDT 2006
Soeren Gebbert wrote:
> > > I'm going to make some changes to the G3d library in the next months.
> > > I will rename all the lib functions to fit the GNU naming convention and
> > > I will simultaneously extend the documentation.
> > >
> > > But while reading some old grass-dev mails, I noticed that there are plans to
> > > merge the g3d and raster library.
> > > And that Glynn is currently rewriting the raster lib.
> > Just to clear up any confusion: raster I/O is part of libgis; the
> > "raster library" deals with raster graphics.
> Sorry for the confusion, you are right. I meant the raster I/O part of libgis.
> > I have been rewriting the latter. I have given some consideration to
> > re-writing the raster I/O sections of libgis, including integrating 3D
> > volumes, but haven't actually done anything concrete, and don't plan
> > to in the near future.
> That sounds promising. Can we create a wish list of wanted functionality? :)
> - a cache mode similar to g3d
> - functions for random access in cache mode
> and so on ...
Here's an outline of what I have so far:
The core feature would be the use of tiled storage, including caching
(replacing the rowio and segment libraries). This would make it
feasible to have very large maps (e.g. a single global map for GTOPO30
etc), and to perform semi-random access (you would need some degree of
locality if you didn't want to read the entire map into RAM, but you
wouldn't be limited to a strict top-to-bottom order).
Additional features would be a larger number of formats (1, 2 or 4
bits, or any number of bytes, per cell), with support for arbitrary
quantisation rules (e.g. a 16-bit raw value could represent values in
the range -1000.0 to +5553.5 in steps of 0.1).
The tile sizes would be constrained to powers of two, but the sizes of
both the stored tiles and those used by the application should be
configurable independently.
The existing get/put row API would need to be retained, but a new
tile-based API would also be provided. The new API would make it
feasible to read maps using a GET(map,row,col) macro, which would
translate to something like:

#define GET(map,row,col) \
	(GET_BLOCK((map), (row)>>YSHIFT, (col)>>XSHIFT) \
		[(row)&YMASK][(col)&XMASK])

#define GET_BLOCK(map,y,x) \
	(map->blocks[y][x] ? map->blocks[y][x] : read_block(map,y,x))
In the case of a cache hit, retrieving a value would equate to:

	map->blocks[(row)>>YSHIFT][(col)>>XSHIFT][(row)&YMASK][(col)&XMASK]
The compiler will be able to optimise some of this away (e.g. the
parts which depend upon the row don't change while processing a single
row). I'm not sure whether it would exploit the fact that
(col)>>XSHIFT changes only infrequently.
The cached data for a map would consist of two tile maps: the raw data
stored on disc, and the rescaled and translated data presented to the
application.
Each tile map would be a 2D array of pointers to blocks, plus a
function capable of "generating" the data for any given block.
The generator for the lower-level map would read the data from the
file and decompress it. The generator for the higher-level map would
read data from the lower-level map, rescale it and decode it (i.e.
integer -> FP conversions, reclassing etc).
Most of the design complexity arises from issues of efficiency versus
flexibility. For optimum efficiency, the block dimensions would need
to be compile-time constants, whereas allowing them to be set at run
time would be more flexible.
For the conversion step, there is the issue of whether to convert
first and then rescale the converted data, or vice versa. The optimal
strategy depends upon whether you are scaling up (where each source
cell will be copied to multiple destination cells) or scaling down
(where each source cell will be used at most once, with some being
skipped).
Glynn Clements <glynn at gclements.plus.com>