[GRASS-dev] Raster format and dual function module

Glynn Clements glynn at gclements.plus.com
Thu May 25 06:37:12 EDT 2006


Joel Pitt wrote:

> I've been thinking a bit about GRASS and the two things which I'd like
> to see change. However I *don't* think that these things should
> be changed just for me and recognise the need to remain compatible
> with existing modules etc.
> 
> 1) Raster file format - Unless I'm mistaken, the raster is a flat
> compressed file, with NULL values stored uncompressed in a separate
> bitmask. This means the NULL mask is quite large and also means the
> raster is split across files. Obviously this makes it fastish, but
> large and sparse maps are unnecessarily huge.

Actually, it makes it quite slow; the null file is opened and closed
for every line. OTOH, keeping the null file open would double the
number of descriptors used, halving the maximum number of maps which
can be opened at a time; this can be an issue for r.series if you are
working with a year's worth of daily samples (i.e. 365 maps).

It would be much better to just embed the nulls into the raster file.
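
The descriptor trade-off can be put into numbers with a minimal sketch.
The helper names max_open_maps() and current_fd_limit() are
hypothetical, not part of GRASS; only getrlimit() and RLIMIT_NOFILE are
real POSIX interfaces. The count of reserved descriptors is an
assumption the caller has to supply.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: how many maps fit under a descriptor limit.
 * `reserved` covers stdin/stdout/stderr and anything else the process
 * keeps open; `fds_per_map` is 1 if only the data file stays open,
 * 2 if the null file is kept open alongside it. */
long max_open_maps(long fd_limit, long reserved, long fds_per_map)
{
    assert(fds_per_map > 0);
    if (fd_limit <= reserved)
        return 0;
    return (fd_limit - reserved) / fds_per_map;
}

/* Soft limit on open descriptors for this process (POSIX).
 * Returns -1 if the limit cannot be read or is unlimited. */
long current_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long)rl.rlim_cur;
}
```

For example, with an assumed limit of 512 descriptors and 3 reserved,
max_open_maps(512, 3, 1) gives 509 maps but max_open_maps(512, 3, 2)
gives only 254, so 365 daily maps fit in the first case and not in the
second.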

> I've also been frustrated that one has to read an entire set of rows
> to find out whether there are any non-null cells. This makes modelling
> processes slower than they need to be. I think that a quadtree format
> would solve this (lower resolution versions of a region overlaid on
> one another e.g. 128x128 64x64 32x32) by allowing you to descend only
> those tree branches that have raster cells present. Quadtrees would
> also speed up the process of displaying large raster maps on limited
> resolution monitors.

There are simpler ways to solve some of these problems. E.g.

1. It wouldn't be particularly hard to add an alternative to
G_get_raster_row() which doesn't bother to read all-null rows, but
just sets a flag to indicate that the row is all null.

2. Tiled storage would handle sparse maps better than row storage. How
it compares to quadtrees would depend upon the typical cluster size.
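
Option 1 above could look roughly like the following sketch. It is not
the GRASS API: row_is_all_null(), get_row_or_flag() and the read_row_fn
callback are hypothetical names, and the bitmap layout (one bit per
cell, most significant bit first, 1 meaning null) is an assumption made
for illustration.

```c
#include <stddef.h>

/* Assumed layout: one bit per cell, most significant bit first,
 * 1 = null. Returns 1 if all `ncols` cells of the row are null. */
int row_is_all_null(const unsigned char *null_bits, int ncols)
{
    int full_bytes = ncols / 8;
    int rem = ncols % 8;
    int i;

    for (i = 0; i < full_bytes; i++)
        if (null_bits[i] != 0xFF)
            return 0;

    if (rem) {
        /* Mask covering the top `rem` bits of the last, partial byte. */
        unsigned char mask = (unsigned char)(0xFF << (8 - rem));

        if ((null_bits[full_bytes] & mask) != mask)
            return 0;
    }
    return 1;
}

/* Hypothetical callback standing in for whatever reads and decodes
 * one row of cell data. */
typedef int (*read_row_fn)(void *ctx, int row, int *buf, int ncols);

/* Sets *all_null and skips the data read entirely for all-null rows;
 * in that case `buf` is left untouched and 0 is returned. */
int get_row_or_flag(const unsigned char *null_bits, int ncols, int row,
                    read_row_fn read_row, void *ctx, int *buf,
                    int *all_null)
{
    *all_null = row_is_all_null(null_bits, ncols);
    if (*all_null)
        return 0;
    return read_row(ctx, row, buf, ncols);
}
```

A caller would test the flag before touching the buffer, which lets a
module step over empty rows without decompressing them.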

> Quadtree-based rasters with integrated null bitmasks could be easily
> accessed through the normal raster function calls ensuring existing
> modules are interoperable. Only new modules wanting to use quadtree
> functions (such as checking existence of values at positions higher up
> the tree) would have to check the version of the GRASS raster.
> 
> 2) Dual function modules - there is a lot of talk about how we want to
> move forward with the GUI but still maintain the separate programs for
> GRASS commands. I suggest having a compilation system that will create
> both standalone executables and integrated libraries. A GUI could then
> check whether a library version of GRASS module exists and load that,
> or otherwise use the equivalent executable.
> 
> This is due to another frustration I've had with speed of execution.
> While running long chains of commands in simulations I can't help
> realising that every command has to reload a map from disk and then
> write it back. As we all know, disk access is REALLY slow in
> comparison to memory etc. so if GRASS modules were compiled as
> libraries the last N (N being configurable) number of loaded maps
> could remain in memory for quick processing and display.

Unix caches disk accesses. If you have enough RAM, you'll never need
to actually read the same data from disk twice.

It's more likely that performance issues stem from the various
processes which are performed on the data between it being read from
the file and passed to the application (see lib/gis/get_row.c; I made
some diagrams of this, if you're interested). Opening and closing the
null bitmap for each line of input is known to be a significant
performance sink.
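
To make the open/close cost concrete, here is a minimal sketch of the
two access patterns using plain stdio. It is not the code in
lib/gis/get_row.c: both function names are hypothetical, and the file
is assumed to hold fixed-size, uncompressed bitmap rows stored back to
back.

```c
#include <stdio.h>

/* Pattern described above: open, seek, read and close for every row.
 * Each call pays for a path lookup plus an open and a close. */
int read_null_row_reopen(const char *path, int row, size_t row_bytes,
                         unsigned char *out)
{
    FILE *fp = fopen(path, "rb");
    int ok;

    if (!fp)
        return -1;
    ok = fseek(fp, (long)row * (long)row_bytes, SEEK_SET) == 0 &&
         fread(out, 1, row_bytes, fp) == row_bytes;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Alternative: the caller opens the file once and keeps it open, at
 * the price of one extra descriptor per map. */
int read_null_row_open(FILE *fp, int row, size_t row_bytes,
                       unsigned char *out)
{
    if (fseek(fp, (long)row * (long)row_bytes, SEEK_SET) != 0)
        return -1;
    return fread(out, 1, row_bytes, fp) == row_bytes ? 0 : -1;
}
```

Timing the two variants over all rows of a large map would be one way
to check how much of the cost really comes from the repeated open and
close.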

> Let me know what people think. Obviously I don't expect other people
> to go ahead with this just because I would like to see it, but if
> people see some value in these approaches I could attempt to map out a
> course and contribute some time to it.

I've been thinking about a new raster architecture for a while, but I
still don't have anything concrete.

For the time being, it would be better to see if we can improve the
situation with some minor changes to a few key areas. It would help if
someone has the time to build GRASS with profiling support and
actually profile some common usage patterns.

-- 
Glynn Clements <glynn at gclements.plus.com>



