[GRASS-dev] Raster format and dual function module

Trevor Wiens twiens at interbaun.com
Wed May 24 23:47:58 EDT 2006


On Thu, 25 May 2006 12:30:28 +1200
"Joel Pitt" <joel.pitt at gmail.com> wrote:

> Hi all,
> 
> I've been thinking a bit about GRASS and the two things which I'd like
> to see change. However I *don't* think that these things should
> be changed just for me and recognise the need to remain compatible
> with existing modules etc.
> 
> 1) Raster file format - Unless I'm mistaken, the raster is a flat
> compressed file, with NULL values stored uncompressed in a separate
> bitmask. This means the NULL mask is quite large and also means the
> raster is split across files. Obviously this makes it fastish, but
> large and sparse maps are unnecessarily huge.
> 
> I've also been frustrated that one has to read an entire set of rows
> to find out whether any cells are non-null. This makes modelling
> processes slower than they need to be. I think that a quadtree format
> would solve this (lower resolution versions of a region overlaid on
> one another e.g. 128x128 64x64 32x32) by allowing you to descend only
> those tree branches that have raster cells present. Quadtrees would
> also speed up the process of displaying large raster maps on limited
> resolution monitors.
> 
> Quadtree-based rasters with integrated null bitmasks could be easily
> accessed through the normal raster function calls, ensuring existing
> modules are interoperable. Only new modules wanting to use quadtree
> functions (such as checking for the existence of values at positions
> higher up the tree) would have to check the version of the GRASS raster.

Having had the pleasure of working with quadtrees under the old SPANS
system, I can certainly see the speed benefit. I've thought about this
in the past, but didn't know how it could be implemented such that
existing modules wouldn't be affected. Perhaps I'm a bit slow, but I
don't understand how your suggested method would allow existing modules
to function in a normal way. Could you point me to some literature
or provide a bit more explanation, please?
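
One way to picture the proposal, as a minimal sketch and not anything
taken from the GRASS libraries: keep the cells as they are, and build
coarser "is anything non-null under here?" levels on top. The names
below (build_occupancy_pyramid, non_null_cells, read_row) are
hypothetical, and the sketch assumes a square grid whose side is a
power of two, with None standing in for NULL.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[int]]]


def build_occupancy_pyramid(grid: Grid) -> List[List[List[bool]]]:
    """Return levels from finest to coarsest.

    Each level is a 2D list of booleans saying whether any cell under
    that node is non-null. Assumes a square, power-of-two grid.
    """
    level = [[cell is not None for cell in row] for row in grid]
    levels = [level]
    while len(level) > 1:
        half = len(level) // 2
        # A parent is occupied if any of its four children is occupied.
        level = [
            [
                level[2 * r][2 * c] or level[2 * r][2 * c + 1]
                or level[2 * r + 1][2 * c] or level[2 * r + 1][2 * c + 1]
                for c in range(half)
            ]
            for r in range(half)
        ]
        levels.append(level)
    return levels


def non_null_cells(levels) -> Tuple[List[Tuple[int, int]], int]:
    """Descend from the root, skipping branches with no data.

    Returns the sorted (row, col) positions of non-null cells and the
    number of tree nodes that had to be inspected.
    """
    found = []
    visited = 0
    stack = [(len(levels) - 1, 0, 0)]  # (level index, row, col)
    while stack:
        depth, r, c = stack.pop()
        visited += 1
        if not levels[depth][r][c]:
            continue  # nothing below this node, do not descend
        if depth == 0:
            found.append((r, c))
            continue
        for dr in (0, 1):
            for dc in (0, 1):
                stack.append((depth - 1, 2 * r + dr, 2 * c + dc))
    return sorted(found), visited


def read_row(grid: Grid, row: int) -> List[Optional[int]]:
    """Row-style access: a caller that only wants rows never sees the tree."""
    return list(grid[row])


grid = [[None] * 8 for _ in range(8)]
grid[0][0] = 7
grid[5][6] = 3
levels = build_occupancy_pyramid(grid)
cells, visited = non_null_cells(levels)
```

In this toy case the two non-null cells are found without touching all
64 cells, while read_row still hands back an ordinary row. Whether a
real on-disk format could keep row reads that cheap is exactly the part
I am unsure about.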

> 
> 2) Dual function modules - there is a lot of talk about how we want to
> move forward with the GUI but still maintain the separate programs for
> GRASS commands. I suggest having a compilation system that will create
> both standalone executables and integrated libraries. A GUI could then
> check whether a library version of GRASS module exists and load that,
> or otherwise use the equivalent executable.
> 
> This is due to another frustration I've had with speed of execution.
> While running long chains of commands in simulations I can't help
> realising that every command has to reload a map from disk and then
> write it back. As we all know, disk access is REALLY slow in
> comparison to memory etc. so if GRASS modules were compiled as
> libraries the last N (N being configurable) number of loaded maps
> could remain in memory for quick processing and display.
> 

I recall some time back when there were discussions about providing
much of the GRASS functionality as libraries, one of the concerns raised
was about memory management. I believe it was Glynn who indicated that
most of the modules have been written under the assumption they were
free-standing and not part of an integrated whole, so the level of work
that has gone into memory management is not sufficient for use in a
library situation without major rewrites.
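
For reference, the "last N loaded maps stay in memory" idea from the
quoted proposal amounts to a least-recently-used cache. A minimal
sketch, assuming a hypothetical loader function that reads a map from
disk by name (MapCache and disk_reads are made-up names, not GRASS
API):

```python
from collections import OrderedDict


class MapCache:
    """Keep at most max_maps loaded maps, evicting the least recently used."""

    def __init__(self, loader, max_maps=2):
        self._loader = loader          # hypothetical: name -> map data
        self._max_maps = max_maps      # the configurable N
        self._maps = OrderedDict()     # oldest entry first
        self.disk_reads = 0            # counts calls to the loader

    def get(self, name):
        if name in self._maps:
            # Cache hit: mark as most recently used, no disk access.
            self._maps.move_to_end(name)
            return self._maps[name]
        data = self._loader(name)
        self.disk_reads += 1
        self._maps[name] = data
        if len(self._maps) > self._max_maps:
            # Drop the least recently used map to bound memory use.
            self._maps.popitem(last=False)
        return data


cache = MapCache(loader=lambda name: name.upper(), max_maps=2)
for name in ["a", "b", "a", "c", "a", "b"]:
    cache.get(name)
```

The cache itself is the easy part. It only works if every module frees
what it allocates and never assumes it owns the whole process, which is
the memory management concern described above.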

As I understand it, by providing a SWIG interface to some of the
backend libraries in GRASS it should be possible to write modules with
much more control than is currently available by stringing together
commands and thus could be potentially faster. Of this I'm not so sure,
so others will probably provide better answers on this one.

I personally feel pretty nervous about modules loading entire files
into memory unnecessarily. For example right now v.in.ascii is still
pretty slow at loading large files even when topology generation is
disabled (my assertion here is based on quoted numbers for files of
specified sizes compared to how long it took to load similarly large
point files on SPANS under OS/2 on a 486 years ago). Further, in my work
developing an updated GUI for Stereo, I've noticed that without being
careful, it was possible to load entire images when it was not
necessary and slow the entire system to a crawl. The issue in both cases
is one of well-planned buffered reading and writing so that the system
doesn't go into thrash mode trying to load gigs of memory worth of data
but at the same time isn't spending inordinate amounts of time waiting
for the disk.
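
As a rough illustration of what I mean by buffered reading (a sketch
only, with hypothetical names iter_chunks and count_records, and an
arbitrary default buffer size): the file is consumed in fixed-size
pieces, so memory use is bounded by the buffer and not by the file.

```python
import io


def iter_chunks(stream, chunk_size=64 * 1024):
    """Yield successive blocks of at most chunk_size bytes from stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file
        yield chunk


def count_records(stream, chunk_size=64 * 1024):
    """Count newline-terminated records without loading the whole file."""
    count = 0
    for chunk in iter_chunks(stream, chunk_size):
        count += chunk.count(b"\n")
    return count


# Three small point records, read four bytes at a time.
points = io.BytesIO(b"1|2\n3|4\n5|6\n")
n_records = count_records(points, chunk_size=4)
```

The buffer size is the tuning knob: too small and the time goes into
waiting for the disk, too large and the system starts to thrash.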

T
-- 
Trevor Wiens 
twiens at interbaun.com

The significant problems that we face cannot be solved at the same 
level of thinking we were at when we created them. 
(Albert Einstein)



