[GRASS-dev] Raster format idea

Thu May 25 17:45:29 EDT 2006

Cedric Shock wrote:

> > 2. Tiled storage would handle sparse maps better than row storage. How
> > it compared to quadtrees would depend upon the typical cluster size.
> 
> Here's an absolutely wacky idea for a storage mechanism. (I love it when I 
> shove together words or ideas that I don't completely understand).
> 
> Here's what it is:
> 
> Take the raster data. Split it up into tiles (Glynn is happy).
> Make a new set of raster data in which each pixel is the mean of 4 pixels in 
> the original map (Joel is happy).

That's a MIPmap. It only works if it's meaningful to take the mean of
several values, which isn't the case for discrete categories.

[And they're "cells", not "pixels"; GRASS maps aren't pictures ;) ]

A quadtree is similar except that you store whether or not the
sub-cells all have the same value (i.e. you're essentially recording
whether the variance is zero, rather than recording the mean).

With regard to the null bitmap, you would store a 1 if the entire
square represented by that cell was all-null, and a 0 if any part of
it was non-null. That would allow you to skip large chunks of all-null
data quickly.

> Actually, there's no reason except convenience that the resolution of a map 
> must be the same everywhere.
> 
> Here's a really ill-conceived data type (hard to read and write to files 
> because of the pointer):
> 
> struct quadtile {
> 	tilepixels * pixels;
> 	tilenulls * nulls;
> 	quadtile quads[4];
> }
> 
> Actually, the problem of pointers in files should be solvable with an in-file 
> heap. It should be possible (even routine?) to map a file to memory without 
> actually loading it, right? 

For storage, you basically have three choices:

1. Store "pointers" (i.e. offsets) to the data for each quadrant.
2. Store the data for each quadrant sequentially, following the parent.
3. Store the levels sequentially (i.e. a sequence of 2D arrays of
increasing resolution).

For option 1, you normally need to stop before you get to the level of
individual cells (e.g. the leaves are 4x4 or 8x8 blocks of cells),
otherwise all of the pointers to the leaves use too much space (even
assuming that you use relative offsets, which allows you to use fewer
bits at lower levels of the tree).

Option 2 provides the best compression, but it's only practical if you
are consuming the data by recursive descent; it's no use for any other
access strategy.

Option 3 takes up slightly more space than a raw array (the lowest
level /is/ the raw array), but random access is fast. There isn't any
compression, but you can easily detect entire blocks comprised of a
single value.

In most cases, single-level tiled storage will give you close to the
same performance with a lot less complexity.

-- 
Glynn Clements <glynn at gclements.plus.com>