[GRASS-dev] Raster format and dual function module

Glynn Clements glynn at gclements.plus.com
Wed Jun 21 08:11:09 EDT 2006


Glynn Clements wrote:

> If it turns out that the separate null file is a significant
> performance issue, we need to consider a migration plan for embedding
> nulls (e.g. if 6.3 can write out rasters with embedded nulls, do we
> need 6.2 to be able to read them?).

I've compiled GRASS with profiling support, and a quick glance at the
results indicates that the null handling is indeed significant. E.g. 
for "r.resample in=elevation.dem ...", G_get_raster_row() accounts for
30.2% of the time taken, with embed_nulls() taking 22.5%, which means
that embed_nulls() accounts for 75% of G_get_raster_row().

[FWIW, that 22.5% is split roughly evenly between G_is_null_value()
(11.1%) and get_null_value_row() (10.2%, of which 6.9% is in
G__check_null_bit()).]

Another interesting point: from the flat profile (i.e. time attributed
to a function does not include time spent in its children):

 13.10  G_is_c_null_value
 11.78  G_is_d_null_value
  5.89  G_is_null_value

IOW, 30.77% of the total time is spent testing whether cells are null.
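As a quick check of the arithmetic in the profile figures quoted above, here is a small C sketch. The helper names share_of() and sum_percent() are made up for illustration; the inputs are just the percentages from the gprof output.

```c
#include <stddef.h>

/* Hypothetical helper: share of `part` within `whole`, where both are
   given as percentages of total run time. Returns a percentage. */
static double share_of(double part, double whole)
{
    return 100.0 * part / whole;
}

/* Hypothetical helper: sum a list of flat-profile (self-time)
   percentages. */
static double sum_percent(const double *p, size_t n)
{
    double s = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        s += p[i];
    return s;
}
```

With the numbers above, share_of(22.5, 30.2) is about 74.5 (the "75%" figure for embed_nulls() within G_get_raster_row()), and sum_percent() over { 13.10, 11.78, 5.89 } gives 30.77.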

Regarding the first two: these should be available as macros or inline
functions, and they should be optimised. These functions amount to
comparing two 32- or 64-bit values, and should be trivial.
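A minimal sketch of what inline versions might look like. It assumes that a null CELL is INT_MIN and that a null FCELL/DCELL has every bit set; treat that as an assumption about the representation rather than a copy of lib/gis. The typedefs are stand-ins for the ones in gis.h, and the is_[cfd]_null_inline() names are hypothetical. The floating-point tests compare bit patterns via memcpy() rather than using ==, because a NaN never compares equal to anything, itself included.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Stand-in typedefs (assumed to match gis.h on common platforms). */
typedef int CELL;
typedef float FCELL;
typedef double DCELL;

_Static_assert(sizeof(FCELL) == 4, "FCELL assumed to be 32 bits");
_Static_assert(sizeof(DCELL) == 8, "DCELL assumed to be 64 bits");

/* Assumption: a null CELL is the most negative int. */
static inline int is_c_null_inline(const CELL *v)
{
    return *v == INT_MIN;
}

/* Assumption: a null FCELL has all 32 bits set; compare the raw bits. */
static inline int is_f_null_inline(const FCELL *v)
{
    uint32_t bits;
    memcpy(&bits, v, sizeof bits);
    return bits == UINT32_MAX;
}

/* Assumption: a null DCELL has all 64 bits set; compare the raw bits. */
static inline int is_d_null_inline(const DCELL *v)
{
    uint64_t bits;
    memcpy(&bits, v, sizeof bits);
    return bits == UINT64_MAX;
}
```

With optimisation enabled, a compiler will typically reduce each of these to a single load and compare, which is the "trivial" cost the functions ought to have.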

Regarding the third:

	int G_is_null_value (const void *rast, RASTER_MAP_TYPE data_type)
	{
	    switch(data_type)
	    {
		case CELL_TYPE:
		    return (G_is_c_null_value((CELL *) rast));

		case FCELL_TYPE:
		    return (G_is_f_null_value((FCELL *) rast));

		case DCELL_TYPE:
		    return (G_is_d_null_value((DCELL *) rast));

		default:
		    G_warning("G_is_null_value: wrong data type!");
		    return FALSE;
	    }
	}

That's nearly 6% of the program's run time spent in a CELL/FCELL/DCELL switch
statement (the cost of the individual G_is_[cfd]_null_value() calls
isn't included in that figure). There are quite a few places where
this idiom is used (e.g. lib/gis/raster.c).

This suggests that simple functions taking a RASTER_MAP_TYPE argument
and operating upon individual cells should be avoided where possible. 
Instead, there should be a separate row-processing loop for each data
type, so that the switch statement(s) are only executed once per row,
not once per cell.
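A sketch of the per-row approach, using a hypothetical count_nulls_row() helper. The switch on data_type is executed once for the whole row, and each case is a loop over a single concrete type, so the per-cell work is just the (inlinable) null test. The type definitions, the *_TYPE constants and the null representations (INT_MIN for CELL, all bits set for FCELL/DCELL) are assumed stand-ins rather than copies of gis.h.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the GRASS types and type codes (assumed). */
typedef int CELL;
typedef float FCELL;
typedef double DCELL;
typedef int RASTER_MAP_TYPE;
#define CELL_TYPE  0
#define FCELL_TYPE 1
#define DCELL_TYPE 2

/* Assumed null representations; bit patterns are compared via memcpy. */
static inline int c_null(const CELL *v) { return *v == INT_MIN; }

static inline int f_null(const FCELL *v)
{
    uint32_t b;
    memcpy(&b, v, sizeof b);
    return b == UINT32_MAX;
}

static inline int d_null(const DCELL *v)
{
    uint64_t b;
    memcpy(&b, v, sizeof b);
    return b == UINT64_MAX;
}

/* Hypothetical helper: count the null cells in one row. The type is
   dispatched once per row; each case is a tight loop over one concrete
   cell type. Returns -1 for an unknown data type. */
static int count_nulls_row(const void *row, int ncols, RASTER_MAP_TYPE data_type)
{
    int n = 0, i;

    switch (data_type) {
    case CELL_TYPE: {
        const CELL *c = row;
        for (i = 0; i < ncols; i++)
            n += c_null(&c[i]);
        break;
    }
    case FCELL_TYPE: {
        const FCELL *f = row;
        for (i = 0; i < ncols; i++)
            n += f_null(&f[i]);
        break;
    }
    case DCELL_TYPE: {
        const DCELL *d = row;
        for (i = 0; i < ncols; i++)
            n += d_null(&d[i]);
        break;
    }
    default:
        return -1;
    }
    return n;
}
```

The cost of this layout is some duplication of the loop body per type; the gain is that nothing inside the loop depends on data_type.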

-- 
Glynn Clements <glynn at gclements.plus.com>
