[GRASS-dev] Raster format and dual function module
Glynn Clements
glynn at gclements.plus.com
Wed Jun 21 08:11:09 EDT 2006
Glynn Clements wrote:
> If it turns out that the separate null file is a significant
> performance issue, we need to consider a migration plan for embedding
> nulls (e.g. if 6.3 can write out rasters with embedded nulls, do we
> need 6.2 to be able to read them?).
I've compiled GRASS with profiling support, and a quick glance at the
results indicates that the null handling is indeed significant. E.g.
for "r.resample in=elevation.dem ...", G_get_raster_row() accounts for
30.2% of the time taken, with embed_nulls() taking 22.5%, which means
that embed_nulls() accounts for roughly 75% of the time spent in
G_get_raster_row().
[FWIW, that 22.5% is split roughly evenly between G_is_null_value()
(11.1%) and get_null_value_row() (10.2%, of which 6.9% is in
G__check_null_bit()).]
Another interesting point, from the flat profile (i.e. the time
attributed to each function does not include time spent in its
children), as a percentage of the total time:
  13.10  G_is_c_null_value
  11.78  G_is_d_null_value
   5.89  G_is_null_value
IOW, 30.77% of the total time is spent testing whether cells are null.
Regarding the first two: these should be available as macros or inline
functions, and they should be optimised. These functions amount to
comparing two 32- or 64-bit values, and should be trivial.
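
As a rough illustration of what "trivial" could look like, here is a
minimal sketch of inline null tests. Everything named my_* is
hypothetical and not part of libgis. The sketch assumes the integer
null is the most negative 32-bit value and the double null is an
all-ones bit pattern; the real definitions should be checked against
the library source before relying on either.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for CELL and DCELL; not the libgis typedefs. */
typedef int32_t my_cell;
typedef double my_dcell;

/* ASSUMPTION: integer null is the most negative 32-bit value. */
#define MY_CELL_NULL INT32_MIN

/* ASSUMPTION: double null is the all-ones 64-bit pattern (a NaN). */
#define MY_DCELL_NULL_BITS UINT64_MAX

/* One integer comparison; small enough to be inlined at every call site. */
static inline int my_is_c_null(const my_cell *v)
{
    return *v == MY_CELL_NULL;
}

/* Compare the raw bits, not the floating-point value (NaN != NaN). */
static inline int my_is_d_null(const my_dcell *v)
{
    uint64_t bits;

    memcpy(&bits, v, sizeof bits);
    return bits == MY_DCELL_NULL_BITS;
}

/* Helper for the sketch: store the assumed null pattern into *v. */
static inline void my_set_d_null(my_dcell *v)
{
    uint64_t bits = MY_DCELL_NULL_BITS;

    memcpy(v, &bits, sizeof bits);
}
```

With definitions like these in a header, a compiler can reduce each
test to a load and a compare, with no function-call overhead per cell.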
Regarding the third:
int G_is_null_value (const void *rast, RASTER_MAP_TYPE data_type)
{
    switch(data_type)
    {
    case CELL_TYPE:
        return (G_is_c_null_value((CELL *) rast));
    case FCELL_TYPE:
        return (G_is_f_null_value((FCELL *) rast));
    case DCELL_TYPE:
        return (G_is_d_null_value((DCELL *) rast));
    default:
        G_warning("G_is_null_value: wrong data type!");
        return FALSE;
    }
}
That's nearly 6% of the program spent in a CELL/FCELL/DCELL switch
statement (the cost of the individual G_is_[cfd]_null_value() calls
isn't included in that figure). There are quite a few places where
this idiom is used (e.g. lib/gis/raster.c).
This suggests that simple functions taking a RASTER_MAP_TYPE argument
and operating upon individual cells should be avoided where possible.
Instead, there should be a separate row-processing loop for each data
type, so that the switch statement(s) are only executed once per row,
not once per cell.
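
A minimal sketch of that structure, counting the nulls in a row: the
switch on the data type runs once per row and each case hands off to a
tight, type-specific loop. All my_* names are hypothetical and not
part of libgis, and the null bit patterns (most negative integer,
all-ones for floating point) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RASTER_MAP_TYPE. */
typedef enum { MY_CELL_TYPE, MY_FCELL_TYPE, MY_DCELL_TYPE } my_map_type;

/* Integer rows: ASSUMED null is INT32_MIN. */
static size_t my_count_nulls_c(const int32_t *row, size_t ncols)
{
    size_t i, n = 0;

    for (i = 0; i < ncols; i++)
        n += (row[i] == INT32_MIN);
    return n;
}

/* Float rows: ASSUMED null is the all-ones 32-bit pattern. */
static size_t my_count_nulls_f(const float *row, size_t ncols)
{
    size_t i, n = 0;

    for (i = 0; i < ncols; i++) {
        uint32_t bits;

        memcpy(&bits, &row[i], sizeof bits);
        n += (bits == UINT32_MAX);
    }
    return n;
}

/* Double rows: ASSUMED null is the all-ones 64-bit pattern. */
static size_t my_count_nulls_d(const double *row, size_t ncols)
{
    size_t i, n = 0;

    for (i = 0; i < ncols; i++) {
        uint64_t bits;

        memcpy(&bits, &row[i], sizeof bits);
        n += (bits == UINT64_MAX);
    }
    return n;
}

/* The type switch is evaluated once per row, not once per cell. */
size_t my_count_null_row(const void *row, size_t ncols, my_map_type type)
{
    switch (type) {
    case MY_CELL_TYPE:
        return my_count_nulls_c((const int32_t *) row, ncols);
    case MY_FCELL_TYPE:
        return my_count_nulls_f((const float *) row, ncols);
    case MY_DCELL_TYPE:
        return my_count_nulls_d((const double *) row, ncols);
    default:
        return 0;   /* unknown type: report no nulls in this sketch */
    }
}
```

The per-type loops duplicate some code, but that is the price of
keeping the dispatch out of the inner loop.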
--
Glynn Clements <glynn at gclements.plus.com>