[GRASS5] [bug #2380] (grass) unexpected '(' in r.univar
hamish_nospam at yahoo.com
Wed Apr 14 08:58:07 EDT 2004
> > while executing r.univar (v.22.214.171.124 2002/10/23) the errormessage
> > "unexpected '(' in line 18" appears. after changing the line
> > "function cleanup()" to "cleanup()" [the same in line 24] r.univar
> > runs without problems. Is it a bug or a cygwin-problem?
Yesterday I put together a small module written in C for calculating the
stats on the non-null cells of a raster map, filling pretty much the
same role as r.univar (which I hadn't met before). I had been doing
something like 'r.to.sites | s.univar', which doesn't work for 5.7 of
course (which is why I wrote it).
Having r.univar makes this redundant of course, but I'll test them to
see how much of a speed difference there is (I take it the r.univar
script will spend a bit of time doing disk I/O). [result: C is at least
40% faster than the shell script, and doesn't write tens of MB of files
to /tmp]
Two comments arise (without good answers):
a) Population vs. sample variance (& standard deviation)
r.series and r.univar use sum((xi-mean(x))^2)/n
(i.e. population variance aka "sigma^2")
s.univar and s.cellstats use sum((xi-mean(x))^2)/(n-1)
(i.e. sample or bias-corrected variance aka "s^2")
For consistency we should pick one way & document it. The difference
between n and n-1 for big maps with huge numbers of cells isn't very
much, so this isn't too critical, but someone might need to do analysis
on very small/sparse maps one day.... I've used n-1, for no great reason
other than that the current region is a 'sample' of a larger location.
Can any stats people comment?
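To make the two definitions concrete, here is a small sketch (Python only for brevity; it is not the code of any of the modules named above, and the function name `variance` and the cell values are made up for the example):

```python
def variance(values, sample=False):
    """Variance of a list of cell values.

    sample=False divides by n     (population variance, "sigma^2")
    sample=True  divides by n - 1 (sample variance, "s^2")
    """
    n = len(values)
    denom = n - 1 if sample else n
    if denom <= 0:
        raise ValueError("not enough values for this variance definition")
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return sum_sq_dev / denom


# Hypothetical non-null cell values from a very small map.
cells = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("population (n):  ", variance(cells))
print("sample     (n-1):", variance(cells, sample=True))
```

The two results always differ by a factor of n/(n-1), so with 8 cells the sample figure is about 14% larger, while for a map with millions of cells the difference is negligible.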
b) gmath library: I looked at using the c_var.c & co. functions from
r.series, but these require passing all input values (i.e. the whole map
in memory) at once. That is fine for a general library function, or for
the n<1000 cells-of-the-same-coordinate that r.series or r.mapcalc might
use, but it doesn't cut it for a 10000x10000 DCELL map. I guess I could
use c_sum.c to do one line at a time, but it doesn't seem worth it, and
it wouldn't get rid of any implementation inconsistencies (e.g. the n vs.
n-1 problem above), which is the great benefit of using a gmath library.
So I just reimplemented it, in an inconsistent manner as described above.
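For illustration, one way to get the mean and variance without holding the whole map in memory is to update running totals one row at a time. The sketch below uses Welford's update, which is an assumption on my part and not necessarily what the C module does. It is Python for brevity; the class name `RunningStats` and the use of `None` for null cells are made up for the example:

```python
import math


class RunningStats:
    """Accumulate count, mean, variance, min and max one row at a time."""

    def __init__(self):
        self.n = 0          # number of non-null cells seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def add_row(self, row):
        for x in row:
            if x is None:   # null cell: skip it
                continue
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)   # Welford's update
            self.min = min(self.min, x)
            self.max = max(self.max, x)

    def variance(self, sample=True):
        denom = self.n - 1 if sample else self.n
        if denom <= 0:
            raise ValueError("not enough non-null cells")
        return self.m2 / denom


# Hypothetical 4x3 map, read row by row; None marks a null cell.
rows = [
    [2.0, None, 4.0],
    [4.0, 4.0, None],
    [5.0, 5.0, 7.0],
    [None, 9.0, None],
]

stats = RunningStats()
for row in rows:
    stats.add_row(row)

print(stats.n, stats.mean, stats.variance(), stats.variance(sample=False))
```

Only one row plus a handful of totals is in memory at any time, so map size stops mattering. Quartiles don't fit this pattern, since they need the values in sorted order.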
If people are interested in a replacement for r.univar, I can clean it up
and add the missing extended functionality (quartiles, etc.) which
r.univar provides. I'm not looking forward to the sort.. so maybe I'll
leave that to a real programmer to do.
And the ageless question of what to call it?
ideas: r.mapstats, r.univar2