[GRASS5] [bug #2380] (grass) unexpected '(' in r.univar

Hamish hamish_nospam at yahoo.com
Wed Apr 14 08:58:07 EDT 2004


> > while executing r.univar (v.1.7.2.2 2002/10/23) the errormessage
> > "unexpected '(' in line 18" appears. after changing the line
> > "function cleanup()" to "cleanup()" [the same in line 24] r.univar
> > runs without problems. Is it a bug or a cygwin-problem?


Yesterday I put together a small module written in C for calculating the
stats on the non-null cells of a raster map, filling pretty much the
same role as r.univar (which I hadn't met before). I had been doing
something like 'r.to.sites | s.univar', which doesn't work for 5.7 of
course (which is why I wrote it).

Having r.univar makes this redundant of course, but I'll test them to
see how much of a speed difference there is (I take it the r.univar
script will spend a bit of time doing disk I/O). [result: C is at least
40% faster than the shell script, and doesn't write tens of MB of files
to /tmp]


Two comments arise (without good answers):


a) Population vs. sample variance (& standard deviation)

r.series and r.univar use sum((xi-mean(x))^2)/n
   (i.e. population variance aka "sigma^2")

while 

s.univar and s.cellstats use sum((xi-mean(x))^2)/(n-1)
   (i.e. sample or bias-corrected variance aka "s^2")


For consistency we should pick one way & document it. The difference
between n and n-1 for big maps with huge numbers of cells isn't very
much, so this isn't too critical, but someone might need to do analysis
on very small/sparse maps one day.... I've used n-1, for no great reason
besides that the current region is a 'sample' of a larger location.
Can any stats people comment?
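
To make the difference concrete, here is a minimal sketch in C of the two
estimators side by side. The function names (sum_sq_dev, var_population,
var_sample) are made up for illustration and are not taken from r.univar,
r.series, s.univar or the gmath library; returning 0.0 for too-small n is
also just an assumption of this sketch.

```c
#include <stddef.h>

/* Sum of squared deviations from the mean: sum((xi - mean(x))^2).
 * Both estimators share this numerator. Caller must ensure n > 0. */
static double sum_sq_dev(const double *x, size_t n)
{
    double mean = 0.0, ss = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        mean += x[i];
    mean /= (double)n;

    for (i = 0; i < n; i++)
        ss += (x[i] - mean) * (x[i] - mean);

    return ss;
}

/* Population variance ("sigma^2"): divide by n. */
double var_population(const double *x, size_t n)
{
    return n > 0 ? sum_sq_dev(x, n) / (double)n : 0.0;
}

/* Sample (bias-corrected) variance ("s^2"): divide by n-1.
 * Undefined for n < 2; this sketch returns 0.0 in that case. */
double var_sample(const double *x, size_t n)
{
    return n > 1 ? sum_sq_dev(x, n) / (double)(n - 1) : 0.0;
}
```

The two results differ by a factor of n/(n-1), so for the 8 values
{2, 4, 4, 4, 5, 5, 7, 9} the population variance is 4 while the sample
variance is 32/7 (about 4.57). For a map with millions of cells the factor
is practically 1.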


b) gmath library: I looked at using the c_var.c & co. functions from
r.series, but these require passing all input values (i.e. the whole map
in memory) at once. That is fine for a general library function, or for
the n<1000 cells-of-the-same-coordinate case that r.series or r.mapcalc
might use, but it doesn't cut it for a 10000x10000 DCELL map. I guess I
could use c_sum.c to do one line at a time, but it doesn't seem worth it,
and it doesn't get rid of any implementation inconsistencies (e.g. the n
vs. n-1 problem above); removing those is the great benefit of using a
gmath library. So I just reimplemented in an inconsistent manner as
described above.
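
For reference, one way to get mean and variance without holding the whole
map in memory is to accumulate row by row. The sketch below uses Welford's
running update, which is not necessarily what my module does. All names
(running_stats, rs_add_row, ...) are hypothetical, and NaN stands in for
null cells here purely as an assumption; a real module would use the GRASS
null-checking routines instead.

```c
#include <math.h>
#include <stddef.h>

/* Running accumulator: count, mean, and sum of squared deviations. */
typedef struct {
    unsigned long n;   /* number of non-null cells seen so far */
    double mean;       /* running mean */
    double m2;         /* running sum((xi - mean)^2) */
} running_stats;

void rs_init(running_stats *s)
{
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one raster row into the accumulator (Welford's update).
 * Cells that are NaN are treated as null and skipped (assumption). */
void rs_add_row(running_stats *s, const double *row, size_t ncols)
{
    size_t i;

    for (i = 0; i < ncols; i++) {
        double delta;

        if (isnan(row[i]))
            continue;
        s->n++;
        delta = row[i] - s->mean;
        s->mean += delta / (double)s->n;
        s->m2 += delta * (row[i] - s->mean);
    }
}

/* Population variance: divide by n. */
double rs_var_population(const running_stats *s)
{
    return s->n > 0 ? s->m2 / (double)s->n : 0.0;
}

/* Sample variance: divide by n-1. */
double rs_var_sample(const running_stats *s)
{
    return s->n > 1 ? s->m2 / (double)(s->n - 1) : 0.0;
}
```

Only one row buffer plus three numbers are kept at any time, so memory use
does not grow with the map size, and either the n or the n-1 form can be
reported from the same accumulator at the end.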



If people are interested in a replacement for r.univar, I can clean it up
and add the missing extended functionality (quartiles, etc.) which
r.univar provides. I'm not looking forward to writing the sort, so maybe
I'll leave that to a real programmer to do.

And the ageless question of what to call it?
  ideas: r.mapstats, r.univar2

shrug.



Hamish



