[GRASS5] [bug #2380] (grass) unexpected '(' in r.univar

Wed Apr 14 09:40:17 EDT 2004

On Thu, Apr 15, 2004 at 12:58:07AM +1200, Hamish wrote:
> > > while executing r.univar (v.1.7.2.2 2002/10/23) the error message
> > > "unexpected '(' in line 18" appears. After changing the line
> > > "function cleanup()" to "cleanup()" [the same in line 24] r.univar
> > > runs without problems. Is it a bug or a cygwin-problem?
> 
> 
> Yesterday I put together a small module written in C for calculating the
> stats on the non-null cells of a raster map, filling pretty much the
> same role as r.univar (which I hadn't met before). I had been doing
> something like 'r.to.sites | s.univar', which doesn't work for 5.7 of
> course (which is why I wrote it).
> 
> Having r.univar makes this redundant of course, but I'll test them to
> see how much of a speed difference there is (I take it the r.univar
> script will spend a bit of time doing disk I/O). [result: C is at least
> 40% faster than shell script, and doesn't write 10s of MBs files to /tmp]

I have been hoping for years to see r.univar implemented in C...

> 
> Two comments arise:  (without good answers)
> 
> 
> a) Population vs. sample variance (& standard deviation)
> 
> r.series and r.univar use sum((xi-mean(x))^2)/n
>    (i.e. population variance aka "sigma^2")
> 
> while 
> 
> s.univar and s.cellstats use sum((xi-mean(x))^2)/(n-1)
>    (i.e. sample or bias-corrected variance aka "s^2")
> 
> 
> For consistency we should pick one way & document it.

Yes: and move into gmath. Please use doxygen-style comments.
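
To make the two conventions concrete, here is a minimal C sketch with a
doxygen-style comment. The function name univar_variance and its "sample"
flag are hypothetical and are not part of the existing gmath library; it
only illustrates how a single documented routine could offer both the n
and the n-1 divisor.

```c
#include <assert.h>
#include <stddef.h>

/**
 * \brief Variance of an array of values (two-pass).
 *
 * Hypothetical helper for illustration; not an existing gmath function.
 *
 * \param x      input values
 * \param n      number of values
 * \param sample nonzero: divide by n-1 (sample variance, "s^2");
 *               zero: divide by n (population variance, "sigma^2")
 * \param var    receives the variance on success
 * \return 0 on success, -1 if there are too few values
 */
int univar_variance(const double *x, size_t n, int sample, double *var)
{
    double sum = 0.0, ssq = 0.0, mean, d;
    size_t i, denom;

    assert(x != NULL && var != NULL);

    /* n-1 needs at least two values; n needs at least one */
    if (n == 0 || (sample && n < 2))
        return -1;

    /* first pass: mean */
    for (i = 0; i < n; i++)
        sum += x[i];
    mean = sum / (double)n;

    /* second pass: sum of squared deviations from the mean */
    for (i = 0; i < n; i++) {
        d = x[i] - mean;
        ssq += d * d;
    }

    denom = sample ? n - 1 : n;
    *var = ssq / (double)denom;
    return 0;
}
```

For the eight values 2 4 4 4 5 5 7 9 the sum of squared deviations is 32,
so the n divisor gives 4 and the n-1 divisor gives 32/7. With millions of
cells the two results differ only far down in the decimals.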

> The difference
> between n and n-1 for big maps with huge numbers of cells isn't very
> much, so this isn't too critical, but someone might need to do analysis
> on very small/sparse maps one day.... I've used n-1, for no great reason
> besides the current region is a 'sample' of a larger location.
> Can any stats people comment?
> 
> 
> b) gmath library: I looked at using the c_var.c & co. functions from
> r.series, but these require passing all input values (i.e. the whole map
> in memory) at once. That is fine for a general library function, or for
> the n<1000 cells-of-the-same-coordinate that r.series or r.mapcalc might
> use, but it doesn't cut it for a 10000x10000 DCELL map. I guess I could use
> c_sum.c to do one line at a time, but it doesn't seem worth it, and
> doesn't get rid of any implementation inconsistencies (eg the n vs. n-1
> problem above) which is the great benefit of using a gmath library.
> So I just reimplemented in an inconsistent manner as described above.
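
The memory concern above can be sidestepped by accumulating statistics one
row at a time. The sketch below is only an illustration and not the module
described in the quoted text: it uses Welford's running-mean update (a
technique chosen here, not one named in the thread), all struct and
function names are hypothetical, and a NaN value stands in for a null cell
so that the example needs no GRASS calls.

```c
#include <assert.h>
#include <stddef.h>

/* Running statistics over non-null cells; all names are hypothetical. */
struct run_stats {
    size_t n;       /* non-null cells seen so far */
    double mean;    /* running mean */
    double m2;      /* running sum of squared deviations from the mean */
    double min;     /* smallest value seen (valid once n > 0) */
    double max;     /* largest value seen (valid once n > 0) */
};

void stats_init(struct run_stats *s)
{
    assert(s != NULL);
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
    s->min = 0.0;
    s->max = 0.0;
}

/* Feed one raster row; only this row has to be in memory. */
void stats_add_row(struct run_stats *s, const double *row, size_t ncols)
{
    size_t i;

    assert(s != NULL && row != NULL);

    for (i = 0; i < ncols; i++) {
        double x = row[i], delta;

        if (x != x)     /* assumption: NaN marks a null cell, skip it */
            continue;

        if (s->n == 0 || x < s->min)
            s->min = x;
        if (s->n == 0 || x > s->max)
            s->max = x;

        /* Welford update of mean and squared-deviation sum */
        s->n++;
        delta = x - s->mean;
        s->mean += delta / (double)s->n;
        s->m2 += delta * (x - s->mean);
    }
}

/* sample != 0: divide by n-1; sample == 0: divide by n.
 * Returns 0.0 when there are too few cells for the chosen divisor. */
double stats_variance(const struct run_stats *s, int sample)
{
    assert(s != NULL);
    if (s->n == 0 || (sample && s->n < 2))
        return 0.0;
    return s->m2 / (double)(sample ? s->n - 1 : s->n);
}
```

A caller would run stats_init() once, call stats_add_row() for each row
read from the map, and pick the divisor only at the end. Quartiles would
still need sorting or binning and are not covered by this sketch.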
> 
> 
> 
> If people are interested in a replacement for r.univar, I can clean it up
> and add the missing extended functionality (quartiles, etc.) which
> r.univar provides. I'm not looking forward to the sort.. so maybe I'll
> leave that to a real programmer to do.
> 
> And the ageless question of what to call it?
>   ideas: r.mapstats, r.univar2

IMHO we should aim at *replacing* code, not adding similar
code with different names. The same applies to various
other modules such as r.grow[2] etc. Confusion grows...

See also:
http://intevation.de/rt/webrt?serial_num=1848&display=History
"merge of r.average, r.median and r.mode"

Markus