[GRASS-dev] random issues and bug

Fri Aug 24 00:49:22 EDT 2007

Hamish wrote:

> sorry this is a bit disorganized, but there are several issues.
> But there is a common theme. :)
> 
> 
> r.malcalc rand(a,b) for integer values of a,b
> ====================
> 
> For a cellular automata seed I was trying to make a random 0/1 base map.
> 
> G63> r.mapcalc "rand.boolean = rand(0,1)"
> G63> r.info -r rand.boolean
> min=0
> max=0
> 
> G63> r.mapcalc "rand.boolean = rand(0,2)"
> G63> r.info -r rand.boolean
> min=0
> max=1
> 
> ?!
> I assume what is happening here is that the range for rand(0,2) is 0.0 to
> 1.999999 and the conversion to integer (CELL map) is done by a cast which
> throws away the least significant bits. But to the casual user this seems
> very counter intuitive.

There is no cast; the rand() function has separate CELL/FCELL/DCELL
versions.

> r.mapcalc manual says:
>     rand(a,b)               random value x : a < x < b
> 
> ... but that's only true if a,b are floats. For integer values it is more
> like: a <= x <= b-1

The manual page is wrong. The code (intentionally) calculates values x
such that "a <= x < b".

I've fixed this in CVS.

> My first attempt was:
> G63> r.mapcalc "rand.boolean = int( rand(0,1) + 0.5 )"
> 
> but that made an all 0s map as rand(n,n+1) == n as mentioned above.
> (I didn't expect the cast to int)

"0" and "1" are integer literals, so the above will use the integer
version of rand().

> In hindsight I can get that method to work by passing rand(,) two floats:
> 
> G63> r.mapcalc "rand.boolean = int( rand(0.0,1.0) + 0.5 )"
> G63> r.info -r rand.boolean
> min=0
> max=1

That's inefficient, but it will work.

Ultimately, if you want 0 or 1, use rand(0,2).

> Bug: r.mapcalc rand() not acting randomly.  (in at least 5.4 - 6.3cvs)
> ===========================================
> 
> #simple_xy location
> BOXSIZE=512
> export GRASS_WIDTH=$BOXSIZE
> export GRASS_HEIGHT=$BOXSIZE
> d.mon x0
> g.region n=$BOXSIZE s=0 w=0 e=$BOXSIZE res=1
> 
> r.mapcalc "rand.boolean = rand(0,2)"  # make a map of 0s and 1s
> d.rast rand.boolean
> 
> screenshot:  (1:1 region rows/cols to image rows/cols)
>   http://bambi.otago.ac.nz/hamish/grass/bugs/rmapcalc_rand_int_band_63cvs.png
> 
> vertical bands, the rows have a repeating pattern to them.
> 
> weird-- in GRASS 6.0.2 each row is one cell shorter and the pattern
> drifts to the left (box is 1:1 so ends up in opp corner).
> In 5.4.0 it's the same as both 6.3cvs and 6.2.2, ie vertical bands.
>   http://bambi.otago.ac.nz/hamish/grass/bugs/rmapcalc_rand_int_band_602.png
>   http://bambi.otago.ac.nz/hamish/grass/bugs/rmapcalc_rand_int_band_540.png
> 
> 
> My first guess was that this arrived with the addition of $GRASS_RND_SEED
> and the seed was reset to the same value at the start of each row when it
> should really only be set once at the start of the module.

It is only set once at the start of the module.

> After short reflection that's not the case (it's not a smear) but there
> certainly is a repeating pattern. Aliasing effect? Resizing the xmon
> doesn't get rid of the pattern, it's in the cells. By setting the region
> to be one bigger than 512x512 the effect goes away:
> 
> g.region n=$((1+ $BOXSIZE)) s=0 w=0 e=$((1 + $BOXSIZE)) res=1
> r.mapcalc "rand.boolean = rand(0,2)"
> 
> 
> eh? power of 2 issue?

Yep; it's an artifact of the PRNG.

If you want "very" random numbers, use a Mersenne Twister, but it's
quite a bit slower than an LCG.

> r.surf.random
> =============
> 
> to back up a little, my first attempt to make the 0/1 map was with
> r.random, but that requires a base map and is for placing points. 
> Wrong module.
> 
> Next I tried r.mapcalc and ran into above issues.
> 
> Next I tried r.surf.random. It creates a DCELL map with data in the range
> of two integers.
> 
> One thing which struck me as odd is that r.surf.random creates a DCELL map
> from two integer command line options. If you create a map with floating
> points numbers, why not allow min and max to be any real number??

Probably overlooked when r.surf.random was extended from CELL to
DCELL.

> I've just now added an -i flag to r.surf.random to write out a CELL map.
> I made that be in the range of [min <= x <= max] for int, and for DCELL
> kept (min < x < max) [or whatever rand() gives]. As this doesn't match
> r.mapcalc behaviour (see above) I think we need to change either one or
> the other module for consistency's sake, then document the choice well.
> 
> My first attempt to add the -i flag was like:
>  *(row_out_C + col_count) = (CELL) floor(rand1(2742)*(max-min)+min +0.5);
> 
> but that under-represents first and last bin.
> e.g. for min=0 max=10 the 0 and 10 bins only have half the cells of 1-9.
> 
> G63> r.surf.random.MOD rsrand.i -i min=0 max=10
> G63> d.histogram rsrand.i
> G63> r.stats -c rsrand.i
>  100%
> 0 13032
> 1 26190
> 2 26142
> 3 26262
> 4 26336
> 5 26250
> 6 26171
> 7 26513
> 8 26262
> 9 25999
> 10 12987
> 
> as the "1" bin draws from the range of 0.5-1.49999 and the 0 bin only
> has half that space (0.0-0.49999) to draw from.
> 
> so I made the -i version as follows:
>   *(row_out_C + col_count) = (CELL) (rand1(2742)*(max+1-min)+min);
> 
> but that goes back to the question of should min=0, max=10 create 11 bins
> or 10?

10. 0 to 9, inclusive.

> For the integer case, I'd say include the end numbers; but that
> probably means changing r.mapcalc's rand(int,int) to be the same. ?????

r.mapcalc's behaviour is correct.

PRNGs invariably use start-closed/end-open intervals, i.e. a <= x < b.

More generally, if you partition the interval a-c into a-b and b-c,
you normally want b to be in one or the other but not both. 

> As for Numerical Recipes code in the module (or not), it uses rand1()
> from the gmath library.  lib/gmath/rand1.c

rand1() uses either rand() or drand48() from the standard C library. 
There is no Numerical Recipes code involved here.

> How to cite that in the r.surf.random.html help page? I put:
> "It uses the random number generator drand48() or rand(), depending on
> the user's platform."
> but I'd much rather be specific and say those come from the GNU Project
> or as defined in POSIX 1.1 or C99 std, or libc6 or whatnot, but don't
> know the correct answer.

rand() is C89, drand48() is POSIX.

> as for drand48, the man page says:
> NOTES
>    These functions are declared obsolete by SVID 3, which states
>    that rand(3) should be used instead.
> 
> So should we remove the "#if defined(HAVE_DRAND48)" parts from
> lib/gmath/rand1.c?

Probably.

ANSI C doesn't dictate the specific PRNG algorithm or the number of
bits, only that RAND_MAX is at least 32767. The end result is that
some rand() implementations are quite poor (I've even seen
implementations which take the *least* significant bits of an LCG,
which produces alternating even/odd values).

If lrand48() et al exist, they will always use a 48-bit state, which
puts a lower bound on the quality of the PRNG.

OTOH, if a system has lrand48(), it's likely that rand() will either
be an alias for lrand48() or something at least as good.

-- 
Glynn Clements <glynn at gclements.plus.com>