[GRASS-dev] random issues and bug
Glynn Clements
glynn at gclements.plus.com
Fri Aug 24 00:49:22 EDT 2007
Hamish wrote:
> sorry this is a bit disorganized, but there are several issues.
> But there is a common theme. :)
>
>
> r.malcalc rand(a,b) for integer values of a,b
> ====================
>
> For a cellular automata seed I was trying to make a random 0/1 base map.
>
> G63> r.mapcalc "rand.boolean = rand(0,1)"
> G63> r.info -r rand.boolean
> min=0
> max=0
>
> G63> r.mapcalc "rand.boolean = rand(0,2)"
> G63> r.info -r rand.boolean
> min=0
> max=1
>
> ?!
> I assume what is happening here is that the range for rand(0,2) is 0.0 to
> 1.999999 and the conversion to integer (CELL map) is done by a cast which
> throws away the least significant bits. But to the casual user this seems
> very counter intuitive.
There is no cast; the rand() function has separate CELL/FCELL/DCELL
versions.
> r.mapcalc manual says:
> rand(a,b) random value x : a < x < b
>
> ... but that's only true if a,b are floats. For integer values it is more
> like: a <= x <= b-1
The manual page is wrong. The code (intentionally) calculates values x
such that "a <= x < b".
I've fixed this in CVS.
> My first attempt was:
> G63> r.mapcalc "rand.boolean = int( rand(0,1) + 0.5 )"
>
> but that made an all 0s map as rand(n,n+1) == n as mentioned above.
> (I didn't expect the cast to int)
"0" and "1" are integer literals, so the above will use the integer
version of rand().
> In hindsight I can get that method to work by passing rand(,) two floats:
>
> G63> r.mapcalc "rand.boolean = int( rand(0.0,1.0) + 0.5 )"
> G63> r.info -r rand.boolean
> min=0
> max=1
That's inefficient, but it will work.
Ultimately, if you want 0 or 1, use rand(0,2).
> Bug: r.mapcalc rand() not acting randomly. (in at least 5.4 - 6.3cvs)
> ===========================================
>
> #simple_xy location
> BOXSIZE=512
> export GRASS_WIDTH=$BOXSIZE
> export GRASS_HEIGHT=$BOXSIZE
> d.mon x0
> g.region n=$BOXSIZE s=0 w=0 e=$BOXSIZE res=1
>
> r.mapcalc "rand.boolean = rand(0,2)" # make a map of 0s and 1s
> d.rast rand.boolean
>
> screenshot: (1:1 region rows/cols to image rows/cols)
> http://bambi.otago.ac.nz/hamish/grass/bugs/rmapcalc_rand_int_band_63cvs.png
>
> vertical bands, the rows have a repeating pattern to them.
>
> weird-- in GRASS 6.0.2 each row is one cell shorter and the pattern
> drifts to the left (box is 1:1 so ends up in opp corner).
> In 5.4.0 it's the same as both 6.3cvs and 6.2.2, ie vertical bands.
> http://bambi.otago.ac.nz/hamish/grass/bugs/rmapcalc_rand_int_band_602.png
> http://bambi.otago.ac.nz/hamish/grass/bugs/rmapcalc_rand_int_band_540.png
>
>
> My first guess was that this arrived with the addition of $GRASS_RND_SEED
> and the seed was reset to the same value at the start of each row when it
> should really only be set once at the start of the module.
It is only set once at the start of the module.
> After short reflection that's not the case (it's not a smear) but there
> certainly is a repeating pattern. Aliasing effect? Resizing the xmon
> doesn't get rid of the pattern, it's in the cells. By setting the region
> to be one bigger than 512x512 the effect goes away:
>
> g.region n=$((1+ $BOXSIZE)) s=0 w=0 e=$((1 + $BOXSIZE)) res=1
> r.mapcalc "rand.boolean = rand(0,2)"
>
>
> eh? power of 2 issue?
Yep; it's an artifact of the PRNG.
If you want "very" random numbers, use a Mersenne Twister, but it's
quite a bit slower than an LCG.
> r.surf.random
> =============
>
> to back up a little, my first attempt to make the 0/1 map was with
> r.random, but that requires a base map and is for placing points.
> Wrong module.
>
> Next I tried r.mapcalc and ran into above issues.
>
> Next I tried r.surf.random. It creates a DCELL map with data in the range
> of two integers.
>
> One thing which struck me as odd is that r.surf.random creates a DCELL map
> from two integer command line options. If you create a map with floating
> points numbers, why not allow min and max to be any real number??
Probably overlooked when r.surf.random was extended from CELL to
DCELL.
> I've just now added an -i flag to r.surf.random to write out a CELL map.
> I made that be in the range of [min <= x <= max] for int, and for DCELL
> kept (min < x < max) [or whatever rand() gives]. As this doesn't match
> r.mapcalc behaviour (see above) I think we need to change either one or
> the other module for consistency's sake, then document the choice well.
>
> My first attempt to add the -i flag was like:
> *(row_out_C + col_count) = (CELL) floor(rand1(2742)*(max-min)+min +0.5);
>
> but that under-represents first and last bin.
> e.g. for min=0 max=10 the 0 and 10 bins only have half the cells of 1-9.
>
> G63> r.surf.random.MOD rsrand.i -i min=0 max=10
> G63> d.histogram rsrand.i
> G63> r.stats -c rsrand.i
> 100%
> 0 13032
> 1 26190
> 2 26142
> 3 26262
> 4 26336
> 5 26250
> 6 26171
> 7 26513
> 8 26262
> 9 25999
> 10 12987
>
> as the "1" bin draws from the range of 0.5-1.49999 and the 0 bin only
> has half that space (0.0-0.49999) to draw from.
>
> so I made the -i version as follows:
> *(row_out_C + col_count) = (CELL) (rand1(2742)*(max+1-min)+min);
>
> but that goes back to the question of should min=0, max=10 create 11 bins
> or 10?
10. 0 to 9, inclusive.
> For the integer case, I'd say include the end numbers; but that
> probably means changing r.mapcalc's rand(int,int) to be the same. ?????
r.mapcalc's behaviour is correct.
PRNGs invariably use start-closed/end-open intervals, i.e. a <= x < b.
More generally, if you partition the interval a-c into a-b and b-c,
you normally want b to be in one or the other but not both.
> As for Numerical Recipes code in the module (or not), it uses rand1()
> from the gmath library. lib/gmath/rand1.c
rand1() uses either rand() or drand48() from the standard C library.
There is no Numerical Recipes code involved here.
> How to cite that in the r.surf.random.html help page? I put:
> "It uses the random number generator drand48() or rand(), depending on
> the user's platform."
> but I'd much rather be specific and say those come from the GNU Project
> or as defined in POSIX 1.1 or C99 std, or libc6 or whatnot, but don't
> know the correct answer.
rand() is C89, drand48() is POSIX.
> as for drand48, the man page says:
> NOTES
> These functions are declared obsolete by SVID 3, which states
> that rand(3) should be used instead.
>
> So should we remove the "#if defined(HAVE_DRAND48)" parts from
> lib/gmath/rand1.c?
Probably.
ANSI C doesn't dictate the specific PRNG algorithm or the number of
bits, only that RAND_MAX is at least 32767. The end result is that
some rand() implementations are quite poor (I've even seen
implementations which take the *least* significant bits of an LCG,
which produces alternating even/odd values).
If lrand48() et al exist, they will always use a 48-bit state, which
puts a lower bound on the quality of the PRNG.
OTOH, if a system has lrand48(), it's likely that rand() will either
be an alias for lrand48() or something at least as good.
--
Glynn Clements <glynn at gclements.plus.com>
More information about the grass-dev
mailing list