[GRASS5] Re: s.kcv in 5.4, v.kcv in 5.7?

Mon Oct 18 15:07:43 EDT 2004

On Mon, 18 Oct 2004, Radim Blazek wrote:

> On Friday 15 October 2004 21:46, Helena Mitasova wrote:
> > Radim, would it be possible to port it to 5.7 as v.kcv? (I noticed that
> > some
> > of the other useful site programs such as v.random are already there.
> 
> I have submitted v.kcv; random partition is written to a new column
> (table must exist, no check, should be added).
> It must be terribly slow for larger data sets because of the algorithm
> used (from s.kcv): random point -> nearest point, i.e. O(n^2)

This was what I noticed in July 2001 replying (off-line) to Anne 
Holopainen (http://grass.itc.it/pipermail/grass5/2001-July/002801.html):

"I've been reading the s.kcv code and manual page. As I understand it, it
takes two arguments, k, the number of test/trials required, and sites, the
complete list of sites available, numbering say nsites in total. s.kcv
then creates k partitions, some ceiling(nsites/k) some floor(nsites/k)
large, such that the sum of the counts of members of the k test sets is
nsites. No site appears in more than one test set. The trial sets seem to
contain all the sites that are not test set members, such that the sum of
numbers of members of trial sets and test sets is always nsites.

The test sites are sampled from the sites available without replacement
using a rather odd mechanism. A point is chosen in the window region,
easting and northing at random, and the nearest site not yet assigned to a
test set is deemed to be chosen to join the test set being constructed. I
don't know why this is done in this way. Darrell actually comments that
the spatial process should not necessarily be random in space - if the
points themselves are not. Further, the region maybe ought to be defined
other than the current window.

I don't see why space is brought into this - do you have a view? Anyway, 
if space isn't important, sample() can be used.

s.kcv <- function(k, sites) {
        k <- as.integer(k)
        if(k < 2) stop("k less than two")
        if(!is.points(sites)) 
                stop ("sites not a splancs point object")
        nsites <- npts(sites)
        if(nsites < k) stop("More partitions than sites")
        idx <- 1:nsites
        p <- integer(k)
        p <- rep(floor(nsites/k), k)
        mod.p <- nsites %% k
        if (mod.p > 0) p[1:mod.p] <- ceiling(nsites/k)
        test <- vector("list", k)
        trial <- vector("list", k)
        for (i in 1:(k-1)) {
                test[[i]] <- sort(sample(idx, p[i], 
                        replace=FALSE))
                trial[[i]] <- which(!(1:nsites %in% test[[i]]))
                idx <- idx[which(!(idx %in% test[[i]]))]
        }
        test[[k]] <- idx
        trial[[k]] <- which(!(1:nsites %in% idx))
        res <- list(k=k, sites=sites, p=p, test=test, 
                trial=trial)
}"

As I wrote, I have no idea why Darrell sampled in space and chose the 
nearest, really. It could be done with a quadtree structure, but why?

Roger

PS. Anne got her work done three years ago using the R bridge.

> 
> Radim
> 
> _______________________________________________
> grass5 mailing list
> grass5 at grass.itc.it
> http://grass.itc.it/mailman/listinfo/grass5
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no