[GRASS-dev] adding lib_arraystats

Roger Bivand Roger.Bivand at nhh.no
Wed Feb 13 12:39:15 EST 2008


Dylan Beaudette <dylan.beaudette <at> gmail.com> writes:

> 
> On Tuesday 12 February 2008, Glynn Clements wrote:
> > Dylan Beaudette wrote:
> > > On the other hand-- R has so many great plotting / analysis routines,
> > > it would be a shame to re-implement these in GRASS if they could be
> > > easily "plugged-into" via some kind of share library / SWIG
> > > interface.
> 

The Rpy route is feasible but in my experience not very portable; essentially
you push R scripts across the Python-R interface together with as little data as
possible, and get as little data as possible back. 

> Thanks for the follow-up Glynn.;
> 
> > Another concern is that it could discourage development of new
> > functionality in GRASS itself. In the worst case, GRASS could end up
> > of little use unless you are familiar with R, and your map fits into
> > memory.
> >
> > This is similar to my concerns regarding the use of high-level Python
> > libraries in the GUI suppressing development of more widely-applicable
> > display modules.
> >
> > External tools should be used to augment GRASS functionality, not to
> > replace it.
> 
> I agree completely-- just as long as we don't go overboard re-inventing the 
> wheel.

Yes, very sensible. With reference to the original question, I think in fact
that class interval finding could be done fairly readily using the classInterval
function in the classInt package, which provides lots of methods, and does not
require any spatial data, just data as such, and for rasters, decimated or
resampled data would be good enough. But using R would be just for people who
need it, and for people who are developing functions within GRASS itself who
need to test out methods implementations. The classInt package help pages are at:

http://cran.r-project.org/doc/packages/classInt.pdf

> 
> > That isn't to say that we shouldn't facilitate integration with tools
> > such as R. Certainly, I would prefer R integration to adding our own
> > half-baked statistics package. But we also need to avoid delegating
> > "core" functionality to external packages.
> >
> > What would be particularly useful is if it's possible for GRASS to use
> > R functions on small amounts of data. E.g. r.series and r.resamp.stats
> > both compute aggregates over relatively small amounts of data.
> 
> This is essentially what I had in mind when I first posted the suggestion-- 
> using R code on simple arrays of data.

This is essentially what classInterval() needs:

initiate the R backend and workspace;
put the vector of doubles into the workspace;
load the classInt package into the workspace;
run the classInterval function with the argument values;
collect the break values vector from the output object;
optionally continue to collect a vector of strings giving the RGB values;
optionally generate an empirical cumulative distribution function plot
  of the data showing the class intervals and chosen colours as PNG;
terminate the R backend;

This could be done as a shell script using temporary files (could be binary),
from Python on supported platforms, on Windows from Python via (D)COM, or using
the R shared object. None of them will be as fast as an in-GRASS solution,
although R backend startup times and package loading are very much faster now
than previously. 

> 
> > If it was practical to have e.g. "r.series method=R expression=...",
> > that would be much more useful than having to start R and load
> > potentially hundreds of rasters into memory.
> 
> Right-- the "not-in-memory" features of GRASS are extremely valuble.

Since R operations are vectorised, it might be possible to pass a block of
values (say k rows, p columns, where p is the number of input rasters) and do
apply() on it to get k rows back, but the blocking would be on the GRASS side.
But this would be for specialist things, I expect.

> 
> > r.resamp.stats only works on a single map, but as it's essentially a
> > down-sampling tool, it's not unreasonable to use it on very large
> > input maps.
> 
> I like this idea. Use of a trimmed mean or other such R specialty would be 
> very nice here.

Some of the specialites are rather sophisticated, but others are more
sophisticated than needed in GIS on a regular basis. It may be that the R
functionality is more use here in supporting the implementation of code within
GRASS by permitting checking that both end up with largely the same output for
the same input.

Roger

> 
> Dylan
> 






More information about the grass-dev mailing list