[GRASS-dev] adding lib_arraystats

Fri Feb 15 08:48:45 EST 2008

On Thu, 14 Feb 2008, Glynn Clements wrote:

>
> Roger Bivand wrote:
>
>>>> What would be particularly useful is if it's possible for GRASS to use
>>>> R functions on small amounts of data. E.g. r.series and r.resamp.stats
>>>> both compute aggregates over relatively small amounts of data.
>>>
>>> This is essentially what I had in mind when I first posted the suggestion--
>>> using R code on simple arrays of data.
>>
>> This is essentially what classIntervals() needs:
>>
>> initiate the R backend and workspace;
>> put the vector of doubles into the workspace;
>> load the classInt package into the workspace;
>> run the classIntervals function with the argument values;
>> collect the break values vector from the output object;
>> optionally continue to collect a vector of strings giving the RGB values;
>> optionally generate an empirical cumulative distribution function plot
>>   of the data showing the class intervals and chosen colours as PNG;
>> terminate the R backend;
>
> What r.series and r.resamp.stats need is:
>
> 	Initialise R
> 	For each cell:
> 		push a vector of doubles into R
> 		call the R function on the vector
> 		pull the result (a single double) from R
> 	Shut down R

Certainly feasible. It would probably be reasonable to block up the single
cells: push a cube of doubles and pull a matrix of doubles (in R, vectors
with three and two dimensions respectively). That is just a performance
question: whether the per-cell push/pull operations are costly compared with
running the vectorised function within R on multiple cells at once. Are we
in Python for this?
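
A rough sketch of the blocking idea, written in Python only because that question is open above. Nothing here talks to R: FakeBackend is a hypothetical stand-in that just counts round trips, median stands in for whatever R function would be called, and the cube is assumed to be indexed as cube[row][col][raster].

```python
from statistics import median


class FakeBackend:
    """Hypothetical stand-in for an embedded R session; counts round trips."""

    def __init__(self, func):
        self.func = func          # stand-in for the R function
        self.round_trips = 0

    def call(self, vectors):
        """Push a list of vectors, pull back one double per vector."""
        self.round_trips += 1
        return [self.func(v) for v in vectors]


def per_cell(cube, backend):
    """One push/pull per cell, as in the per-cell loop quoted above."""
    return [[backend.call([cell])[0] for cell in row] for row in cube]


def blocked(cube, backend, block_rows):
    """Push block_rows rows at once (a cube), pull a matrix per block.

    Assumes every row has the same number of columns.
    """
    result = []
    for start in range(0, len(cube), block_rows):
        block = cube[start:start + block_rows]
        ncols = len(block[0])
        # Flatten the block into one list of per-cell vectors: one push.
        flat = [cell for row in block for cell in row]
        values = backend.call(flat)
        # Reshape the pulled values back into rows of ncols cells.
        result.extend(values[i:i + ncols] for i in range(0, len(values), ncols))
    return result


if __name__ == "__main__":
    # 4 rows, 5 columns, 3 input rasters per cell.
    cube = [[[float(r + c + k) for k in range(3)] for c in range(5)]
            for r in range(4)]
    a, b = FakeBackend(median), FakeBackend(median)
    assert per_cell(cube, a) == blocked(cube, b, block_rows=2)
    print(a.round_trips, b.round_trips)
```

With 4 rows of 5 cells, the per-cell version makes 20 round trips and the blocked version with block_rows=2 makes 2. Whether that saving matters depends on how costly each push/pull really is, which would have to be measured against a real R backend.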

Roger

>
>>>> If it was practical to have e.g. "r.series method=R expression=...",
>>>> that would be much more useful than having to start R and load
>>>> potentially hundreds of rasters into memory.
>>>
>>> Right-- the "not-in-memory" features of GRASS are extremely valuable.
>>
>> Since R operations are vectorised, it might be possible to pass a block of
>> values (say k rows, p columns, where p is the number of input rasters) and do
>> apply() on it to get k rows back, but the blocking would be on the GRASS side.
>> But this would be for specialist things, I expect.
>
> This is how any GRASS module would want to use R. If you're transferring
> entire maps, you would be better off just coding the whole thing in R.
>
> GRASS modules inherently operate on chunks of data, either rows, or a
> sliding window of several rows, or a region of cells, etc.
>
> Processes which need to operate upon an entire map don't really need
> to "integrate" with R; they can just run an R program as a
> self-contained operation.
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no