[STATSGRASS] dependent variable

Thu Feb 27 04:39:33 EST 2003

Hi Ahmet,

Sorry I didn't get back to you yesterday.

On Thu, 27 Feb 2003, orkun wrote:

> Hello
> 
> I try to create landslide hazard map using grass and R.
> Here are the commands I used:
> r.stats -nNCc input=landslide_binary_map,geology_cat_map,slope_cat_map 
> fs=":" output=land
> 
> it outputs something like: 
> 0:13:2:3556
> 1:5:3:7086
> .
> .
> .
> 
OK. This is an interesting question, which I'll try to develop. For each 
observed combination of input map values, r.stats gives you a count of the 
number of cells with that combination; in effect, a cross-tabulation. 
You can see this in the spearfish sample location:

GRASS:~ > r.stats -nNCc input=landuse,vegcover,soils.ph
r.stats:  100%
1 1 0 91
1 1 2 101
1 1 3 331
1 1 4 615
1 1 5 16
1 2 0 541
1 2 1 44
1 2 2 125
1 2 3 557
1 2 4 2564
1 2 5 61
1 3 0 10
....

In R:

> G <- gmeta()
> dt <- rast.get(G, c("landuse","vegcover","soils.ph"))
> str(dt)
List of 3
 $ landuse : num [1:302418] NA NA NA NA NA NA NA NA NA NA ...
 $ vegcover: num [1:302418] NA 2 2 2 6 6 6 6 6 6 ...
 $ soils.ph: num [1:302418] NA NA NA NA NA NA NA NA NA NA ...
> table(dt$landuse,dt$vegcover,dt$soils.ph)
, ,  = 0

     1   2  3  4  5   6
  1 91 541 10 15 11 107
  2  0  61  0  0  0  66
  3 10  89  0  0  0  65
  4  1 155  0  0  0   0
  5  0   0  0  0  0   0
  6  0   0  0  0  0   0
  7  0 145  0  0  6 260
  8 14  99  0  0  0  50

...

You can read the first line in r.stats as [1,1,0] with a count of 91,
[1,2,0] is 541, and so on. So, for each combination of factors, you have a
count, but of course only one count. This looks to me much more like
log-linear modelling than multiple regression, or possibly a logit or
probit model fitted with glm().
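To show what I mean, here is a minimal sketch, not a recommendation for your data. The counts are invented, and the column names (slide, geology, slope, count) and the two-class factors are my assumptions, not taken from your file. It fits a logit model with the cell counts as frequency weights, and then the corresponding Poisson log-linear model on the counts.

```r
# Invented lines in the same "slide:geology:slope:count" layout as the
# r.stats output quoted above; the numbers are for illustration only.
txt <- c("0:1:1:400", "1:1:1:20",
         "0:1:2:300", "1:1:2:60",
         "0:2:1:500", "1:2:1:10",
         "0:2:2:250", "1:2:2:90")
dt <- read.table(text = txt, sep = ":",
                 col.names = c("slide", "geology", "slope", "count"))
dt$geology <- factor(dt$geology)
dt$slope <- factor(dt$slope)
dt$slide_f <- factor(dt$slide)

# Logit: the 0/1 landslide value is the response and the cell count is a
# frequency weight, so each row stands for 'count' identical cells.
fit_logit <- glm(slide ~ geology + slope, family = binomial,
                 weights = count, data = dt)
print(coef(fit_logit))

# Poisson log-linear model on the counts. geology * slope fixes the number
# of cells in each geology/slope combination; the slide_f:geology and
# slide_f:slope terms play the role of the logit coefficients above.
fit_pois <- glm(count ~ slide_f * (geology + slope) + geology * slope,
                family = poisson, data = dt)
print(coef(fit_pois))
```

Note that the landslide indicator, not the count, is the response in the logit version; the count only says how many cells each row represents.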

A further issue is that the outcomes (landslide_binary_map) are likely 
to be autocorrelated, with clusters of 1's on the map, possibly related to 
the slope and geology categories, which are also autocorrelated. For 
log-linear models (or chi-squared on a simple table), how to handle this 
isn't settled in the literature: by using finer nsres and ewres, and so 
making more cells, you can make a bigger "sample" and make your result 
significant. There is a loglin() function in R, but I would look around 
first to see whether it is appropriate for your table data. Maybe just 
tabulating them will throw up something?
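The resolution point can be made concrete with a small sketch. The 2 x 2 table below is invented (landslide by a two-class geology map, not your data), and the factor of 4 assumes that halving both nsres and ewres simply splits every cell into four with the same values.

```r
# Invented 2 x 2 table: landslide (0/1) by a two-class geology map.
tab <- as.table(matrix(c(400, 20, 300, 60), nrow = 2,
                       dimnames = list(slide = c("0", "1"),
                                       geology = c("1", "2"))))

# Halving nsres and ewres splits every cell into four, so each count is
# multiplied by 4 while the proportions stay exactly the same.
tab_fine <- tab * 4

# Pearson chi-squared without continuity correction, at both resolutions.
x2_coarse <- unname(chisq.test(tab, correct = FALSE)$statistic)
x2_fine <- unname(chisq.test(tab_fine, correct = FALSE)$statistic)
print(c(coarse = x2_coarse, fine = x2_fine))

# loglin() with the independence model [slide][geology]: the
# likelihood-ratio statistic grows by the same factor of 4.
ll_coarse <- loglin(tab, margin = list(1, 2), print = FALSE)
ll_fine <- loglin(tab_fine, margin = list(1, 2), print = FALSE)
print(c(coarse = ll_coarse$lrt, fine = ll_fine$lrt))
```

Nothing about the maps has changed between the two fits, only the number of cells, yet both test statistics are four times larger at the finer resolution.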

Best wishes,

Roger 

> I imported land text file to R.
> Now I want to use multiple regression model.
> The command I used:
> r<-lm(V4~as.factor(V2)+as.factor(V3),data=dt)
> you know V4 is landslide pixel count. I used it as
> dependent variable as you see. But I hesitate whether it is correct 
> approach. Is it true ? and What do you suggest ?
> 
> 
> kind regards
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no