[GRASSLIST:382] Re: a huge matrix ?

Sat Jun 14 05:33:32 EDT 2003

I'm not sure that you found an answer to this yet, Ahmet - if you did, 
please ignore this suggestion:

1) use r.mapcalc to create an integer combined index for the combinations 
of your raster layers:

    tmp1 = mapa*10 + mapb
    tmp2 = tmp1*10 + mapc
    ....

or some such way of generating an integer index differentiating the 
different types. If there are raster values outside 0-9, maybe reclass 
first, or use 100 instead of 10 to left shift the index where there are 
10-99 discrete values.

2) In R, generate the same integer index, and make a data frame with the 
integer index in one column, and the probabilities in another.

3) Use the R/GRASS interface (or r.stats -1 and scan()) to move either the 
whole of your index raster, or more likely tiles made by changing the 
GRASS region, to R.

4) in R, use vectorised lookup to search for the probability value given 
the index value - match() is fast and powerful (I haven't checked this, 
but do use match() in other code - it will probably work much better on 
tiles than your big raster) and assign the probability values to a new 
vector.

5) move this vector back to GRASS as FCELL or DCELL. If you are tiling, 
finish off by patching the tiles back together.

6) My guess is that the tiling, import, matching, and export can all be 
done in a loop within R.

Rationale: the combined index avoids all the if statements - 11K if's is 
rather a lot! I did think about pushing it all out say to postgresql and 
using SQL (one table of RHS raster values, one table of lookups) but there 
were still seven fields on each side to index on, so it seemed at least 
worth trying to combine them automatically by left shifting (probably not 
the right word) and addition, which are cheap operations in r.mapcalc.

Please let me know how you tackled this problem, your solution will have 
general interest, and should be in the general GRASS knowledge base. I can 
expand on this off-list if it is of any use.

Roger

On Wed, 11 Jun 2003, Glynn Clements wrote:

> 
> orkun wrote:
> 
> > >>I have a huge matrix containing 7 column and 10500 rows.
> > >>Each columns stand for different maps. Each rows represent "unique"
> > >>combinations of catagories of maps and probability values
> > >>of each row.
> > >>Something like:
> > >>
> > >>              map1  map2 map3 ... prob
> > >>1   cat1       x     x    x        x
> > >>2   cat2       x     x    x        x
> > >>.     .
> > >>.     .
> > >>10500 .
> > >>
> > >>So I have 10500 if-clause to create a probability map.
> > >>I tried r.mapcalc (as from a text file) but grass gave "stack overflow" 
> > >>error.
> > >>    
> > >>
> > >
> > >I'm not surprised.
> > >
> > >  
> > >
> > >>Is there anyone making suggestions ?
> > >>    
> > >>
> > >
> > >You need to describe your problem in more detail; it isn't clear what
> > >you are trying to calculate.
> > >
> > >  
> > >
> > Thank you for your interest
> > 
> > You wanted to know the problem in more detail:
> > 
> > what I did:
> > 1- using r.stat command I produced a text file for statistical calculations
> > 2- using R statistical package I produced probability results of each 
> > map's for each categories.
> > it means 7 columns of maps 10500 rows plus probability of each 
> > combination.   Actually it is a contingency table. Any data in table 
> > (matrix) is resulting  from unique combination of maps and their
> > categories.
> > 3- I want to port these results to grass to create a  map (of landslide 
> > susceptibility map) . Firstly I thought I used r.mapcalc with piping 
> > from text file. But because text file contains a huge matrix that is 
> > made up of 10500 if statement (for same  number of  rows) process failed.
> > I attached a part of text file after modifying for r.mapcalc. you  see 
> > all ifs are unique.
> 
> OK, so: each line is a rule; for each pixel you scan the list of rules
> to find the first match, then output the associated result. Right?
> 
> The most straightforward approach (although not particularly
> efficient), would be to use an iterative approach, i.e.
> 
> 	n=0
> 	last='null()'
> 	while read a1 a2 a3 a4 a5 a6 a7 a8 ; do
> 		result=pr2.pass$n
> 		r.mapcalc "$result = if (f28geo5==$a1 && f28slpcat==$a2 && f28concavity==$a3 && n_facing==$a4 && road_bas==$a5 &&str_buf==$a6 && f28f4==$a7, $a8, $last)"
> 		last=$result
> 	done < rules.txt
> 
> where rules.txt looks like:
> 
> 	21 6 1 1 1 1 0 701.22
> 	 7 4 2 1 1 2 1 702.06
> 	...
> 
> Alternatively, you could combine this with your initial approach, only
> matching a limited number of rules on each pass, and using several
> passes.
> 
> However, I note that there seem to be very few possible values for
> most of the maps. In that case, a far more efficient approach would be
> to:
> 
> 1. Reclass each input map so that all of the significant categories
> are consecutively numbered from zero (0,1,2,3,...); any insignificant
> categories would be merged into the last category.
> 
> 2. Compute a composite map, using e.g.
> 
> 	composite = f28geo5 + 22 * (f28slpcat + 9 * (f28concavity + 3 * ...
> 
> where the numbers are the total number of categories in the preceding
> map.
> 
> 3. Assign the values as labels for the composite categories.
> 
> 4. Use "@map" in r.mapcalc to use the label instead of the category.
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no