Hi Roger and list, thanks for the reply. I'll start at the beginning to clarify. I'm an archaeologist and I'm creating a model that predicts the locations of archaeological sites (there are various extant methods, but I'm developing a new one using a combination of techniques) given 'n' environmental variables that can be continuous or categorical. The model is structured on the idea that each cell in the output raster map will contain a probability that it is within a buffer around a site based on the definition of 'site' derived from the continuous and categorical data input by the analyst. I've taken the stance that the predictive problem can be viewed as one of classification in n-dimensions so that every point in the study area has n-dimensional coordinates in the data space. So, the question is, 'what is the probability that any given pixel in a raster map of arbitrary size is located within [buffer] distance of a site?'. <br>
<br>To derive the probability, I'm using an R package called 'np' by Tristen Hayfield and Jeffrey S. Racine, which can create an n-dimensional pdf estimate using a kernel that handles both continuous and categorical data types. I'm also weighting the probabilities by distance, since the definition of a site (based on statistics gathered within [buffer] distance of known site locations) can and will differ between locations -- some sites are preferentially located on terraces, others at fluvial confluences, etc. Taking a cue from geographically weighted regression techniques, I'm weighting the probability that any given pixel is within [buffer] distance of a site using distance decay kernels. As a result, known sites nearer to the location for which I want to predict a probability have greater influence on it, since proximal sites are more likely to share environmental characteristics than distant ones. The probabilities are therefore different for every pixel of interest, because the distances between a pixel and the known site locations differ from pixel to pixel. So, this calculation needs to be done on a 'per-pixel' basis. Some day, I'll write the code so that it's convenient for parallel processing, which would make large maps possible (I'll be using region settings and masks to limit the size for now). In the meantime, I was hoping to find an elegant way to write the probabilities to a raster map in R. I'm developing a function that returns the probability, but so far the only way I've been able to write the data out and ensure that it's written to the corresponding coordinate position in the grid is like this:<br>
<br><br>*using libraries sp, spgrass6<br>--------------<br>if (any(parseGlist(rast_list) == "MASK")) {<br> ##obey GRASS mask<br> modelRast <- readRAST6(c("MASK"))<br> ##after pulling the mask in, get spatial information for use in the output<br>
 tmp_proj4string <- slot(modelRast,'proj4string')<br> numcells <- prod(as.vector(slot(slot(modelRast,"grid"),"cells.dim")))<br> ##replace the data slot with a running cell index, one entry per grid cell<br> slot(modelRast, "data") <- data.frame(index=seq_len(numcells))<br>
##convert to dataframe for processing one cell at a time<br> modelRast_df <- data.frame(modelRast)<br> ##iterate over dataframe one cell at a time to calculate probability and write to appropriate df column and row<br>
 for (i in modelRast_df$index) {<br> ##select the row by its index value so the result lands in the matching cell<br> modelRast_df[modelRast_df$index == i,'probability'] <- calcProb(modelRast_df[modelRast_df$index == i,],vect_pts,buffer_df,distkernel,pdfkernel)<br> }<br> ##promote df to spatial grid<br>
modelRast <- as(SpatialPixelsDataFrame(modelRast_df[c('x','y')],modelRast_df), "SpatialGridDataFrame")<br> ##add spatial information back to the grid that was captured above<br>
slot(modelRast,"proj4string") <- tmp_proj4string<br> ##output GRASS raster<br> writeRAST6(modelRast,"pred_model",zcol="probability")<br> }<br>--------------<br><br>
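For reference, the distance-decay weighting described above can be sketched in base R roughly as follows. This is only an illustration: the Gaussian form, the function name `decay_weights`, and the `bandwidth` value are assumptions made for the example, not the kernel or settings I've settled on.<br>
<br>

```r
# Gaussian distance-decay weights for one target cell (illustrative sketch).
# `bandwidth` is an assumed tuning parameter, not a value from the model.
decay_weights <- function(cell_xy, site_xy, bandwidth) {
  # Euclidean distance from the target cell to every known site
  d <- sqrt((site_xy[, 1] - cell_xy[1])^2 + (site_xy[, 2] - cell_xy[2])^2)
  # Nearer sites receive larger weights
  w <- exp(-0.5 * (d / bandwidth)^2)
  # Normalise so the weights sum to 1
  w / sum(w)
}

# Three made-up site locations and one made-up target cell
sites <- cbind(x = c(0, 100, 1000), y = c(0, 0, 0))
w <- decay_weights(c(10, 0), sites, bandwidth = 200)
```

<br>Because the distances change with every target cell, the weights have to be recomputed per pixel, which is why the whole calculation runs cell by cell.<br><br>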
If anyone has any ideas for cleaning this up -- like writing the information directly to the spatial grid without changing it to a dataframe and being certain that the new data is written into the proper row -- I would greatly appreciate it. I'll also think more about Roger's suggestion and try to implement it. Thanks,<br>
<br>Chris<br><br><br><div class="gmail_quote">On 11 November 2010 14:50, Roger Bivand <span dir="ltr">&lt;<a href="mailto:Roger.Bivand@nhh.no">Roger.Bivand@nhh.no</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div><div></div><div class="h5">On Thu, 11 Nov 2010, Chris Carleton wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi List,<br>
<br>
I've imported a SpatialGridDataFrame (SGDF) from GRASS with spgrass, and now<br>
I'm trying to select only those entries with a certain value in a column.<br>
What is the appropriate syntax for selecting cells in a grid that have<br>
particular values? Here's what I have:<br>
<br>
----------<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
summary(modelRast)<br>
</blockquote>
Object of class SpatialGridDataFrame<br>
Coordinates:<br>
min max<br>
x 174812.5 382055.5<br>
y 1798172.6 1942292.8<br>
Is projected: TRUE<br>
proj4string :<br>
[+proj=utm +zone=16 +a=6378137 +rf=298.257223563 +no_defs<br>
+towgs84=0.000,0.000,0.000 +to_meter=1.0]<br>
Number of points: 2<br>
Grid attributes:<br>
cellcentre.offset cellsize cells.dim<br>
x 174820 15.00021 13816<br>
y 1798180 15.00002 9608<br>
Data attributes:<br>
Value 1 NA's<br>
1534000 131210128<br>
----------<br>
<br>
So, the SGDF obviously 'knows' which cells are NA and which are 1, but how<br>
do I select cells on the basis of a column value?<br>
</blockquote>
<br></div></div>
Firstly, this is a very large object if you are considering statistical analysis. In general, you treat Spatial*DataFrames like any data frame. To operate on it, you might choose to coerce it to a SpatialPixelsDataFrame, which drops all of the cells with missing values and retains only those within your apparent mask. Then, you'd do something like:<br>
<br>
modelRastSP <- as(modelRast, "SpatialPixelsDataFrame")<br>
# add a variable<br>
modelRastSP$newvar <- whatever<br>
<br>
The same "$" or "[[" operator works on a SpatialGridDataFrame. Are you asking because you need to interpolate to the raster within a mask? Then you could use coordinates(modelRastSP) to get the prediction points.<br>
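<br>
A minimal sketch of this approach on a toy grid, assuming the sp package is installed. The grid dimensions, the mask values, and the function `calc_prob` are hypothetical stand-ins for the GRASS raster and the real per-cell calculation.<br>
<br>

```r
library(sp)

# Toy 4 x 3 grid standing in for the raster read from GRASS (assumed values)
grd <- GridTopology(cellcentre.offset = c(0.5, 0.5),
                    cellsize = c(1, 1), cells.dim = c(4, 3))
mask <- c(1, 1, NA, NA, 1, 1, 1, NA, NA, 1, 1, 1)
modelRast <- SpatialGridDataFrame(grd, data.frame(MASK = mask))

# Coercion drops the NA cells; data rows and coordinates stay aligned
modelRastSP <- as(modelRast, "SpatialPixelsDataFrame")
xy <- coordinates(modelRastSP)

# Hypothetical placeholder for the real per-cell probability function
calc_prob <- function(pt) 1 / (1 + sum(pt^2))

# One value per retained cell, in row order, written into a new column
modelRastSP$probability <- apply(xy, 1, calc_prob)

# Back to a full grid; cells outside the mask are NA again
out <- as(modelRastSP, "SpatialGridDataFrame")
```

<br>Since the new column is computed from the rows of `coordinates(modelRastSP)` in order, each value stays attached to its own cell without a separate index column, and the result could then be written out with writeRAST6 as before.<br>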
<br>
If you can make your questions more specific, I think that others may be able to help,<br>
<br>
Roger<div><div></div><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
Also, is there a way to write values into a column in the SGDF without<br>
having to create a new df object and then combine them? For example, how do<br>
I:<br>
<br>
---pseudo code---<br>
for cell in sgdf {<br>
thing = calculate_something(cell)<br>
write2grid(cell, thing)<br>
}<br>
------------------------<br>
<br>
I haven't been able to work this out on my own or find any examples of this<br>
type of selection online (been searching a while... ) though I might be<br>
looking in the wrong places. Thanks,<br>
<br>
Chris<br>
<br>
</blockquote>
<br></div></div><font color="#888888">
-- <br>
Roger Bivand<br>
Economic Geography Section, Department of Economics, Norwegian School of<br>
Economics and Business Administration, Helleveien 30, N-5045 Bergen,<br>
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43<br>
e-mail: <a href="mailto:Roger.Bivand@nhh.no" target="_blank">Roger.Bivand@nhh.no</a><br>
<br>
<br>
</font></blockquote></div><br>