NAME

r.dst.bpa - Calculates basic probability assignments for raster evidence maps.
(GRASS Raster Program)

SYNOPSIS

r.dst.bpa [-q] raster=name [sites=name] [raster_uncertainty=value] [sites_uncertainty=value] output=name [logfile=name] [cachesize=value]

This program cannot be used interactively.

DESCRIPTION

This program calculates Basic Probability Assignments (BPA) for use in a Dempster-Shafer Theory (DST) predictive model. If you need a flexible spatial predictive modelling framework but are not familiar with DST, please refer to the manual page of the m.dst.combine program for an introduction.

A BPA is the basic quantification of evidence in the DST. It consists of a value m_i in the range 0 to 1 for each hypothesis i = 1..n in the Frame of Discernment (FOD). The restriction is that the values m_1..m_n must sum to 1.
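The two constraints above (each value in the range 0 to 1, all values summing to 1) can be checked with a few lines of Python. This is only an illustrative sketch: the function name is_valid_bpa, the tolerance and the sample masses are assumptions made for this example and are not part of r.dst.bpa.

```python
import math


def is_valid_bpa(masses, tol=1e-9):
    """Return True if every mass lies in [0, 1] and all masses sum to 1."""
    values = list(masses.values())
    in_range = all(0.0 <= m <= 1.0 for m in values)
    # Compare with a tolerance because floating point sums are rarely exact.
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)


# Hypothetical BPA of a single cell for a FOD with three hypotheses.
cell_bpa = {"H1": 0.6, "H2": 0.1, "H3": 0.3}
print(is_valid_bpa(cell_bpa))
```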

In the case of evidence represented as a GRASS raster map, a BPA has to be calculated for each cell within the current region. We will consider the raster map as a source of evidence and each individual cell as one piece of evidence. In order to calculate the BPA for each individual cell we need to know at what values the evidence supports or refutes a given hypothesis. The relationship between cell values and degree of support or refutation is modelled by the BPA function. It can be linear and monotonic, but in principle it can also be arbitrarily complex. To find the BPA function and thus compute the BPAs for each piece of evidence, this program uses a simple approximation scheme based on empirical data supplied by the user.

Let us look at a simple example: we want to predict the habitat locations of a certain type of flower that grows high up in the mountains at an altitude where plant-eating animals do not often show up. The FOD {H1,H2,H3} is formed by two simple hypotheses H1="flower present" and H2="no flower present" (plus H3="H1 OR H2"; see manual page of m.dst.combine for details). At first glance, the situation looks very simple: the higher the terrain, the better the chance to find that kind of flower (i.e. H1 closer to "1"). However, even in this simple case, an adequate BPA function would be much more complex. Imagine a plot of H1 vs. "height":
  1. Starting at the lowest height, there would first be a flat section that represents the height range at which none of the flowers grow.
  2. This would be succeeded by a section with gradually steeper inclination as the height approaches a point which plant-eaters find uncomfortable.
  3. Above the height that plant-eaters definitely avoid, there would be a very steep incline...
  4. ...followed by another plateau which represents the flower's optimal habitat.
  5. Beyond a certain height however, it gets too cold and the BPA function for our flower will again show a gradual decline.

[Figure: BPA chart]

Instead of trying to model the BPA function exactly, it is often sufficient and much easier to use a stepwise approximation function. Such a function can be automatically derived by r.dst.bpa if the user supplies two things:
  1. a site map S with the locations of known habitats;
  2. a raster map M that represents the evidence (e.g. height).
The raster evidence map must be fully categorised, i.e. each cell must represent a category (e.g. 1='0 to 5 m', 2='5 to 10 m',...) or be NULL (no data). You can use the r.categorise program to quickly create such a map from any GRASS raster map (with either floating point or integer cells).
For each category C in the input evidence map M, r.dst.bpa compares the overall cell percentage of C in M (F_C:M) with the percentage of C cells that contain a site from S (F_C:S).
The simple assumption is that if F_C:M > F_C:S, the evidence in M and S supports the hypothesis "no site (flower) present" for all cells of category C in M, and if F_C:M < F_C:S, it supports "site present".
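The comparison of the two percentages can be sketched in Python as follows. This is not the program's actual code: the function name, the input layout (flat lists of category numbers) and the sample data are hypothetical. The sketch also assumes that the site percentage is the share of all known sites that fall on cells of category C, and it labels equal percentages "undecided", a case the description above does not cover.

```python
from collections import Counter


def compare_category_frequencies(cell_categories, site_categories):
    """Compare, per category, the share of cells with the share of sites.

    cell_categories: category of every cell in M (None for NULL cells).
    site_categories: category of the cell under each known site in S.
    Returns {category: (cell_percent, site_percent, supported_hypothesis)}.
    """
    cell_counts = Counter(c for c in cell_categories if c is not None)
    site_counts = Counter(site_categories)
    n_cells = sum(cell_counts.values())
    n_sites = sum(site_counts.values())

    result = {}
    for cat in sorted(cell_counts):
        cell_percent = 100.0 * cell_counts[cat] / n_cells
        site_percent = 100.0 * site_counts[cat] / n_sites if n_sites else 0.0
        if cell_percent > site_percent:
            supported = "no site present"
        elif cell_percent < site_percent:
            supported = "site present"
        else:
            supported = "undecided"  # assumption: ties favour neither side
        result[cat] = (cell_percent, site_percent, supported)
    return result


# Hypothetical data: 10 categorised cells, 2 NULL cells and 4 known sites.
cells = [1] * 6 + [2] * 3 + [3] * 1 + [None] * 2
sites = [2, 3, 3, 3]

for cat, (f_map, f_sites, supported) in compare_category_frequencies(cells, sites).items():
    print(cat, f_map, f_sites, supported)
```

In this made-up sample, category 3 covers only a small part of the map but holds most of the sites, so it is the only category that supports "site present".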

To fine-tune the BPA function and in order to compensate for uncertain and low quality data, the user may also choose to transfer any percentage of the evidence in S or M to the uncertainty hypothesis "site OR no site" (see description of program parameters below).
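One plausible reading of this transfer is sketched below in Python. The exact arithmetic used by r.dst.bpa is not documented here, so the proportional scaling, the function name shift_to_uncertainty and the dictionary keys are assumptions chosen for illustration only.

```python
def shift_to_uncertainty(bpa, percent):
    """Move `percent` of the SITE and NOSITE mass to SITE_NOSITE.

    bpa: dict with keys "SITE", "NOSITE" and "SITE_NOSITE" summing to 1.
    percent: share of the committed mass to transfer, from 0 to 100.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    keep = 1.0 - percent / 100.0
    site = bpa["SITE"] * keep
    nosite = bpa["NOSITE"] * keep
    # Whatever mass is no longer committed goes to the uncertainty hypothesis,
    # so the three values still sum to 1.
    return {"SITE": site, "NOSITE": nosite, "SITE_NOSITE": 1.0 - site - nosite}


# Hypothetical cell BPA with no uncertainty, then 25 percent shifted.
shifted = shift_to_uncertainty({"SITE": 0.8, "NOSITE": 0.2, "SITE_NOSITE": 0.0}, 25)
print(shifted)
```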

Often, it will not be obvious what category ranges to choose (e.g. 1, 5 or 10 m intervals?). If there are too many categories in M, r.dst.bpa will produce an overfit function and a model based on such BPAs will not be able to generalise well, i.e. its predictions are very likely to be wrong. On the other hand, if there are very few categories, the program will not be able to find significant evidence that supports either hypothesis and the predictions will be useless.

Another option is to use categories with individual ranges. To stay with the flower example above, we could first use a priori knowledge about the region's natural zones to create a more significant categorisation manually (using r.support):

	Cat.	Range		Label
	1	0-1000		plains and valleys
	2	1000-1500	hills
	3	1500-1700	mountain area
	4	1700-2500	high mountain area (few mammals)
	5	2500-5000	highest mountain area (cold)

In this way, one can approximate arbitrarily complex BPA functions. The output maps are automatically given the suffixes '.SITE', '.NOSITE' and '.SITE_NOSITE', according to the hypothesis each one supports.
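The manual categorisation in the table above amounts to a lookup of each height in a list of range boundaries. The Python sketch below illustrates the idea only; it does not replace r.support or r.categorise. The function name is hypothetical, and assigning a height that lies exactly on a shared boundary (e.g. 1000) to the higher category is an assumption, since the table does not say which side is inclusive.

```python
from bisect import bisect_right

# Upper bounds of categories 1..5, taken from the table above (in metres).
UPPER_BOUNDS = [1000, 1500, 1700, 2500, 5000]
LABELS = {
    1: "plains and valleys",
    2: "hills",
    3: "mountain area",
    4: "high mountain area (few mammals)",
    5: "highest mountain area (cold)",
}


def categorise_height(height):
    """Return the category number for a height, or None for no data."""
    if height is None or height < 0 or height > UPPER_BOUNDS[-1]:
        return None  # outside all ranges: treat as NULL
    # Count how many inner boundaries are <= height; boundary values
    # therefore fall into the higher category (an assumption).
    return bisect_right(UPPER_BOUNDS[:-1], height) + 1


for h in (500, 1600, 2000, 3000):
    cat = categorise_height(h)
    print(h, cat, LABELS[cat])
```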

Once several sources of evidence have been turned into BPA maps, they can be registered in a DST knowledge base file using m.dst.update and combined to produce the final predictive map(s) using m.dst.combine.

Flags

-q
Quiet operation: do not display progress on screen.

Parameters

raster=name
Name of GRASS raster evidence map.
sites=name
Name of GRASS sites map with locations of known sites.
raster_uncertainty=value
Percentage of evidence mass in raster map to shift to 'uncertainty' hypothesis. This is best determined by calculating several models with different settings and comparing results. If you use a random sample of sites to calculate the model, you can check against the full set of sites afterwards. If too many sites fall into 'no site' areas, you should increase this value (or 'sites_uncertainty' if you trust the raster map data more than the site data; often, it is easiest to increase both).
sites_uncertainty=value
Percentage of evidence mass in sites map to shift to 'uncertainty' hypothesis.
output=name
Output raster maps basename. The default is to use the name of the input raster map and prefix it with 'bpa.'. Also, a suffix '.SITE', '.NOSITE' or '.SITE_NOSITE' will be appended.
cachesize=value
On systems with lots of memory, setting cachesize to the number of rows in the current region can speed up operations, because the input raster map can be kept in memory. The default of '-1' lets the program figure out a good size for the cache itself.
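As an illustration of how the flag and parameters fit together, the following Python sketch assembles a command line from the options listed in the synopsis. The helper build_r_dst_bpa_command and the map names used in the example are hypothetical; the sketch only builds the argument list and does not run the program.

```python
def build_r_dst_bpa_command(raster, output, sites=None,
                            raster_uncertainty=None, sites_uncertainty=None,
                            logfile=None, cachesize=None, quiet=False):
    """Build an r.dst.bpa argument list from the documented options."""
    cmd = ["r.dst.bpa"]
    if quiet:
        cmd.append("-q")
    cmd.append(f"raster={raster}")
    # Optional parameters are added only when a value was supplied.
    for key, value in (("sites", sites),
                       ("raster_uncertainty", raster_uncertainty),
                       ("sites_uncertainty", sites_uncertainty)):
        if value is not None:
            cmd.append(f"{key}={value}")
    cmd.append(f"output={output}")
    for key, value in (("logfile", logfile), ("cachesize", cachesize)):
        if value is not None:
            cmd.append(f"{key}={value}")
    return cmd


# Hypothetical map names, chosen for this example only.
example_cmd = build_r_dst_bpa_command("height.cat", "bpa.height",
                                      sites="flowers",
                                      raster_uncertainty=10, quiet=True)
print(" ".join(example_cmd))
```

Inside a running GRASS session, the resulting list could be passed to subprocess.run() from the Python standard library to execute the program.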

Notes

This program was developed as part of the GRASS 5 DST Predictive Modelling Toolkit.

SEE ALSO

m.dst.combine
m.dst.update
r.categorise
r.support

AUTHOR

Benjamin Ducke,
University of Bamberg, Germany
Heritage Management Brandenburg, Germany
Last changed: $Date: 2004/08/20 15:44:01 $