[GRASSLIST:4086] Re: normal distribution

Thomas Dewez thomas.dewez at brunel.ac.uk
Tue Jul 16 12:43:34 EDT 2002


Bill,
Thanks very much for your comments. I am talking here about
photogrammetrically extracted DEMs (black-and-white air photos at roughly
1:40,000). There is an issue with detecting false fixes (i.e. erroneous
elevations) produced by the automatic extraction algorithm. Mike Gooch at
Loughborough University explored the problem a few years ago.

http://www.geovista.psu.edu/sites/geocomp99/Gc99/050/abs99-050.htm. This is
also published in Computers and Geosciences.

He found that only some pixels were sensitive to changes in extraction
parameters. Those pixels are algorithm artefacts, but they also turn out to
be the least accurate elevations when compared to independent elevation
controls. This finding holds across various photogrammetric software
packages. He devised an algorithm to flag inaccurate pixels by comparing
two DEMs of the same area extracted with two different parameter sets
(incidentally, this might be a paired statistical problem; how does one
treat those?). His so-called Failure Warning Model removes pixels whose
elevation was interpolated (Orthomax, the photogrammetric software, reports
which pixels were void of elevation information) and also those whose
elevation difference is larger than a given threshold. Unfortunately, Gooch
does not suggest any guidelines for setting the threshold.
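
For concreteness, the flagging step might be sketched like this (Python
with numpy; the array names, the interpolated-pixel mask and the threshold
are illustrative assumptions, and choosing the threshold is exactly the
open question):

    import numpy as np

    def failure_warning_mask(dem_a, dem_b, interpolated, threshold):
        """Flag unreliable pixels in the spirit of the Failure Warning
        Model.

        dem_a, dem_b : the same area extracted with two parameter sets
        interpolated : boolean array marking pixels that were filled by
                       interpolation (void of elevation information)
        threshold    : elevation-difference cutoff, in metres
        """
        # A pixel is suspect if it was interpolated, or if the two
        # extractions disagree by more than the threshold.
        return interpolated | (np.abs(dem_a - dem_b) > threshold)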

I was exploring different rules to remove only the obvious blunders (the
largest elevation differences). A statistical filter is self-adaptive and
responds to a known mathematical behaviour. Your intuitive reply concerning
the random behaviour of the differences is a real concern that may
compromise the conclusion.
I intend to interpolate the remaining reliable elevations with s.surf.rst.
With this method, there is even a possibility of calibrating the smoothing
attribute according to the observed elevation differences (I have not tried
this yet; it sounds time-consuming). Here we are talking about roughly 2 m
of Z difference, which is in a way a measure of the "noise" of the
photogrammetric process. The number of compared pixels is on the order of 4
million.
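
For what it is worth, the statistical filter I had in mind amounts to the
sketch below (Python with numpy again; the array name diff is illustrative,
and the +-1.96 cutoff is precisely the point in question):

    import numpy as np

    def standardized_outlier_mask(diff, z_max=1.96):
        """Standardize the difference image and flag large scores."""
        z = (diff - diff.mean()) / diff.std()
        return np.abs(z) > z_max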

Do you reckon I am totally wasting my time on this issue? Is there an
obvious, cleverer solution that would do the trick better?

Again, thanks for discussing this question.

Thomas
----- Original Message -----
From: "Quantitative Decisions" <whuber at quantdec.com>
To: "Thomas Dewez" <thomas.dewez at brunel.ac.uk>
Cc: "GRASS" <GRASSLIST at baylor.edu>
Sent: Tuesday, July 16, 2002 4:58 PM
Subject: Re: [GRASSLIST:4082] normal distribution


> At 03:01 PM 7/16/02 +0100, Thomas Dewez wrote:
> >I created a difference image between two DEMs of the same area and would
> >like to filter out only the outliers. ...
> >
> >The difference image is standardized ((obs-mean)/stdev) and I intend to
> >reject any score larger than +-1.96. How could I test that the difference
> >image is truly normal so that the threshold is meaningful? Do you reckon
> >this is a valuable way to proceed to reject values? It is nicely context
> >sensitive but have I missed something?
>
> This is difficult to answer because the most important part of the
> question, your objective, is unstated.  What follows therefore is a set of
> general remarks.
>
> The +-1.96 threshold will reject approximately 5% of all differences,
> assuming they are Normally distributed, *regardless* of the cause of any
> differences between the two DEMs.  Such a Procrustean solution is unlikely
> to be useful or even relevant.  It will poke too many holes in your
> images.
>
> (1)     Useful.  This means you should be rejecting true "outliers,"
> whatever they are.  If you are assuming the difference in DEMs is a
> stationary multigaussian random function (representing "noise" or "error"
> or what have you), then an outlier would be any difference not consistent
> with that model.  Assuming the images are fairly large, say with N pairs of
> matched pixels, then you should not use +-1.96 but instead use something
> around the 1/(2N) and 100 - 1/(2N) percentage points of the standard Normal
> distribution.  Indeed, you can view this as an approximation of a
> (99%-confidence) prediction interval; see Hahn & Meeker, Statistical
> Intervals, p. 62 (Wiley, 1991).  For typical DEMs (hundreds to millions of
> points), these percentage points will be in the 4-7 range, considerably
> larger than 1.96.  Furthermore, for a more robust approach, consider
> estimating the standard deviation based on middle percentiles of the
> differences, such as the interquartile range, and using that to compute the
> scores.
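>
> In code, these two suggestions might look something like this (a sketch
> in Python with numpy and scipy; the array "diff" of elevation
> differences is an assumption of the sketch):
>
>     import numpy as np
>     from scipy.stats import norm
>
>     def outlier_mask(diff):
>         n = diff.size
>         # Tail probability of 1/(2N) percent in each tail, i.e. the
>         # 1/(2N) and 100 - 1/(2N) percentage points of the standard
>         # Normal; for N in the millions this is near 6, not 1.96.
>         z_max = norm.ppf(1.0 - 1.0 / (200.0 * n))
>         # Robust scale from the interquartile range (IQR = 1.349 sigma
>         # for a Normal distribution), robust centre from the median.
>         q1, q3 = np.percentile(diff, [25, 75])
>         scores = (diff - np.median(diff)) / ((q3 - q1) / 1.349)
>         return np.abs(scores) > z_max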
>
> (2)     Relevant.  You could be testing extremely small differences.  Maybe
> they don't matter?  You could instead identify differences that are of a
> size that matters to your application and forget about finding statistical
> outliers.  At least be sure to evaluate all the differences in the context
> of the elevation accuracy expected from each DEM.
>
> As to the second part of your question, a Normal probability plot would be
> an excellent diagnostic test.  You don't want to apply a standard test
> (Kolmogorov-Smirnov, Shapiro-Wilk, or Anderson-Darling) because it will be
> so powerful (due to the numerous data) that it's sure to reject the
> hypothesis of Normality no matter what.  If your software won't handle a
> probability plot with zillions of points, then sample the differences
> either randomly or systematically (on a grid) and plot the sample.  A
> sample size of a thousand or so should be fine.  But it would be nice to
> use all the data, because that will highlight the nature of any truly
> outlying data (which might not be picked up in a subsample of the pixels).
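>
> For instance (a sketch in Python; scipy's probplot draws the Normal
> probability plot, and "diff" is again the assumed array of
> differences):
>
>     import numpy as np
>     from scipy import stats
>     import matplotlib.pyplot as plt
>
>     # Plot a random subsample of about a thousand differences.
>     rng = np.random.default_rng(0)
>     sample = rng.choice(diff.ravel(), size=1000, replace=False)
>
>     # A straight line is consistent with Normality; curvature or
>     # stray points at the ends reveal heavy tails and outliers.
>     stats.probplot(sample, dist="norm", plot=plt)
>     plt.show()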
>
> I would say this approach is not very "context sensitive," at least not if
> you mean spatial context, because it ignores location information
> altogether.  You are likely to discover that the differences in DEMs
> reflect artifacts of their construction (such as interpolation from contour
> lines) as much or more than they reflect true changes in ground
> elevation.  Be prepared not only for strong non-Normality, but also for
> differences that have strong spatial patterns.
>
> --Bill Huber
> Quantitative Decisions
> www.quantdec.com (contains pages on environmental statistics, including
> software and aids for probability plotting and prediction limits)