[GRASS-user] P value for slope from r.series

Markus Metz markus.metz.giswork at gmail.com
Wed Oct 18 23:51:19 PDT 2017


On Thu, Oct 19, 2017 at 2:31 AM, Daniel Victoria <daniel.victoria at gmail.com> wrote:
>
> Hi Markus M,
>
> Thanks for your input, but one thing still confuses me. From what I understood, the multiple comparison problem would arise if I calculated one p-value for all the regressions in a computational region. Say I have a 100 x 100 raster: chances are that one of those 10,000 pixels would yield a significant regression. But in my case, I'm calculating a p-value raster, that is, each pixel has its own p-value, and I'm interested in the slopes that have a significant trend (p <= 0.05). Thus each pixel regression is (sort of) independent.

The p-values for each pixel are not really independent because they have
been calculated for the same raster series. For example, if you generate a
series of random maps and calculate p-values for each pixel of this series,
by pure chance some p-values will be very small. This is the multiple
comparison problem.

Markus M
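Markus's point can be illustrated with a small simulation: generate pure-noise series for many pixels, fit an ordinary least-squares trend to each, and count how many slopes pass p <= 0.05. With no real trend anywhere, roughly 5% of pixels (about 500 of 10,000, not just one) are still expected to come out "significant". This is a minimal Python sketch, not GRASS code; the series length (20 steps), the Gaussian noise, the seed, and the function names are assumptions for illustration, and the simulated pixels are independent, which is the simplest case.

```python
import random

# Number of time steps per pixel (e.g. 20 annual maps). Assumed for illustration.
N_STEPS = 20
# Two-sided critical t value for alpha = 0.05 with N_STEPS - 2 = 18 degrees of
# freedom, taken from standard t tables. Change it if N_STEPS changes.
T_CRIT = 2.100922


def slope_t_value(y):
    """t statistic of the OLS slope when y is regressed on time steps 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (yi - y_mean) for i, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares of the fitted line.
    sse = sum((yi - (intercept + slope * i)) ** 2 for i, yi in enumerate(y))
    se_slope = (sse / (n - 2) / sxx) ** 0.5
    return slope / se_slope


def false_positive_fraction(n_pixels=10_000, seed=42):
    """Share of pure-noise pixels whose slope passes p <= 0.05 anyway."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pixels):
        # No trend at all: every value is independent Gaussian noise.
        series = [rng.gauss(0.0, 1.0) for _ in range(N_STEPS)]
        if abs(slope_t_value(series)) >= T_CRIT:
            hits += 1
    return hits / n_pixels
```

Calling false_positive_fraction() should return a value close to 0.05: every one of those "significant" pixels is a false positive, since no trend was simulated.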

>
> In essence: if I generate a raster with the regression slope and p-value, and I mask out the areas with high p (above 5%), the slope values in the remaining regions would be significant, correct? What you are saying is that I might be overestimating the area with a significant slope?
>
> Cheers and thanks
> Daniel
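The masking step Daniel describes can be sketched on flattened lists of cell values in Python. The helper name mask_by_p is hypothetical and not a GRASS module; in GRASS the same step would typically be done with a raster calculator such as r.mapcalc. The second return value is the share of cells kept, i.e. the "area with significant slope" whose size is in question.

```python
import math


def mask_by_p(slopes, pvalues, alpha=0.05):
    """Keep slope values only where p <= alpha; other cells become NaN.

    Returns the masked slopes and the fraction of cells that were kept.
    """
    if len(slopes) != len(pvalues):
        raise ValueError("slope and p-value rasters must have the same cells")
    masked = [s if p <= alpha else math.nan for s, p in zip(slopes, pvalues)]
    kept = sum(1 for p in pvalues if p <= alpha)
    return masked, kept / len(pvalues)
```

Under the multiple comparison problem discussed in this thread, roughly alpha (here 5%) of cells without any real trend are expected to pass this mask by chance, so the kept share includes false positives.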
>
> On Wed, Oct 18, 2017 at 6:06 PM Markus Metz <markus.metz.giswork at gmail.com> wrote:
>>
>>
>>
>> On Wed, Oct 18, 2017 at 7:19 PM, Daniel Victoria <daniel.victoria at gmail.com> wrote:
>> >
>> > I just read a comment by Markus Metz on the p-value regression ticket [1]. If I understood correctly, he mentions that the chance of getting small p-values at random is high and that we should apply a correction, but this would result in non-significant p-values. He concludes that it would be more "appropriate to make prior assumptions about slope, intercept, and effect size, then judge the results according to these prior assumptions".
>> >
>> > Does this mean that I should not rely on the p-value obtained?
>>
>> Yes and no. The p-value needs to be interpreted correctly. Commonly used thresholds are alpha = 0.05 and alpha = 0.01, meaning that if p <= alpha, the result is considered statistically significant. Problems occur if you repeat the test with the same dataset several times:
>> https://en.wikipedia.org/wiki/Multiple_comparisons_problem
>>
>> In these cases, alpha needs to be corrected in order to decide whether a p-value is significant or not. Regarding r.series, millions of repeated tests might be performed (one for each cell in the current computational region). Any standard correction method would thus render pretty much all p-values non-significant. Instead, Bayesian statistics might be a solution.
>>
>> Markus M
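To make "correcting alpha" concrete, here is a Python sketch of two standard procedures: Bonferroni, which compares each p-value against alpha / m for m tests, and the Benjamini-Hochberg procedure, which controls the false discovery rate instead and is less strict. With m = 1,000,000 cells and alpha = 0.05, the Bonferroni threshold is 0.05 / 1,000,000 = 5e-8 per cell, which shows why almost nothing survives. The function names are illustrative, and the input is assumed to be a per-cell p-value raster flattened to a list.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort p-values ascending, find the largest rank k with p_(k) <= k / m * alpha,
    and reject the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

Benjamini-Hochberg is valid under independence or certain forms of positive dependence between tests, which neighbouring raster cells may or may not satisfy, so it does not remove the need to interpret the results carefully.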
>>
>> >
>> > Where can I find more information about this? Some colleagues and I are in the process of finishing a paper that applies a regression to annual NDVI data, and right now we are discussing whether we should consider the p-values obtained.
>> >
>> > Thanks, and sorry if this is a bit off topic
>> >
>> > Cheers
>> > Daniel
>> >
>> > [1] https://trac.osgeo.org/grass/ticket/2376#comment:3
>> >
>> >
>> > On Mon, Oct 16, 2017 at 2:12 PM Daniel Victoria <daniel.victoria at gmail.com> wrote:
>> >>
>> >> Replying to myself in case it helps anyone.
>> >>
>> >> Solved it by using R and the raster package. Here is a Stack Overflow post about it:
>> >>
>> >> https://stackoverflow.com/questions/20262999/how-to-output-regression-summarye-g-p-value-and-coeff-into-a-rasterbrick
>> >>
>> >> Cheers
>> >> Daniel
>> >>
>> >> On Wed, Oct 11, 2017 at 10:44 AM Daniel Victoria <daniel.victoria at gmail.com> wrote:
>> >>>
>> >>> OK, dumb question since I'm a bit (or very) bad at stats.
>> >>>
>> >>> I'm calculating the slope from a series of rasters using r.series. I see that I can also get the t-value and the coefficient of determination. Is there a way to get the p-value for the regression?
>> >>>
>> >>> I've seen that this question has been asked before (in 2012) [1], and it ended with the addition of the t-value calculation in r.series. But I failed to see how the p-value can be obtained.
>> >>>
>> >>> I also found this ticket [2], related to the p-value question.
>> >>>
>> >>> Thanks
>> >>> Daniel
>> >>>
>> >>> [1] http://osgeo-org.1560.x6.nabble.com/Calculate-p-value-for-regression-slope-in-r-series-td5014228.html
>> >>>
>> >>> [2] https://trac.osgeo.org/grass/ticket/2376
>> >>>
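For the original question, a two-sided p-value can be derived from the t-value map that r.series already produces, provided that value is the OLS slope t statistic with df = n - 2 degrees of freedom (n = number of valid input values in the cell); that is an assumption worth checking against the r.series manual. The relationship is p = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the regularized incomplete beta function. This sketch evaluates it in pure Python (continued-fraction method) so it runs without SciPy; with SciPy, 2 * scipy.stats.t.sf(abs(t), df) gives the same number. The function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def p_value_from_t(t, n):
    """Two-sided p-value of a regression slope t statistic with n - 2 df."""
    if n <= 2:
        raise ValueError("need more than 2 values per cell")
    df = n - 2
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))
```

Applied per cell, with n taken from a per-cell count of valid values (r.series method=count) if the inputs contain nulls, this yields the p-value raster discussed in the rest of the thread, subject to the multiple comparison caveats raised there.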
>> >
>>

