[GRASS-dev] r.regression.line crash in awk (awk debugging)

Markus Neteler neteler at osgeo.org
Fri Aug 7 03:33:16 EDT 2009


On Fri, Aug 7, 2009 at 9:13 AM, Markus Neteler<neteler at osgeo.org> wrote:
> On Fri, Aug 7, 2009 at 3:05 AM, Hamish<hamish_b at yahoo.com> wrote:
>> Markus Neteler wrote:
>>> I wanted to run a linear regression on a DEM and a temperature
>>> map but got an error:
>>> awk: cmd. line:2: (FILENAME=- FNR=3) fatal: division by zero attempted
>>>
>>> I have reduced everything to this part (added --lint):
>>>
>>> #!/bin/sh
>>>
>>> unset LC_ALL
>>> LC_NUMERIC=C
>>> export LC_NUMERIC
>>>
>>> echo "345.551054 1 1
>>> 364.229489 1 1
>>> 382.907925 1 3" | awk --lint '{tot += $3;sumX +=$1 * $3;
>>> sumsqX
>>> +=$1*$1*$3;sumY +=$2 * $3; sumsqY +=$2*$2*$3; sumXY
>>> +=$1*$2*$3;}
>>> END {B=(sumXY - sumX*sumY/tot)/(sumsqX - sumX*sumX/tot);
>>> R= (sumXY - sumX*sumY/tot)/((sumsqX - sumX^2/tot)*(sumsqY -
>>> sumY^2/tot))^0.5;
>>> mediaX=sumX/tot;sumsqX=sumsqX/tot;varX=sumsqX-(mediaX^2);sdX=varX^0.5;
>>> mediaY=sumY/tot;sumsqY=sumsqY/tot;varY=sumsqY-(mediaY^2);sdY=varY^0.5;
>>> A=mediaY - B*mediaX; F= R^2/(1-R^2/tot-2);
>>> print A, B, R, tot, F, mediaX, sdX, mediaY, sdY}'
>>>
>>> Running this, I get:
>>> awk: (FILENAME=- FNR=1) warning: reference to uninitialized
>>> variable `tot'
>> ....
>>
>>
>> it starts with "tot += $3" [aka "tot = tot + $3"], but tot is unused at
>> that point so undefined. (maybe awk can be trusted to always init new
>> vars to 0? no idea)
>>
>> adding a 'BEGIN { tot = 0; sumX = 0; ... }' section at the start of the
>> awk script might help.
>
> Tried this:
>
> echo "345.551054 1 1
> 364.229489 1 1
> 382.907925 1 3" | awk --lint -F " " 'BEGIN { tot = 0; sumX = 0; sumsqX
> = 0; sumY = 0; sumsqY = 0; sumXY = 0;}
>    {tot += $3; sumX +=$1*$3; sumsqX +=$1*$1*$3; sumY +=$2*$3; sumsqY
> +=$2*$2*$3; sumXY +=$1*$2*$3;}
> END { B     = (sumXY - sumX*sumY/tot)/(sumsqX - sumX*sumX/tot);
>      R     = (sumXY - sumX*sumY/tot)/((sumsqX - sumX^2/tot)*(sumsqY -
> sumY^2/tot))^0.5;
>      mediaX= sumX/tot; sumsqX=sumsqX/tot; varX=sumsqX-(mediaX^2); sdX=varX^0.5;
>      mediaY= sumY/tot; sumsqY=sumsqY/tot; varY=sumsqY-(mediaY^2); sdY=varY^0.5;
>      A     = mediaY - B*mediaX; F= R^2/(1-R^2/tot-2);
>      print A, B, R, tot, F, mediaX, sdX, mediaY, sdY}'
>
> but
> awk: cmd. line:3: (FILENAME=- FNR=3) fatal: division by zero attempted

But this works.

echo "345.551054 1 1
364.229489 1 1
382.907925 1 3" | awk --lint -F " " 'BEGIN { tot = 0; sumX = 0; sumsqX
= 0; sumY = 0; sumsqY = 0; sumXY = 0;}
    {tot += $3; sumX +=$1*$3; sumsqX +=$1*$1*$3; sumY +=$2*$3; sumsqY
+=$2*$2*$3; sumXY +=$1*$2*$3;}
END {print tot, sumX, sumsqX, sumY, sumsqY, sumXY}'

5 1858.5 691924 5 5 1858.5

If I throw it in R:

> tot=5
> sumX= 1858.5
> sumsqX=691924
> sumY= 5
> sumsqY= 5
> sumXY= 1858.5
>
> B     = (sumXY - sumX*sumY/tot)/(sumsqX - sumX*sumX/tot);
> R     = (sumXY - sumX*sumY/tot)/((sumsqX - sumX^2/tot)*(sumsqY - sumY^2/tot))^0.5;
> mediaX= sumX/tot; sumsqX=sumsqX/tot; varX=sumsqX-(mediaX^2); sdX=((varX^2)^0.5)^0.5;
> mediaY= sumY/tot; sumsqY=sumsqY/tot; varY=sumsqY-(mediaY^2); sdY=((varY^2)^0.5)^0.5;
> A     = mediaY - B*mediaX; F= R^2/(1-R^2/tot-2);
>
> B; R; tot; F; mediaX; sdX ;mediaY; sdY
[1] 0
[1] NaN
[1] 5
[1] NaN
[1] 371.7
[1] 14.96362
[1] 1
[1] 0

You see: R = NaN.

But then I rerun the R step:
> R     = (sumXY - sumX*sumY/tot)/((sumsqX - sumX^2/tot)*(sumsqY - sumY^2/tot))^0.5;
> R
[1] 0

MAGIC?! I guess I am overlooking something here...

Markus


More information about the grass-dev mailing list