<div dir="ltr"><br><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 30, 2015 at 3:39 PM, Glynn Clements <span dir="ltr"><<a href="mailto:glynn@gclements.plus.com" target="_blank">glynn@gclements.plus.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class=""><br>

Paulo van Breugel wrote:<br>

<br>

> I would like to compute a raster layer with for each raster cell the<br>

> mahalanobis distance to the centre of the environmental space formed by all<br>

> reference data points (raster cells). In R this can be done as explained<br>

> here [1].<br>

><br>

> . I would like to do this using python only (no dependency on R). I came up<br>

> with the following, which works, but is very slow. I guess this is because<br>

> it loops over every raster cell to compute the mahalanobis distance? Any<br>

> idea how this can be done faster (I am very new to python, so bear with me<br>

> if I am making stupid mistakes)<br>

<br>

</span>The main speed-up will come from "inlining" distance.mahalanobis(),<br>

which is essentially:<br>

<br>

    delta = u - v<br>

    m = np.dot(np.dot(delta, VI), delta)<br>

    return np.sqrt(m)<br>

<br>

np.dot(v, m) is equivalent to np.sum(v * m,axis=0), and np.dot(u,v) is<br>

equivalent to np.sum(u * v), so the second line above is equivalent to<br>

<br>

    m = np.sum(np.sum(delta[:,None] * VI * delta[None,:], axis=-1), axis=-1)<br></blockquote><div><br><br></div><div>Smart! I can follow the logic, but I am not sure how to solve the problem below: <br><br>   Traceback (most recent call last):<br>    File "<stdin>", line 1, in <module><br>    ValueError: operands could not be broadcast together with shapes (3,1,77,78) (3,3)<br><br></div><div>Which refers to the different dimensions of delta and VI? <br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Thus, delta can be extended from a 1-D array to 3-D, changing the<br>

result from a scalar to a 2-D array. The calculation of stat_mah then<br>

becomes:<br>

<br>

    VI = np.linalg.inv(covar)<br>

    delta = dat_ref - stat_mean[:,None,None]<br>

    m = np.sum(np.sum(delta[:,:,:,None] * delta[:,:,None,:] * VI[None,None,:,:],axis=-1),axis=-1)<br>

    stat_mah = np.sqrt(m)<br>

<br>

This moves the loops from Python into C, which usually results in a<br>

significant speed increase. <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<span class=""><font color="#888888"><br>

--<br>

Glynn Clements <<a href="mailto:glynn@gclements.plus.com">glynn@gclements.plus.com</a>><br>

</font></span></blockquote></div><br></div></div></div>