[GRASS-dev] Compute mahalanobis distance using Scipy

Glynn Clements glynn at gclements.plus.com
Fri Jan 30 06:39:56 PST 2015


Paulo van Breugel wrote:

> I would like to compute a raster layer with for each raster cell the
> mahalanobis distance to the centre of the environmental space formed by all
> reference data points (raster cells). In R this can be done as explained
> here [1].
> 
> . I would like to do this using python only (no dependency on R). I came up
> with the following, which works, but is very slow. I guess this is because
> it loops over every raster cell to compute the mahalanobis distance? Any
> idea how this can be done faster (I am very new to python, so bear with me
> if I am making stupid mistakes)

The main speed-up will come from "inlining" distance.mahalanobis(),
which is essentially:

    delta = u - v
    m = np.dot(np.dot(delta, VI), delta)
    return np.sqrt(m)

np.dot(v, m) is equivalent to np.sum(v * m,axis=0), and np.dot(u,v) is
equivalent to np.sum(u * v), so the second line above is equivalent to

    m = np.sum(np.sum(delta[:,None] * VI * delta[None,:], axis=-1), axis=-1)

Thus, delta can be extended from a 1-D array to 3-D, changing the
result from a scalar to a 2-D array. The calculation of stat_mah then
becomes:

    VI = np.linalg.inv(covar)
    delta = dat_ref - stat_mean[:,None,None]
    m = np.sum(np.sum(delta[:,:,:,None] * delta[:,:,None,:] * VI[None,None,:,:],axis=-1),axis=-1)
    stat_mah = np.sqrt(m)

This moves the loops from Python into C, which usually results in a
significant speed increase.

-- 
Glynn Clements <glynn at gclements.plus.com>


More information about the grass-dev mailing list