[GRASS-dev] Compute mahalanobis distance using Scipy

Glynn Clements glynn at gclements.plus.com
Tue Feb 17 09:58:07 PST 2015


Paulo van Breugel wrote:

> > It may be possible to avoid this issue using numpy.tensordot, e.g.
> >
> > mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)
> >
> > I don't know for sure whether this actually uses less memory. It could
> > do, as it rearranges the matrices so that the sum-of-products is
> > calculated using np.dot, which is a built-in function.
> >
> 
> I did some quick testing; it seems to use slightly less memory. The dask
> solution seems promising, but it is not widely available and, according
> to the website, still experimental.
> 
> For a quick solution, what about using r.tile to split the input data into
> tiles and computing the Mahalanobis distance per tile? Or would that come
> with too much overhead or other clear disadvantages?
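The quoted tensordot one-liner can be checked against SciPy's per-cell definition. This is a minimal sketch, assuming `delta` holds the deviations from the per-map mean with shape (k, rows, cols), one layer per input map, and `VI` is the (k, k) inverse covariance matrix; the function name `mahal_sq_tensordot` and the random test data are hypothetical. Note that the one-liner yields the *squared* distance, whereas `scipy.spatial.distance.mahalanobis` returns its square root for a single pair of 1-D vectors.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def mahal_sq_tensordot(VI, delta):
    """Squared Mahalanobis distance for every cell of delta (assumed shape (k, rows, cols))."""
    # Contract axis 1 of VI with axis 0 of delta: tmp[i, r, c] = sum_j VI[i, j] * delta[j, r, c]
    tmp = np.tensordot(VI, delta, (1, 0))
    # Multiply by delta and sum over the k layers: delta[:, r, c] . (VI @ delta[:, r, c])
    return np.sum(tmp * delta, axis=0)

# Hypothetical test data: 3 input maps of 4 x 5 cells
rng = np.random.default_rng(0)
data = rng.normal(size=(3, 4, 5))
flat = data.reshape(3, -1)            # (k, n_cells), one column per cell
mean = flat.mean(axis=1)              # per-map mean, shape (3,)
VI = np.linalg.inv(np.cov(flat))      # inverse covariance, shape (3, 3)
delta = data - mean[:, None, None]    # deviations, shape (3, 4, 5)

d2 = mahal_sq_tensordot(VI, delta)    # squared distances, shape (4, 5)

# SciPy computes the (non-squared) distance one cell at a time
ref = np.array([[mahalanobis(data[:, r, c], mean, VI) for c in range(5)]
                for r in range(4)])
print(np.allclose(np.sqrt(d2), ref))  # True
```

The per-cell SciPy loop is the slow, cell-at-a-time approach; the tensordot version does the same arithmetic for all cells in a few vectorized calls.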

I'd suggest just splitting the processing into chunks in the Python
script.

The original version operated on one cell at a time, while processing the
entire map at once uses too much memory. As a middle ground, you could
operate on one row (or N rows) at a time.

IOW, rather than

	mahdist = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)

use e.g.:

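	mahdist = np.empty(delta.shape[1:])  # preallocate: one result per cell
	for row in range(0, rows, step):
	    end = min(rows, row + step)
	    dchunk = delta[:, row:end]  # slice along axis 1 (a view, not a copy)
	    mahdist[row:end] = np.sum(np.tensordot(VI, dchunk, (1, 0)) * dchunk, axis=0)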
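Wrapped into a self-contained function, the row-chunked loop might look like the sketch below. It assumes `delta` holds the per-map deviations from the mean with shape (k, rows, cols) and `VI` is the (k, k) inverse covariance matrix; the function name `mahal_sq_chunked`, the default `step`, and the random test data are hypothetical. Like the one-liner, it returns squared distances.

```python
import numpy as np

def mahal_sq_chunked(VI, delta, step=64):
    """Squared Mahalanobis distance per cell, processing `step` map rows at a time.

    delta is assumed to have shape (k, rows, cols): one layer of deviations per input map.
    """
    rows = delta.shape[1]
    mahdist = np.empty(delta.shape[1:])
    for row in range(0, rows, step):
        end = min(rows, row + step)
        dchunk = delta[:, row:end]  # basic slice: a view, not a copy
        # Temporaries here are (k, end - row, cols) rather than (k, rows, cols)
        mahdist[row:end] = np.sum(np.tensordot(VI, dchunk, (1, 0)) * dchunk, axis=0)
    return mahdist

# Hypothetical data: 4 input maps of 300 x 200 cells
rng = np.random.default_rng(1)
data = rng.normal(size=(4, 300, 200))
flat = data.reshape(4, -1)
delta = data - flat.mean(axis=1)[:, None, None]
VI = np.linalg.inv(np.cov(flat))

full = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)  # whole map at once
chunked = mahal_sq_chunked(VI, delta, step=64)                   # 64 rows at a time
print(np.allclose(full, chunked))  # True
```

The choice of `step` is a trade-off: larger values mean fewer Python-level iterations, while the temporary arrays grow with k × step × cols elements instead of k × rows × cols.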

-- 
Glynn Clements <glynn at gclements.plus.com>

