[GRASS-dev] Compute mahalanobis distance using Scipy
Glynn Clements
glynn at gclements.plus.com
Tue Feb 17 09:58:07 PST 2015
Paulo van Breugel wrote:
> > It may be possible to avoid this issue using numpy.tensordot, e.g.
> >
> > mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)
> >
> > I don't know for sure whether this actually uses less memory. It could
> > do, as it rearranges the matrices so that the sum-of-products is
> > calculated using np.dot, which is a built-in function.
> >
>
> I did some quick testing, it seems to use slightly less memory. The dask
> solution seems promising, with the disadvantage that it is not widely
> available / still experimental according to the website.
>
> For a quick solution, what about using r.tile to split the input data in
> tiles and compute the mahalanobis distance per tile. Or would that come
> with too much overhead or other clear disadvantages?
I'd suggest just splitting the processing into chunks in the Python
script.
The original version operated one cell at a time. Processing the
entire map at once uses too much memory, but you could operate on a
row (or N rows) at a time.
IOW, rather than
mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)
use e.g.:
for row in xrange(0,rows,step):
    end = min(rows,row+step)
    dchunk = delta[:,row:end]
    mahdist[row:end] = np.sum(np.tensordot(VI, dchunk, (1,0)) * dchunk,axis=0)
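
A self-contained sketch of the same loop, written for Python 3 (range
instead of xrange). The function name mahalanobis_chunked, the
preallocated output array and the (bands, rows, cols) layout assumed
for delta are illustrative assumptions, not taken from the actual
script. Note that the expression is the quadratic form delta' * VI *
delta, i.e. the squared distance; take np.sqrt of the result if the
distance itself is needed.

```python
import numpy as np

def mahalanobis_chunked(VI, delta, step=100):
    """Evaluate delta' * VI * delta per cell, `step` rows at a time.

    VI    : (n, n) inverse covariance matrix
    delta : (n, rows, cols) deviations from the mean (assumed layout)
    """
    rows = delta.shape[1]
    # Preallocate the output; only one chunk-sized temporary exists at a time.
    mahdist = np.empty(delta.shape[1:], dtype=np.result_type(VI, delta))
    for row in range(0, rows, step):
        end = min(rows, row + step)
        dchunk = delta[:, row:end]
        # Contract axis 1 of VI with axis 0 of the chunk, then sum over bands.
        mahdist[row:end] = np.sum(
            np.tensordot(VI, dchunk, (1, 0)) * dchunk, axis=0)
    return mahdist
```

A larger step means fewer iterations but a larger temporary array, so
step is the knob for trading speed against memory.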
--
Glynn Clements <glynn at gclements.plus.com>