[GRASS-dev] Compute mahalanobis distance using Scipy

Glynn Clements glynn at gclements.plus.com
Fri Feb 13 19:38:13 PST 2015


Paulo van Breugel wrote:

> >> With memmap you still have a limit of 2 GB, I guess; you should try dask
> >>
> >
> Just reading the memmap manual page, where it reads: "Memory-mapped arrays
> use the Python memory-map object which (prior to Python 2.5) does not allow
> files to be larger than a certain size depending on the platform. This size
> is always < 2GB even on 64-bit systems.". Which is unclear to me; I am not
> sure if that means that this limit is different or does not apply when on
> Python 2.5 or newer (what is the minimum python version for GRASS?)

Even if Python itself no longer imposes a 2 GiB limit, you would
probably need to be using a 64-bit platform, the system would need
enough memory for the operation, and it would need to allow the
process to use that much memory.

The problem here is that

mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,axis=0)

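broadcasts both delta and VI to 4-D arrays and multiplies them
element-wise, sums the products over axis 1, multiplies the result by
delta, and finally sums over axis 0. The intermediate 4-D array of
products holds delta.shape[0] times as many elements as delta itself,
so it could potentially be very large.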
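
As a rough illustration of that growth, here is a minimal sketch with
made-up sizes and random data (n, rows, cols and the generated arrays are
assumptions for the example; only the indexing expression comes from the
line above). It builds the 4-D product and compares its size to delta:

```python
import numpy as np

# Hypothetical sizes: n variables, each a rows x cols raster.
n, rows, cols = 3, 4, 5
rng = np.random.default_rng(0)
delta = rng.normal(size=(n, rows, cols))   # stand-in for the real delta
VI = rng.normal(size=(n, n))               # stand-in for the inverse covariance

# The element-wise product from the original expression, before any summing.
products = delta[None, :, :, :] * VI[:, :, None, None]

intermediate_shape = products.shape           # (n, n, rows, cols)
size_ratio = products.size // delta.size      # n times as many elements as delta
print(intermediate_shape, size_ratio)
```

With real rasters the rows x cols part dominates, so the temporary array
needs roughly n times the memory already used by delta.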

It may be possible to avoid this issue using numpy.tensordot, e.g.

mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)

I don't know for sure whether this actually uses less memory. It could
do, as it rearranges the arrays so that the sum-of-products is
calculated using np.dot, a compiled function which does not need to
build the 4-D array of products.
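
A quick way to check that the two expressions agree is to run both on a
small test case. This is only a sketch: the array sizes, the random data
and the way VI is derived here are assumptions for the test, not the
actual GRASS data, and it says nothing about memory use on large rasters.

```python
import numpy as np

# Hypothetical test data: n variables, each a rows x cols raster.
n, rows, cols = 3, 4, 5
rng = np.random.default_rng(42)
delta = rng.normal(size=(n, rows, cols))

# Assumed for the test: VI is the inverse of the covariance of the variables.
VI = np.linalg.inv(np.cov(delta.reshape(n, -1)))

# Original expression: builds an (n, n, rows, cols) temporary.
mahdist_broadcast = np.sum(
    np.sum(delta[None, :, :, :] * VI[:, :, None, None], axis=1) * delta,
    axis=0,
)

# tensordot version: contracts axis 1 of VI with axis 0 of delta,
# giving an (n, rows, cols) array without the 4-D temporary.
mahdist_tensordot = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)

same = np.allclose(mahdist_broadcast, mahdist_tensordot)
print(mahdist_tensordot.shape, same)
```

Both results have shape (rows, cols), one value per cell.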

-- 
Glynn Clements <glynn at gclements.plus.com>

