[GRASS-dev] Compute mahalanobis distance using Scipy
Glynn Clements
glynn at gclements.plus.com
Fri Feb 13 19:38:13 PST 2015
Paulo van Breugel wrote:
> >> With memmap you still have a limit of 2 GB I guess, you should try: dask
> >>
> >
> Just reading the memmap manual page, where it reads: "Memory-mapped arrays
> use the Python memory-map object which (prior to Python 2.5) does not allow
> files to be larger than a certain size depending on the platform. This size
> is always < 2GB even on 64-bit systems.". Which is unclear to me; I am not
> sure if that means that this limit is different or does not apply when on
> Python 2.5 or newer (what is the minimum python version for GRASS?)
Even if Python itself no longer imposes a 2 GiB limit, you would
probably need to be using a 64-bit platform, the system would need
enough memory for the operation, and it would need to allow the
process to use that much memory.
The problem here is that

    mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,axis=0)

expands both delta and VI to 4-D arrays, calculates their element-wise
product, then sums over two axes. If delta holds n layers and VI is
n x n, the intermediate array of products is n times the size of delta,
so it could potentially be very large.
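
To get a feel for the numbers, here is a rough sketch of the size of that
intermediate array. It assumes delta has shape (bands, rows, cols) and VI
has shape (bands, bands); the helper name intermediate_nbytes and the
raster dimensions are made up for illustration.

```python
import numpy as np

def intermediate_nbytes(n_bands, n_rows, n_cols, dtype=np.float64):
    """Bytes needed to hold delta[None,:,:,:] * VI[:,:,None,None].

    Broadcasting (1, b, r, c) against (b, b, 1, 1) yields (b, b, r, c).
    """
    return n_bands * n_bands * n_rows * n_cols * np.dtype(dtype).itemsize

# Small check that the broadcast product really has the claimed shape.
delta = np.zeros((3, 4, 5))
VI = np.zeros((3, 3))
prod = delta[None, :, :, :] * VI[:, :, None, None]
assert prod.shape == (3, 3, 4, 5)
assert prod.nbytes == intermediate_nbytes(3, 4, 5)

# Hypothetical raster stack: 10 bands of 5000 x 5000 cells, float64.
gib = intermediate_nbytes(10, 5000, 5000) / 2**30
print("intermediate array: %.1f GiB" % gib)
```

For those hypothetical dimensions delta itself is just under 2 GiB, while
the intermediate product is ten times that.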
It may be possible to avoid this issue using numpy.tensordot, e.g.

    mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)

I don't know for sure whether this actually uses less memory. It could
do, as tensordot rearranges the arrays so that the sum-of-products is
calculated using np.dot, which is implemented in compiled code and returns
only the summed result.
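
Whatever the memory behaviour turns out to be, the two expressions should
at least give the same answer. A minimal check on small random data,
assuming delta is (bands, rows, cols) and VI is a symmetric (bands, bands)
matrix; the sizes and the way VI is built here are placeholders, not the
real inverse covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_rows, n_cols = 4, 6, 7            # made-up sizes
delta = rng.normal(size=(n_bands, n_rows, n_cols))

# Stand-in for an inverse covariance matrix: symmetric positive definite.
A = rng.normal(size=(n_bands, n_bands))
VI = A @ A.T + n_bands * np.eye(n_bands)

# Original broadcasting version: builds a (bands, bands, rows, cols) array.
broadcast = np.sum(
    np.sum(delta[None, :, :, :] * VI[:, :, None, None], axis=1) * delta,
    axis=0)

# tensordot version: contracts axis 1 of VI with axis 0 of delta,
# giving a (bands, rows, cols) array before the final multiply and sum.
tdot = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)

assert broadcast.shape == tdot.shape == (n_rows, n_cols)
assert np.allclose(broadcast, tdot)
```

This only confirms that the results agree; peak memory use on a full-size
raster would still need to be measured.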
--
Glynn Clements <glynn at gclements.plus.com>