<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Feb 14, 2015 at 4:38 AM, Glynn Clements <span dir="ltr"><<a href="mailto:glynn@gclements.plus.com" target="_blank">glynn@gclements.plus.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><br>

Paulo van Breugel wrote:<br>

<br>

> >> With memmap you still have a limit of 2 GB, I guess; you should try dask<br>

> >><br>

> ><br>

> Just reading the memmap manual page, where it reads: "Memory-mapped arrays<br>

> use the Python memory-map object which (prior to Python 2.5) does not allow<br>

> files to be larger than a certain size depending on the platform. This size<br>

> is always < 2GB even on 64-bit systems.". Which is unclear to me; I am not<br>

> sure if that means that this limit is different or does not apply when on<br>

> Python 2.5 or newer (what is the minimum python version for GRASS?)<br>

<br>

</span>Even if Python itself no longer imposes a 2 GiB limit, you would<br>

probably need to be using a 64-bit platform, the system would need<br>

enough memory for the operation, and it would need to allow the<br>

process to use that much memory.<br>

<br>

The problem here is that<br>

<span><br>

mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,axis=0)<br>

<br>

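</span>expands both delta and VI to 4-D arrays, calculates their element-wise

product, sums it over one axis, multiplies the result by delta, and then

sums over another axis. The intermediate array of products holds one value

per element of VI for every cell in the last two axes of delta, so it

could be very large.<br>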

<br>

It may be possible to avoid this issue using numpy.tensordot, e.g.<br>

<br>

mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)<br>

<br>
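As a rough check that the two expressions agree, and to see the shapes of the arrays each one names explicitly, something like the following can be run on small test data. The sizes, the random inputs, and the names broadcast and contracted are made up for illustration; only the two mahdist expressions come from the messages above.<br>
<pre>
```python
import numpy as np

# Hypothetical small test data: n variables on a rows x cols grid.
rng = np.random.default_rng(0)
n, rows, cols = 4, 5, 6
delta = rng.normal(size=(n, rows, cols))              # stand-in for the deviations
VI = np.linalg.inv(np.cov(rng.normal(size=(n, 50))))  # stand-in inverse covariance, shape (n, n)

# Original expression: the broadcast product has shape (n, n, rows, cols).
broadcast = delta[None, :, :, :] * VI[:, :, None, None]
mahdist_a = np.sum(np.sum(broadcast, axis=1) * delta, axis=0)

# tensordot variant: contracts axis 1 of VI with axis 0 of delta,
# so the array it returns has shape (n, rows, cols).
contracted = np.tensordot(VI, delta, (1, 0))
mahdist_b = np.sum(contracted * delta, axis=0)

print(broadcast.shape, contracted.shape)
print(np.allclose(mahdist_a, mahdist_b))
```
</pre>
Array shapes alone do not settle peak memory use, because tensordot may make temporary copies while rearranging its inputs.<br>
<br>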

I don't know for sure whether this actually uses less memory. It could<br>

do, as it rearranges the matrices so that the sum-of-products is<br>

calculated using np.dot, which is a built-in function.<br></blockquote><div><br></div><div>I did some quick testing, and it seems to use slightly less memory. The dask solution looks promising, but it has the disadvantage of not being widely available; according to its website it is still experimental. <br><br>For a quick solution, what about using r.tile to split the input data into tiles and computing the Mahalanobis distance per tile? Or would that come with too much overhead or other clear disadvantages?<br></div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<span><font color="#888888"><br>

--<br>

Glynn Clements <<a href="mailto:glynn@gclements.plus.com" target="_blank">glynn@gclements.plus.com</a>><br>

</font></span></blockquote></div><br></div></div>
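<div dir="ltr">The per-tile idea raised in the reply above can be tried in plain NumPy before involving r.tile. The sketch below is only an illustration: the function name mahalanobis_by_blocks and the block_rows parameter are hypothetical, and it assumes delta has shape (n, rows, cols) and that VI was computed once from the whole data set. Under that assumption each output cell depends only on the delta values at that cell, so blocks need no overlap.<br>
<pre>
```python
import numpy as np

def mahalanobis_by_blocks(delta, VI, block_rows=256):
    """Evaluate the thread's mahdist expression one block of rows at a time."""
    n, rows, cols = delta.shape
    out = np.empty((rows, cols), dtype=np.result_type(delta, VI))
    for start in range(0, rows, block_rows):
        stop = start + block_rows
        block = delta[:, start:stop, :]
        # Same contraction as the tensordot expression, restricted to this block,
        # so the temporaries scale with the block size rather than the full grid.
        out[start:stop, :] = np.sum(np.tensordot(VI, block, (1, 0)) * block, axis=0)
    return out

# Usage on made-up data: compare against the unblocked expression.
rng = np.random.default_rng(1)
delta = rng.normal(size=(3, 10, 7))
VI = np.eye(3)
full = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)
result = mahalanobis_by_blocks(delta, VI, block_rows=4)
print(np.allclose(result, full))
```
</pre>
Whether this beats tiling with r.tile in practice would need measuring; the sketch says nothing about the overhead of reading and writing the tiles themselves.</div>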