<div dir="ltr"><div><div><div><div><div>Hi Glyn,<br><br></div>Your suggestions to compute the mahalanobis distance work perfect. However, I have a follow up question. <span style="color:rgb(0,0,0)">For large data sets, I am getting a MemoryError when running <span class="im"> "</span></span><span style="color:rgb(0,0,0)"><span class="im">mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,axis=0)"<br><br></span></span></div><span class="im"><span style="color:rgb(0,0,0)"></span></span><span class="im"><span style="color:rgb(0,0,0)">

</span></span><p style="margin:0px;text-indent:0px">    <span style="font-family:monospace,monospace"># pro is list with raster names<br></span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">    s = len(pro)</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">    dat_pro = None</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">    layer = garray.array()</span></p><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">       r, c = layer.shape</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">    for i, map in enumerate(pro):</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">            layer.read(map, null=np.nan)     <br></span>
</p><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">            if dat_pro is None:</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">         dat_pro = np.empty((s, r, c), dtype=np.double)</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">                dat_pro[i,:,:] = layer</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">    del layer</span></p><span style="font-family:monospace,monospace">delta = dat_pro - stat_mean[:,None,None]
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta, axis=0)</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">stat_mahal = garray.array()</span></p><span style="font-family:monospace,monospace">
</span><p style="margin:0px;text-indent:0px"><span style="font-family:monospace,monospace">stat_mahal[...] = mahdist</span></p><span class="im"><br></span></div><span class="im">I guess this is because the calculations are done in-memory? Any way to avoid this memory problem when using large data sets (something like working with memmap objects?)<br><br></span></div><span class="im">Cheers,<br><br></span></div><span class="im">Paulo<br></span><div><div><div><div><span class="im"><br></span></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 2, 2015 at 7:19 PM, Glynn Clements <span dir="ltr"><<a href="mailto:glynn@gclements.plus.com" target="_blank">glynn@gclements.plus.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
Paulo van Breugel wrote:<br>
<br>
> > The second version:<br>
> ><br>
> > > >     VI = np.linalg.inv(covar)<br>
> > > >     delta = dat_ref - stat_mean[:,None,None]<br>
> > > >     m = np.sum(np.sum(delta[:,:,:,None] * delta[:,:,None,:] * VI[None,None,:,:],axis=-1),axis=-1)<br>
> > > >     stat_mah = np.sqrt(m)<br>
> ><br>
> > should work with delta being a 3-D array.<br>
> ><br>
><br>
> Yes, but when running the lines above, I am getting the same error message:<br>
><br>
> Traceback (most recent call last):<br>
>   File "<stdin>", line 1, in <module><br>
> ValueError: operands could not be broadcast together with shapes<br>
> (3,77,78,78) (1,1,3,3)<br>
<br>
</span>Well, it's not quite the same; both operands now have the same number<br>
of dimensions, but they're not compatible (essentially, for each<br>
dimension either both values should be the same or one of them should<br>
be one).<br>
<span class=""><br>
> >From what I can see from broadcasting rules, there is mismatch in the last<br>
> and fore-last dimensions of VI[None,None,:,:] compared to the other two?<br>
><br>
> delta[:,:,:,None].shape<br>
> (3, 77, 78, 1)<br>
><br>
> delta[:,:,None,:].shape<br>
> (3, 77, 1, 78)<br>
><br>
> VI[None,None,:,:].shape<br>
> (1, 1, 3, 3)<br>
<br>
</span>The problem was that delta is 3x77x78 whereas the code assumed that it<br>
would be 77x78x3.<br>
<span class=""><br>
> I am really have to get a better grasp of working with these arrays, but in<br>
> any case, after a bit trial and error, I came up with the following to<br>
> compute m.<br>
><br>
>  m = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,<br>
> axis=0)<br>
><br>
><br>
> This does, in my limited testing, gives the same result as using the loop<br>
> with<br>
><br>
> pow(distance.mahalanobis(cell_ref, stat_mean, VI),2).<br>
><br>
><br>
> to be sure, does the above makes sense to you?<br>
<br>
</span>I believe that's correct.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Glynn Clements <<a href="mailto:glynn@gclements.plus.com">glynn@gclements.plus.com</a>><br>
</font></span></blockquote></div><br></div>