[GRASS-dev] Compute mahalanobis distance using Scipy
Paulo van Breugel
p.vanbreugel at gmail.com
Fri Feb 13 00:57:55 PST 2015
Hi Glyn,
Your suggestions to compute the mahalanobis distance work perfect. However,
I have a follow up question. For large data sets, I am getting a
MemoryError when running "mahdist = np.sum(np.sum(delta[None,:,:,:] *
VI[:,:,None,None],axis=1) * delta,axis=0)"
# pro is list with raster names
s = len(pro)
dat_pro = None
layer = garray.array()
r, c = layer.shape
for i, map in enumerate(pro):
layer.read(map, null=np.nan)
if dat_pro is None:
dat_pro = np.empty((s, r, c), dtype=np.double)
dat_pro[i,:,:] = layer
del layer
delta = dat_pro - stat_mean[:,None,None]
mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) *
delta, axis=0)
stat_mahal = garray.array()
stat_mahal[...] = mahdist
I guess this is because the calculations are done in-memory? Any way to
avoid this memory problem when using large data sets (something like
working with memmap objects?)
Cheers,
Paulo
On Mon, Feb 2, 2015 at 7:19 PM, Glynn Clements <glynn at gclements.plus.com>
wrote:
>
> Paulo van Breugel wrote:
>
> > > The second version:
> > >
> > > > > VI = np.linalg.inv(covar)
> > > > > delta = dat_ref - stat_mean[:,None,None]
> > > > > m = np.sum(np.sum(delta[:,:,:,None] * delta[:,:,None,:] *
> VI[None,None,:,:],axis=-1),axis=-1)
> > > > > stat_mah = np.sqrt(m)
> > >
> > > should work with delta being a 3-D array.
> > >
> >
> > Yes, but when running the lines above, I am getting the same error
> message:
> >
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > ValueError: operands could not be broadcast together with shapes
> > (3,77,78,78) (1,1,3,3)
>
> Well, it's not quite the same; both operands now have the same number
> of dimensions, but they're not compatible (essentially, for each
> dimension either both values should be the same or one of them should
> be one).
>
> > >From what I can see from broadcasting rules, there is mismatch in the
> last
> > and fore-last dimensions of VI[None,None,:,:] compared to the other two?
> >
> > delta[:,:,:,None].shape
> > (3, 77, 78, 1)
> >
> > delta[:,:,None,:].shape
> > (3, 77, 1, 78)
> >
> > VI[None,None,:,:].shape
> > (1, 1, 3, 3)
>
> The problem was that delta is 3x77x78 whereas the code assumed that it
> would be 77x78x3.
>
> > I am really have to get a better grasp of working with these arrays, but
> in
> > any case, after a bit trial and error, I came up with the following to
> > compute m.
> >
> > m = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,
> > axis=0)
> >
> >
> > This does, in my limited testing, gives the same result as using the loop
> > with
> >
> > pow(distance.mahalanobis(cell_ref, stat_mean, VI),2).
> >
> >
> > to be sure, does the above makes sense to you?
>
> I believe that's correct.
>
> --
> Glynn Clements <glynn at gclements.plus.com>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/grass-dev/attachments/20150213/e464fcf5/attachment.html>
More information about the grass-dev
mailing list