[GRASS-dev] Compute mahalanobis distance using Scipy

Paulo van Breugel p.vanbreugel at gmail.com
Mon Feb 2 08:00:13 PST 2015


On Mon, Feb 2, 2015 at 3:40 PM, Paulo van Breugel <p.vanbreugel at gmail.com>
wrote:

>
>
> On Mon, Feb 2, 2015 at 1:38 PM, Glynn Clements <glynn at gclements.plus.com>
> wrote:
>
>>
>> Paulo van Breugel wrote:
>>
>> > > The main speed-up will come from "inlining" distance.mahalanobis(),
>> > > which is essentially:
>> > >
>> > >     delta = u - v
>> > >     m = np.dot(np.dot(delta, VI), delta)
>> > >     return np.sqrt(m)
>> > >
>> > > np.dot(v, m) is equivalent to np.sum(v * m,axis=0), and np.dot(u,v) is
>> > > equivalent to np.sum(u * v), so the second line above is equivalent to
>> > >
>> > >     m = np.sum(np.sum(delta[:,None] * VI * delta[None,:], axis=-1),
>> > > axis=-1)
>> >
>> >
>> > Smart! I can follow the logic, but I am not sure how to solve the
>> problem
>> > below:
>> >
>> >    Traceback (most recent call last):
>> >     File "<stdin>", line 1, in <module>
>> >     ValueError: operands could not be broadcast together with shapes
>> > (3,1,77,78) (3,3)
>> >
>> > Which refers to the different dimensions of delta and VI?
>>
>> The first version of the code (which is quoted above) will only work
>> with 1-D arrays. It's just a fairly direct translation of the existing
>> distance.mahalanobis() function, given to explain how it can then be
>> extened to 3-D arrays (i.e. a 2-D array of 1-D vectors).
>>
>> The second version:
>>
>> > >     VI = np.linalg.inv(covar)
>> > >     delta = dat_ref - stat_mean[:,None,None]
>> > >     m = np.sum(np.sum(delta[:,:,:,None] * delta[:,:,None,:] *
>> VI[None,None,:,:],axis=-1),axis=-1)
>> > >     stat_mah = np.sqrt(m)
>>
>> should work with delta being a 3-D array.
>>
>
>
> Yes, but when running the lines above, I am getting the same error message:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: operands could not be broadcast together with shapes
> (3,77,78,78) (1,1,3,3)
>
> From what I can see from broadcasting rules, there is mismatch in the last
> and fore-last dimensions of VI[None,None,:,:] compared to the other two?
>
> delta[:,:,:,None].shape
> (3, 77, 78, 1)
>
> delta[:,:,None,:].shape
> (3, 77, 1, 78)
>
> VI[None,None,:,:].shape
> (1, 1, 3, 3)
>
> I am really have to get a better grasp of working with these arrays, but
> in any case, after a bit trial and error, I came up with the following to
> compute m.
>
>  m = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,
> axis=0)
>
>
> This does, in my limited testing, gives the same result as using the loop
> with
>
> pow(distance.mahalanobis(cell_ref, stat_mean, VI),2).
>
>
> to be sure, does the above makes sense to you?
>


About the speed up.. that is indeed rather significant. With three layers
as input (cells: 1122582) my original function took 30 seconds, the one
above 0.23 seconds. Not bad :-)



>
> Paulo
>
>
>
>
>
>
>
>
>
>
>> --
>> Glynn Clements <glynn at gclements.plus.com>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/grass-dev/attachments/20150202/6bd61fc5/attachment-0001.html>


More information about the grass-dev mailing list