[GRASS-dev] Compute mahalanobis distance using Scipy

Paulo van Breugel p.vanbreugel at gmail.com
Sat Feb 14 08:47:30 PST 2015


On Sat, Feb 14, 2015 at 4:38 AM, Glynn Clements <glynn at gclements.plus.com> wrote:

>
> Paulo van Breugel wrote:
>
> > >> With memmap you still have a limit of 2 GB, I guess; you should try
> > >> dask.
> > >>
> > >
> > Just reading the memmap manual page, where it reads: "Memory-mapped
> > arrays use the Python memory-map object which (prior to Python 2.5) does
> > not allow files to be larger than a certain size depending on the
> > platform. This size is always < 2GB even on 64-bit systems." This is
> > unclear to me; I am not sure whether it means that this limit is
> > different, or does not apply at all, on Python 2.5 or newer (what is the
> > minimum Python version for GRASS?)
>
> Even if Python itself no longer imposes a 2 GiB limit, you would
> probably need to be using a 64-bit platform, the system would need
> enough memory for the operation, and it would need to allow the
> process to use that much memory.
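For reference, a minimal sketch of the np.memmap API discussed above, assuming NumPy on a 64-bit Python. The array size and the file name stack.dat are hypothetical. Only the pages actually touched are read from disk, but arithmetic on a memmap (e.g. subtracting the band means) still creates ordinary in-memory temporaries, so a memmap by itself does not shrink the intermediate arrays discussed below.

```python
import os
import tempfile

import numpy as np

# Hypothetical raster stack: k bands of r rows x c columns.
k, r, c = 3, 500, 400
path = os.path.join(tempfile.mkdtemp(), "stack.dat")

# mode="w+" creates the file (zero-filled) and maps it into memory;
# the values are backed by the file rather than by process RAM.
stack = np.memmap(path, dtype=np.float32, mode="w+", shape=(k, r, c))
stack[:] = 1.0
stack.flush()  # write pending changes to disk
del stack      # close the mapping

# Re-open read-only; the OS loads pages lazily as they are accessed.
stack = np.memmap(path, dtype=np.float32, mode="r", shape=(k, r, c))
block = stack[:, :100, :]  # view on the first 100 rows of every band
block_sum = float(block.sum())
```

Here block_sum is 3 x 100 x 400 = 120000.0, and only the part of the file covering those rows had to be read.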
>
> The problem here is that
>
> mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None], axis=1) * delta, axis=0)
>
> expands both delta and VI to 4-D arrays then calculates their product
> element-wise, then sums them over two axes. The intermediate array of
> products could potentially be very large.
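To put a number on "very large": broadcasting a (1, k, r, c) array against a (k, k, 1, 1) array produces a (k, k, r, c) array, i.e. k*k values per raster cell. A small sketch, assuming float64 data; the helper name intermediate_bytes and the raster size are hypothetical.

```python
import numpy as np

def intermediate_bytes(n_bands, n_rows, n_cols, dtype=np.float64):
    """Bytes needed for delta[None,:,:,:] * VI[:,:,None,None].

    delta has shape (n_bands, n_rows, n_cols) and VI has shape
    (n_bands, n_bands); their broadcast product has shape
    (n_bands, n_bands, n_rows, n_cols).
    """
    itemsize = np.dtype(dtype).itemsize
    return n_bands * n_bands * n_rows * n_cols * itemsize

# Check the formula against an actual (small) broadcast product.
delta = np.zeros((3, 4, 5))
VI = np.zeros((3, 3))
product = delta[None, :, :, :] * VI[:, :, None, None]
assert product.shape == (3, 3, 4, 5)
assert product.nbytes == intermediate_bytes(3, 4, 5)

# Hypothetical case: 5 bands on a 4000 x 4000 raster.
gib = intermediate_bytes(5, 4000, 4000) / 2**30
print(f"{gib:.2f} GiB")  # prints "2.98 GiB"
```

So even a 5-band, 4000 x 4000 stack needs about 3 GiB for this one temporary, on top of delta itself (about 0.6 GiB).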
>
> It may be possible to avoid this issue using numpy.tensordot, e.g.
>
> mahdist = np.sum(np.tensordot(VI, delta, (1,0)) * delta,axis=0)
>
> I don't know for sure whether this actually uses less memory. It could
> do, as it rearranges the matrices so that the sum-of-products is
> calculated using np.dot, which is a built-in function.
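A quick check that the tensordot form gives the same result as the broadcast form, on small random data (band count, grid size and seed are arbitrary assumptions). Both expressions compute the squared Mahalanobis distance d^T VI d per cell; take np.sqrt of the result if the distance itself is wanted. np.tensordot reshapes delta to (k, r*c) and calls np.dot, so it never builds the (k, k, r, c) product; its largest temporaries are (k, r, c)-sized.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical test data: k bands on an r x c grid.
k, r, c = 4, 30, 40
data = rng.normal(size=(k, r, c))

# Deviation of every cell from the band means.
mean = data.reshape(k, -1).mean(axis=1)
delta = data - mean[:, None, None]

# Inverse covariance matrix of the bands (k x k); rows are variables.
VI = np.linalg.inv(np.cov(data.reshape(k, -1)))

# Broadcast form from the thread: builds a (k, k, r, c) temporary.
mahdist_broadcast = np.sum(
    np.sum(delta[None, :, :, :] * VI[:, :, None, None], axis=1) * delta,
    axis=0)

# tensordot form: contracts axis 1 of VI with axis 0 of delta,
# giving a (k, r, c) array without the k*k*r*c temporary.
mahdist_tensordot = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)

print(np.allclose(mahdist_broadcast, mahdist_tensordot))  # prints True
```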
>

I did some quick testing; it seems to use slightly less memory. The dask
solution seems promising, with the disadvantage that it is not widely
available and, according to its website, still experimental.

For a quick solution, what about using r.tile to split the input data into
tiles and computing the Mahalanobis distance per tile? Or would that come
with too much overhead or other clear disadvantages?
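A sketch of the per-tile idea in plain NumPy (not r.tile itself; the function name mahalanobis_tiled and the block size are hypothetical, and it splits by rows only). The distance of each cell depends only on that cell's values, the mean vector and VI, so processing blocks gives the same result as the whole-array computation, provided mean and VI are computed once from the full data set rather than per tile. Since cells are independent, tiles need no overlap, and peak memory is set by the block size rather than the raster size.

```python
import numpy as np

def mahalanobis_tiled(data, mean, VI, tile_rows=256):
    """Squared Mahalanobis distance per cell, computed in blocks of rows.

    data: (k, r, c) array, e.g. an np.memmap of the raster stack.
    mean: (k,) band means and VI: (k, k) inverse covariance, both taken
    from the whole data set, not from the individual blocks.
    """
    _, r, c = data.shape
    out = np.empty((r, c), dtype=np.float64)
    for start in range(0, r, tile_rows):
        stop = min(start + tile_rows, r)
        # Only this block is converted to float64 and centred in memory.
        delta = np.asarray(data[:, start:stop, :], dtype=np.float64)
        delta = delta - mean[:, None, None]
        out[start:stop] = np.sum(np.tensordot(VI, delta, (1, 0)) * delta, axis=0)
    return out

# Hypothetical test data: 4 bands on a 50 x 30 grid.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 50, 30))
flat = data.reshape(4, -1)
mean = flat.mean(axis=1)
VI = np.linalg.inv(np.cov(flat))

# Compare against the whole-array computation.
delta_all = data - mean[:, None, None]
full = np.sum(np.tensordot(VI, delta_all, (1, 0)) * delta_all, axis=0)
tiled = mahalanobis_tiled(data, mean, VI, tile_rows=7)
print(np.allclose(full, tiled))  # prints True
```

A block size that does not divide the row count (7 rows here) is handled by the min() on the last block.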


> --
> Glynn Clements <glynn at gclements.plus.com>
>