[GRASS-dev] How to calculate mean coordinates from big point datasets?

Markus Metz markus.metz.giswork at gmail.com
Fri Sep 20 13:30:20 PDT 2013


On Fri, Sep 20, 2013 at 5:38 PM, Markus Metz
<markus.metz.giswork at gmail.com> wrote:
> Glynn Clements wrote:
>>
>> Luca Delucchi wrote:
>>
>>> maybe v.median [0] could help?
>>
>> Not for large datasets. First, it requires that the data fit into
>> RAM. Second, numpy.median() sorts the entire array and takes the
>> middle value, which is somewhere between O(n log n) in the typical
>> case and O(n^2) in the worst case (numpy.median uses the default
>> sorting algorithm, which is quicksort).
>>
>> See r.quantile for a more efficient approach for large datasets. The
>> first pass computes a histogram which allows upper and lower bounds to
>> be determined for each quantile. The second pass extracts values which
>> lie within those bounds and sorts them.
>
> The approach of v.median and r.quantile treats each dimension
> separately, which is IMHO not correct in a spatial context.
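
For illustration, the two-pass idea described above could be sketched
in plain NumPy roughly like this (r.quantile itself is a C module that
streams the raster in two passes; the bin count and names below are
made up for the sketch):

import numpy as np

def two_pass_quantile(values, q, nbins=10000):
    # Sketch of the histogram-bracketing idea, not actual r.quantile code.
    values = np.asarray(values, dtype=float)
    k = int(q * (len(values) - 1))            # 0-based rank of the wanted value

    # Pass 1: a histogram brackets the quantile with lower/upper bounds.
    counts, edges = np.histogram(values, bins=nbins)
    cum = np.cumsum(counts)
    b = np.searchsorted(cum, k, side='right') # bin that holds rank k
    below = int(cum[b - 1]) if b > 0 else 0   # number of values in earlier bins

    # Pass 2: extract and sort only the values inside the bracketing bin.
    inside = np.sort(values[(values >= edges[b]) & (values <= edges[b + 1])])
    return inside[k - below]

# e.g. two_pass_quantile(np.random.rand(10**7), 0.5) gives the 0.5 quantile
# without ever sorting the full array.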

This

> The median
> of a point cloud would be a multivariate median for 2 or 3 dimensions.
> You would need to sort the points first by the first dimension, then
> by the second dimension etc, then pick the coordinates of the mid
> point.

is wrong, please ignore.

This

> Alternatively you would need to find the point with the
> smallest summed distance to all other points, which is a nightmare to
> calculate (n x (n - 1) / 2 distance calculations).

is correct, but instead of finding the existing point with the
smallest summed distance, such a point can be approximated with much
less effort.
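
For example, one could approximate the geometric median (the location
minimizing the sum of Euclidean distances) with a Weiszfeld-type
iteration and, if the result must be an existing point, pick the input
point closest to that location in one extra pass. A rough sketch with
NumPy, illustration only, tolerances made up:

import numpy as np

def geometric_median(points, eps=1e-7, max_iter=200):
    # Weiszfeld-style iteration: each pass costs O(n) distance
    # evaluations instead of the O(n^2) of the brute-force search.
    points = np.asarray(points, dtype=float)
    guess = points.mean(axis=0)              # start from the centroid
    for _ in range(max_iter):
        dist = np.linalg.norm(points - guess, axis=1)
        nonzero = dist > 0                   # skip exact hits to avoid 1/0
        if not np.any(nonzero):
            return guess
        w = 1.0 / dist[nonzero]
        new_guess = (points[nonzero] * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(new_guess - guess) < eps:
            return new_guess
        guess = new_guess
    return guess

# e.g. median coordinates of a big 3D point cloud:
# geometric_median(np.random.rand(10**6, 3))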

Markus M

