[GRASS-dev] How to calculate mean coordinates from big point datasets?

Glynn Clements glynn at gclements.plus.com
Thu Sep 19 19:47:01 PDT 2013


Luca Delucchi wrote:

> maybe v.median [0] could help?

Not for large datasets. First, it requires that the data will fit into
RAM. Second, numpy.median() sorts the entire array and takes the
middle value, which is somewhere between O(n.log(n)) for the typical
case and O(n^2) for the worst case (numpy.median uses the default
sorting algorithm, which is quicksort).

See r.quantile for a more efficient approach for large datasets. The
first pass computes a histogram which allows upper and lower bounds to
be determined for each quantile. The second pass extracts values which
lie within those bounds and sorts them. Except for pathological cases
(where the majority of the data lie within a tiny proportion of the
overall range), only a small fraction of the data are sorted.

In any case: is this question about the mean or the median?
Calculating the mean is far simpler, and can easily be done in O(n)
time and O(1) space.

-- 
Glynn Clements <glynn at gclements.plus.com>


More information about the grass-dev mailing list