[GRASS-dev] Re: [GRASS-user] quantile/percentile

Glynn Clements glynn at gclements.plus.com
Thu Sep 17 08:52:43 EDT 2009


Laura Poggio wrote:

> I would need to calculate the resulting percentile map from a large number
> of maps (>10000), obtaining the values for 5 and 95 percentile for each
> pixel. Is there any function in GRASS?

Recent versions of r.series can calculate arbitrary quantiles.

You may have to make some configuration changes in order to allow a
single process to have >10000 open file descriptors (for Linux, check
/etc/security/limits.conf).

Memory shouldn't be an issue, as r.series only needs to hold one row
from each map at a time.

This won't be particularly fast, as r.series method=quantile sorts all
of the values then picks out the requested quantile. And if you
calculate multiple quantiles, it will sort the values once for each
quantile.

CC to grass-dev:

The duplicate sorting can be gotten around in a several ways:

1. Have the sorting routine first check for an already sorted list.

2. Add a global variable to lib/stats to indicate that the data is
already sorted.

3. Add yet another parameter to the stats functions to indicate that
the data is sorted.

4. Use the closure parameter to indicate that the data is sorted (for
c_quant(), this would mean that it would need to point to a structure
containing both the quantile and the sorted flag).

#1 is inefficient, #2 is a bit of a hack, #3 adds another parameter
which most functions won't use (only diversity, median, mode,
quantiles need sorted data), #4 is awkward to use.

The inefficient quantile algorithm can be replaced (or augmented) with
the more efficient (and more complex) one used by r.quantile and
r.statistics3. But is it worth the effort and complexity?

Essentially, it only matters if the number of maps is large enough
that the qsort() dominates. For a few maps, the setup overhead of the
more complex algorithm will make it slower. For an intermediate
number, the speed gain may be small compared to other overheads. It's
worth it for calculating quantiles over millions of values (although
the primary motivation was to avoid having to store all values in
memory), but I don't know if it's worth it for 10,000 values.

-- 
Glynn Clements <glynn at gclements.plus.com>


More information about the grass-dev mailing list