[GRASS-user] quantile/percentile

Laura Poggio laura.poggio at gmail.com
Fri Sep 18 09:00:33 EDT 2009


Thank you!
 r.series is working fine. I am now trying to solve the problem of the limit
for the number of files within my linux distribution.
It is rather fast, much faster then using R. However my rasters are rather
small (15000 cells), then I do not know how it would behave with a much
larger file.

Thank you again

Laura

2009/9/17 Glynn Clements <glynn at gclements.plus.com>

>
> Laura Poggio wrote:
>
> > I would need to calculate the resulting percentile map from a large
> number
> > of maps (>10000), obtaining the values for 5 and 95 percentile for each
> > pixel. Is there any function in GRASS?
>
> Recent versions of r.series can calculate arbitrary quantiles.
>
> You may have to make some configuration changes in order to allow a
> single process to have >10000 open file descriptors (for Linux, check
> /etc/security/limits.conf).
>
> Memory shouldn't be an issue, as r.series only needs to hold one row
> from each map at a time.
>
> This won't be particularly fast, as r.series method=quantile sorts all
> of the values then picks out the requested quantile. And if you
> calculate multiple quantiles, it will sort the values once for each
> quantile.
>
> CC to grass-dev:
>
> The duplicate sorting can be gotten around in a several ways:
>
> 1. Have the sorting routine first check for an already sorted list.
>
> 2. Add a global variable to lib/stats to indicate that the data is
> already sorted.
>
> 3. Add yet another parameter to the stats functions to indicate that
> the data is sorted.
>
> 4. Use the closure parameter to indicate that the data is sorted (for
> c_quant(), this would mean that it would need to point to a structure
> containing both the quantile and the sorted flag).
>
> #1 is inefficient, #2 is a bit of a hack, #3 adds another parameter
> which most functions won't use (only diversity, median, mode,
> quantiles need sorted data), #4 is awkward to use.
>
> The inefficient quantile algorithm can be replaced (or augmented) with
> the more efficient (and more complex) one used by r.quantile and
> r.statistics3. But is it worth the effort and complexity?
>
> Essentially, it only matters if the number of maps is large enough
> that the qsort() dominates. For a few maps, the setup overhead of the
> more complex algorithm will make it slower. For an intermediate
> number, the speed gain may be small compared to other overheads. It's
> worth it for calculating quantiles over millions of values (although
> the primary motivation was to avoid having to store all values in
> memory), but I don't know if it's worth it for 10,000 values.
>
> --
> Glynn Clements <glynn at gclements.plus.com>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.osgeo.org/pipermail/grass-user/attachments/20090918/66fab100/attachment.html


More information about the grass-user mailing list