[GRASS-dev] [GRASS GIS] #3198: r.stats.quantile: hardcoded max number of categories in base map
GRASS GIS
trac at osgeo.org
Mon Nov 7 05:23:12 PST 2016
#3198: r.stats.quantile: hardcoded max number of categories in base map
--------------------------+---------------------------------------
Reporter: mlennert | Owner: grass-dev@…
Type: defect | Status: new
Priority: normal | Milestone: 7.2.1
Component: Raster | Version: unspecified
Resolution: | Keywords: r.stats.quantile MAX_CATS
CPU: Unspecified | Platform: Unspecified
--------------------------+---------------------------------------
Comment (by mlennert):
Replying to [comment:1 glynn]:
> Replying to [ticket:3198 mlennert]:
>
> > Is there any specific reason for this? I would like to use
> > r.stats.quantile in i.segment.stats to calculate percentiles per
> > segment, but the number of segments can be much higher than 1000.
>
> The limit was added so that if someone tries to use a base map with a
> million categories, it just fails quickly, rather than attempting
> something which will either exhaust memory or take days to run.
>
> For each category in the base map, it allocates a basecat structure,
> each of which references several dynamically-allocated arrays. The
> .slots and .slot_bins arrays are sized based upon the bins= option, the
> .values array is sized to hold all of the values falling into any bin
> containing a quantile, and the .quants and .bins arrays are sized
> according to the number of quantiles.
>
> As well as the memory consumption, almost all processing is
> per-category.
>
> Having said that, more categories will tend to result in less data per
> category. However, there are some non-trivial per-category overheads. On
> the other hand, sorting the bins containing quantiles should be faster
> overall with more bins but proportionally less data in each bin.
>
> There's no fundamental reason why the limit can't be raised; or even
> abolished, if you don't mind an unsuitable choice of base map resulting
> in "unable to allocate" errors, or just taking forever.
A warning was kept, so at least the user is made aware and can stop the
module.
> Consider putting a limit on num_cats*num_slots; a map with many
> categories should presumably require fewer bins (assuming that the data
> isn't concentrated into a handful of categories).
In r69776, MarkusM introduced dynamic bins, although I don't really
understand what this means ;-).
More generally, the man page of r.stats.quantile lacks some information
about its parameters, notably the bins= parameter. A short paragraph
explaining how the module works would be useful.
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3198#comment:2>
GRASS GIS <https://grass.osgeo.org>