[GRASS-dev] [GRASS GIS] #3198: r.stats.quantile: hardcoded max number of categories in base map

GRASS GIS trac at osgeo.org
Mon Nov 7 02:49:03 PST 2016


#3198: r.stats.quantile: hardcoded max number of categories in base map
--------------------------+---------------------------------------
  Reporter:  mlennert     |      Owner:  grass-dev@…
      Type:  defect       |     Status:  new
  Priority:  normal       |  Milestone:  7.2.1
 Component:  Raster       |    Version:  unspecified
Resolution:               |   Keywords:  r.stats.quantile MAX_CATS
       CPU:  Unspecified  |   Platform:  Unspecified
--------------------------+---------------------------------------

Comment (by glynn):

 Replying to [ticket:3198 mlennert]:

 > Is there any specific reason for this? I would like to use
 > r.stats.quantile in i.segment.stats to calculate percentiles per segment,
 > but number of segments can be much higher than 1000.

 The limit was added so that if someone tries to use a base map with a
 million categories, it just fails quickly, rather than attempting
 something which will either exhaust memory or take days to run.

 For each category in the base map, it allocates a basecat structure, each
 of which references several dynamically-allocated arrays. The .slots and
 .slot_bins arrays are sized based upon the bins= option, the .values array
 is sized to hold all of the values falling into any bin containing a
 quantile, and the .quants and .bins arrays are sized according to the
 number of quantiles.
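
 To make the memory cost concrete, here is a rough sketch of the
 per-category state described above, with a helper that estimates its heap
 footprint. Only the field names (.slots, .slot_bins, .values, .quants,
 .bins) come from the description; the struct names, the element types, the
 contents of the bin record and basecat_bytes() are assumptions made for
 illustration, not the module's actual definitions.

```c
#include <stddef.h>

/* Hypothetical per-bin record; the real layout is not given in this thread. */
struct bin_sketch {
    double min, max;    /* value range covered by the bin (assumed) */
    size_t base, count; /* offset and length within values[] (assumed) */
};

/* Hypothetical stand-in for the per-category structure.
 * Field names follow the description; element types are assumptions. */
struct basecat_sketch {
    size_t *slots;             /* one counter per slot (bins= option) */
    unsigned short *slot_bins; /* one bin index per slot (bins= option) */
    double *values;            /* values in bins that contain a quantile */
    double *quants;            /* one entry per quantile */
    struct bin_sketch *bins;   /* one entry per quantile */
};

/* Rough estimate of the bytes needed for one category:
 * the structure itself plus the arrays it references. */
size_t basecat_bytes(size_t num_slots, size_t num_quants, size_t num_values)
{
    return sizeof(struct basecat_sketch)
        + num_slots * (sizeof(size_t) + sizeof(unsigned short))
        + num_values * sizeof(double)
        + num_quants * (sizeof(double) + sizeof(struct bin_sketch));
}
```

 Multiplying basecat_bytes() by the number of categories gives a lower bound
 on the total; allocator overhead comes on top of that.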

 Besides the memory consumption, almost all processing is per-category, so
 run time also grows with the number of categories.

 Having said that, more categories will tend to result in less data per
 category, although there are some non-trivial per-category overheads. On
 the other hand, sorting the bins containing quantiles should be faster
 overall with more bins but proportionally less data in each bin.

 There's no fundamental reason why the limit can't be raised; or even
 abolished, if you don't mind an unsuitable choice of base map resulting in
 "unable to allocate" errors, or just taking forever. Consider putting a
 limit on num_cats*num_slots; a map with many categories should presumably
 require fewer bins (assuming that the data isn't concentrated into a
 handful of categories).
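
 A limit on num_cats*num_slots could look roughly like the sketch below.
 The constant MAX_CAT_SLOTS, its value and the function name are
 hypothetical; no specific figure is proposed here. The check divides
 rather than multiplies so that very large inputs cannot overflow.

```c
#include <stddef.h>

/* Hypothetical budget for num_cats * num_slots; the value is an assumption. */
#define MAX_CAT_SLOTS ((size_t)100000000)

/* Returns 1 if num_cats * num_slots <= MAX_CAT_SLOTS, otherwise 0.
 * For positive integers, c * s <= M is equivalent to c <= M / s
 * (integer division), which avoids computing the product. */
int within_slot_budget(size_t num_cats, size_t num_slots)
{
    if (num_cats == 0 || num_slots == 0)
        return 1; /* nothing to allocate */
    return num_cats <= MAX_CAT_SLOTS / num_slots;
}
```

 A caller would report a fatal error when within_slot_budget() returns 0,
 which keeps the fail-fast behaviour while letting maps with many
 categories through when bins= is small.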

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3198#comment:1>
GRASS GIS <https://grass.osgeo.org>


