[GRASS-dev] Re: IMPORTANT: [GRASS GIS] #837: Memory leaks in r.example

Glynn Clements glynn at gclements.plus.com
Mon Dec 21 17:17:26 EST 2009


Hamish wrote:

> > Essentially, you don't know whether it's safe to free the mapset
> > string returned from G_find_*. If it isn't and you free it, it can
> > cause a crash. If it is and you don't free it, you've leaked some
> > memory.
> > 
> > Most of the time this isn't a problem, but the null bitmap code calls
> > these functions every NULL_ROWS_INMEM (== 8) rows. For large maps (or
> > for large numbers of maps, e.g. r.series), this could add up (even if
> > mapset names are short, there's an overhead for each malloc()'d
> > block).
> 
> for some perspective, what's the maximum damage? what's the typical damage?
> 
> as a "worst plausible case" say you have 50,000 x 50,000 cell raster
> output from r.sun, and you have 365 of them you want to run them through
> r.series.
> 
> as a "typical heavy load case" let's say a 4000x4000 cell raster with
> a dozen of them to run through r.series.
> 
> if the leak is a *mapset string, from your comments above and the
> G_find_file() code that means GMAPSET_MAX (256) bytes+whatever system
> overheads that incurs.

That's a worst case; I doubt that anyone uses 256-byte mapset names in
practice.

> for these two cases that would be:
> 
> expected worst case:
>   50000/8 * 256 * 365 = 570mb
> 
> normal heavy load: 
>   4000/8 * 256 * 12 = 1.5mb
> 
> 
> how's my math?

Seems okay.
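
As a rough sketch of that arithmetic (the function name and its defaults
are made up for illustration and are not part of GRASS; it assumes one
leaked allocation per map for every NULL_ROWS_INMEM rows read, as
described above):

```python
def leaked_bytes(rows, maps, bytes_per_alloc=256, rows_per_call=8):
    """Estimate bytes leaked if each G_find_* call leaks one allocation.

    rows_per_call stands in for NULL_ROWS_INMEM; bytes_per_alloc is the
    assumed size of each leaked mapset string (256 is the pessimistic
    GMAPSET_MAX figure from the thread).
    """
    calls_per_map = -(-rows // rows_per_call)  # ceiling division
    return calls_per_map * bytes_per_alloc * maps


worst = leaked_bytes(50000, 365)   # 50000/8 * 256 * 365
typical = leaked_bytes(4000, 12)   # 4000/8 * 256 * 12
print(worst, typical)
```

The worst case comes out at 584,000,000 bytes, the same ballpark as the
570mb quoted above, and the typical case at about 1.5 million bytes.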

> compare the leak to r.series memory req for that sized map:
>  does it keep 8 rows of all maps in memory during processing(??)
>  (not sure, but going with that...)

It's 8 map rows of null data, one bit per region column (i.e. 
resampled horizontally but not vertically).

> 8 rows * 50k columns * sizeof(DCELL) * 365 maps = 570mb

8 rows * 50k columns * 1/8 * 365 maps = 18.25Mb.

However, the library also stores the current row of decompressed,
unconverted, unresampled raster data (between 1 and 8 bytes per map
column):

	50k columns * 1 * 365 maps = 18.25Mb (1 byte/cell)
	50k columns * 8 * 365 maps = 146Mb   (8 bytes/cell)

while r.series itself stores one row of resampled DCELL data (8 bytes
per region column) for each map (another 146Mb).

So the worst case would be 570Mb of data from the leak versus 182Mb of
actual data usage (18.25Mb + 18.25Mb + 146Mb, at 1 byte/cell), which
would be a problem. In practice, a 256-character mapset name is a
massive over-estimate; if you allow 32 bytes including malloc()
overhead, you're down to 71Mb, which is less of a problem.

The actual numbers change depending upon resampling (horizontal
resampling will affect the memory used by r.series itself, and the
size of the null bitmap; vertical resampling shouldn't affect
anything), the number of map/region rows (more rows leak more but
don't affect any of the buffer sizes) and the size of each G_find_*
allocation. For 50k rows x 1k columns, the numbers would come out much
worse (legitimate memory consumption reduced 50x, leak unchanged).
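
The comparison above can be sketched as follows. This is only an
illustration: the function names are hypothetical, it assumes no
resampling (so map columns equal region columns), and it counts only the
three buffers mentioned in this message.

```python
DCELL_BYTES = 8        # size of one DCELL value
NULL_ROWS_INMEM = 8    # rows of null bitmap held per map


def legitimate_bytes(cols, maps, bytes_per_cell=1):
    """Buffers that are actually needed, summed over all maps."""
    null_bitmap = NULL_ROWS_INMEM * cols // 8 * maps  # one bit per column
    raw_row = cols * bytes_per_cell * maps            # current raw row
    series_row = cols * DCELL_BYTES * maps            # r.series DCELL row
    return null_bitmap + raw_row + series_row


def leaked_bytes(rows, maps, bytes_per_alloc):
    """One leaked allocation per map for every NULL_ROWS_INMEM rows."""
    return -(-rows // NULL_ROWS_INMEM) * bytes_per_alloc * maps


wide = legitimate_bytes(50000, 365)     # 50k rows x 50k columns
narrow = legitimate_bytes(1000, 365)    # 50k rows x 1k columns
leak = leaked_bytes(50000, 365, 32)     # same for both shapes
print(wide, narrow, leak)
```

The leak depends only on the row count, so narrowing the map to 1k
columns shrinks the legitimate buffers 50x while the leaked amount stays
the same.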

> so total mem use at completion = 1.2gb
> 
> 
> if all of the above is correct (& I'm not sure it is), then I would say
> that anyone running the above worst case is probably running it on a
> system which has >2gb ram, & so not a cause for panic.

Absolute memory figures aren't necessarily meaningful on a server or
multi-user system. It's more useful to consider the ratio of the
actual memory usage to the necessary memory usage. If each process
uses twice as much memory as it should, you can only have half as
many.
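
To make the ratio concrete, a minimal sketch (hypothetical helper names;
the 8 GB RAM budget and the 200 MB per-process figure are arbitrary
numbers chosen for illustration):

```python
def usage_ratio(necessary_bytes, leaked_bytes):
    """Actual memory use divided by necessary memory use."""
    return (necessary_bytes + leaked_bytes) / necessary_bytes


def processes_that_fit(ram_bytes, necessary_bytes, leaked_bytes=0):
    """How many identical processes fit in a fixed RAM budget."""
    return ram_bytes // (necessary_bytes + leaked_bytes)


ram = 8_000_000_000          # assumed RAM budget
needed = 200_000_000         # assumed necessary memory per process

without_leak = processes_that_fit(ram, needed)
with_leak = processes_that_fit(ram, needed, leaked_bytes=needed)
print(usage_ratio(needed, needed), without_leak, with_leak)
```

With a leak equal to the necessary usage the ratio is 2 and the number
of processes that fit is halved, whatever the absolute figures are.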

-- 
Glynn Clements <glynn at gclements.plus.com>

