[GRASS-dev] GRASS on cluster: G_malloc: out of memory management
Markus Neteler
neteler at osgeo.org
Sat Nov 8 04:25:29 EST 2008
On Sat, Nov 8, 2008 at 1:18 AM, Glynn Clements <glynn at gclements.plus.com> wrote:
> Markus Neteler wrote:
>> I am running GRASS on an external cluster
BTW: It is this nice cluster:
http://www.hpc2n.umu.se/resources/Akka/
(Akka is ranked 39 on the latest Top 500 list and 16 on Green 500)
>> and have to specify in the
>> job scheduler how much memory I need. Despite a large amount
>> defined, I always get:
>>
>> v.vol.rst ...
>> WARNING: Points are more dense than specified 'DMIN'--ignored 49295 points
>> (remain 9976)
>> Percent complete: ERROR: G_malloc: out of memory
>>
>> Is there a way to make this message more descriptive? Then I could
>> try to figure out if my memory request was simply ignored. Currently
>> I am quite in the dark (since the same job runs on our FEM-CEA
>> cluster with the same amount of RAM but a different job scheduler).
>>
>> Knowing how much v.vol.rst tried to allocate would be already useful.
>
> --- lib/gis/alloc.c (revision 34189)
> +++ lib/gis/alloc.c (working copy)
> @@ -42,7 +42,7 @@
> if (buf)
> return buf;
>
> - G_fatal_error(_("G_malloc: out of memory"));
> + G_fatal_error(_("G_malloc: unable to allocate %lu bytes"), (unsigned long) n);
> return NULL;
> }
This already helped: I had to request a little more RAM
for the jobs. Now the queue is filled up.
> A slightly more informative option would be to report the immediate
> caller e.g.:
>
> void *G__malloc(const char *file, int line, size_t n)
> {
>     void *buf;
>
>     if (n <= 0)
>         n = 1; /* make sure we get a valid request */
>
>     buf = malloc(n);
>     if (buf)
>         return buf;
>
>     G_fatal_error(_("G_malloc: unable to allocate %lu bytes at %s:%d"),
>                   (unsigned long) n, file, line);
>     return NULL;
> }
>
> Then in gisdefs.h:
>
> -void *G_malloc(size_t);
> +void *G__malloc(const char *, int, size_t);
> +#define G_malloc(n) G__malloc(__FILE__, __LINE__, (n))
I think that it would help a lot: from time to time users report
out-of-memory problems, usually because they have set the raster
resolution to nanometers.
If you submit it to GRASS 7, I'll backport as usual.
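The call-site reporting idea quoted above can be tried outside GRASS with a small stand-alone sketch. Everything prefixed with demo_ is hypothetical and not part of the GRASS API; in particular, demo_report_failure() only records the message in a buffer, whereas the real G_fatal_error() terminates the program.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for G_fatal_error(): it records the message
 * instead of terminating, so the sketch runs without GRASS. */
static char demo_last_error[256];
static int demo_last_line;

static void demo_report_failure(size_t n, const char *file, int line)
{
    assert(file != NULL);
    demo_last_line = line;
    snprintf(demo_last_error, sizeof(demo_last_error),
             "unable to allocate %lu bytes at %s:%d",
             (unsigned long) n, file, line);
}

/* Same shape as the proposed G__malloc(): the call site is passed in
 * explicitly as a file name and a line number. */
static void *demo__malloc(const char *file, int line, size_t n)
{
    void *buf;

    if (n == 0)
        n = 1;                  /* make sure we get a valid request */

    buf = malloc(n);
    if (!buf)
        demo_report_failure(n, file, line);
    return buf;
}

/* The macro fills in the caller's file and line at each call site,
 * so existing callers do not need to change. */
#define demo_malloc(n) demo__malloc(__FILE__, __LINE__, (n))
```

Because __FILE__ and __LINE__ are expanded where the macro is used, a failed demo_malloc(n) reports the location of the caller rather than the location of the allocator itself.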
> Alternatively, have G_malloc() call abort() on error; this will
> normally generate a coredump (if coredumps are enabled) which can be
> examined with gdb to determine the complete call chain and the exact
> state of the process.
>
> However, bear in mind that abort()ing on out-of-memory is likely to
> produce large coredumps; ensure that "ulimit -c" is set accordingly.
This sounds rather risky (and if I submit 1400 jobs and 40% coredump,
they may remove my account...).
The second solution sounds perfect.
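For the abort() variant, the concern about large coredumps can be handled from inside the process as well as with "ulimit -c". The following is a minimal sketch, assuming a POSIX system; the two helper names are hypothetical and do not exist in GRASS. It reads the soft core-size limit with getrlimit() and can lower it to zero with setrlimit().

```c
#include <sys/resource.h>

/* Returns 1 if the soft core-size limit is non-zero, 0 if core dumps
 * are disabled, and -1 if the limit cannot be read. */
static int core_dumps_enabled(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/* Lowers the soft core-size limit to zero and leaves the hard limit
 * unchanged. Returns 0 on success, -1 on failure. */
static int disable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = 0;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

A batch job could call disable_core_dumps() at startup so that an abort() on a failed allocation still stops the process but does not write a core file.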
Markus