[GRASS-dev] r.series map names buffer overflow

Markus Neteler neteler at osgeo.org
Fri Nov 14 15:19:54 EST 2008


On Fri, Nov 14, 2008 at 5:44 PM, Glynn Clements
<glynn at gclements.plus.com> wrote:
>
> Markus Neteler wrote:
>
>> >> I am unsure how to track this down (maybe there is a fixed buffer in libgis?).
>> >
>> > What does "ulimit -n" say? That's the OS-imposed limit on the number
>> > of open file descriptors per process.
>>
>> Bingo:
>> ulimit -n
>> 1024
>>
>>
>> > On my system, both the hard and soft limits are 1024. The soft limit
>> > can be changed with e.g. "ulimit -n 1500", but only up to the hard
>> > limit. The hard limit can only be changed by root.

ulimit -n 1500
bash: ulimit: open files: cannot modify limit: Operation not permitted

On my box I can only *reduce* the limit as normal user (1023 works).
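
For reference, the C equivalent of "ulimit -n" is getrlimit()/setrlimit()
with RLIMIT_NOFILE. Below is a minimal sketch, assuming a POSIX system, of a
process raising its own soft limit to the hard limit. The function name is
made up for illustration and is not part of libgis. It cannot help when the
soft and hard limits are already equal, as on the box above.

```c
#include <sys/resource.h>

/* Hypothetical helper (not a libgis function): raise the soft limit on
 * open files to the hard limit. Returns 0 on success, -1 on failure. */
int raise_nofile_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;

    if (rl.rlim_cur != rl.rlim_max) {
        /* An unprivileged process may raise the soft limit only as far
         * as the hard limit; going beyond that fails with EPERM. */
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
            return -1;
    }
    return 0;
}
```

A module could call this once at startup, before opening its input maps.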

>> I forgot about this limitation.
>> This is somewhat dangerous, say, could it be trapped? If r.series
>> gets more input files than ulimit -n (C equivalent) allows, could
>> it spit out an error (the manual suggesting then to split into smaller
>> jobs)?
>
> It's possible to detect that this has occurred, but only in the lowest
> levels of libgis, i.e. in G__open(). open() should return EMFILE if it
> fails due to exceeding the per-process resource limit (or ENFILE for
> the system-wide limit, but that's rather unlikely).

Is just counting the number of input maps given to the parser a no-op?
If the user gives more than rlim.rlim_max * 0.95 input files then bail out.

> It isn't feasible to accurately predict that it will occur before the
> fact. Apart from the descriptors for the [f]cell files, which are held
> open throughout the process, other descriptors will already be open on
> entry (at least stdin, stdout and stderr will be open, and often a few
> others inherited from the caller), and additional descriptors will be
> opened temporarily throughout the life of the process (but it's hard
> to know how many, e.g. some libc functions will read configuration or
> data files upon first use).
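
To illustrate the "already open on entry" part: a process can count the
descriptors it currently holds by probing each number with fcntl(). The
sketch below assumes a POSIX system, and count_open_fds() is a made-up name.
It gives only a snapshot and says nothing about descriptors opened
temporarily later on, which is why any up-front check stays an estimate.

```c
#include <fcntl.h>

/* Hypothetical helper: count the descriptors in [0, upto) that are
 * currently open. fcntl(fd, F_GETFD) fails with EBADF when fd is not
 * an open descriptor. */
int count_open_fds(int upto)
{
    int fd, n = 0;

    for (fd = 0; fd < upto; fd++)
        if (fcntl(fd, F_GETFD) != -1)
            n++;

    return n;
}
```

Called as count_open_fds(1024) on entry, this would typically report at
least 3 (stdin, stdout, stderr) plus whatever was inherited from the caller.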

I see.

> OTOH, it would be straightforward to print a warning if the number of
> maps exceeds e.g. limit * 0.95:
>
>        #ifndef __MINGW32__
>        #include <sys/resource.h>
>
>        struct rlimit rlim;
>
>        if (getrlimit(RLIMIT_NOFILE, &rlim) < 0)
>            G_warning("unable to determine resource limit (shouldn't happen)");
>        else if (nmaps > rlim.rlim_max * 0.95)
>            G_warning("may exceed hard limit on number of files; consult your sysadmin in event of errors");
>        else if (nmaps > rlim.rlim_cur * 0.95)
>            /* ulimit is a Bourne-shell command; csh uses `limit' and `unlimit' */
>            G_warning("may exceed soft limit on number of files; use `ulimit -n' in event of errors");
>
>        #endif /* __MINGW32__ */
>
> BTW, now that the G__.fileinfo array is allocated dynamically, I have
> been thinking about making libgis keep open the descriptor for the
> null bitmap. Re-opening the file every few rows can have a significant
> performance impact for modules which are I/O-bound.
>
> However, this would mean that you need twice as many descriptors (or
> will hit the limit with half the number of maps). AFAICT, this was
> (part of) the original reason for not keeping the null bitmap open.

This would definitely be a showstopper for me since I regularly work
with (multi year) time series.

> But that was when Linux had a system-wide limit of (by default) 1024
> open files, set at compile time. Nowadays, typical defaults are 1024
> files per process and ~200k files system-wide, both of which can be
> changed at run time (with ulimit -n and /proc/sys/fs/file-max
> respectively).

A pity that it is apparently still set to the historic level.
I have
cat /proc/sys/fs/file-max
306995

> Ultimately, if you want to be able to use r.series (or other modules
> which process several maps concurrently) with large numbers of maps,
> you (or your sysadmin) need to ensure that resource limits are set
> accordingly.

Right, thanks for the pointers.

Markus

