[GRASS-dev] Avoiding maximum open files limit in r.series

Glynn Clements glynn at gclements.plus.com
Sat Oct 8 08:34:10 EDT 2011


Sören Gebbert wrote:

> > Make opening/reading/closing maps for each row a separate feature
> > (-x flag). This has a significant performance impact, may be
> > unnecessary ("ulimit -n" is 1024 by default, but this can be changed
> > if you have sufficient privilege; 100k open files is quite possible),
> > and may be necessary even if map names are specified on the command
> > line (via input=).
> 
> All of my colleagues and our system administrators do not know how
> to increase the open file limit on Unix machines, and I don't know
> whether the limit can be changed at all on Windows or Mac OS, so I
> thought the flag would be a meaningful addition. I have also hit the
> Python subprocess limit on command-line arguments when running
> r.series via grass.script.run_command() and did not find a solution
> ... except to patch r.series ... .

The point is that the limit on the length of a command line and the
limit on the number of open files are quite separate.

The open file limit might be exceeded when map names are given on the
command line, and it might not be exceeded when map names are read
from a file (reading names from a file needn't be restricted to the
case where there are too many of them to fit on the command line; it
may simply be more convenient).

So rather than having the open/close behaviour tied to input= versus
file=, I added a separate flag for it. If you run out of file
descriptors ("Too many open files", i.e. errno==EMFILE), use -x;
otherwise, not using -x is likely to be faster.
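
For reference, the -x behaviour amounts to something like the
following (a simplified sketch against the GRASS 7 Rast_* API, not
the literal r.series code):

    #include <grass/gis.h>
    #include <grass/raster.h>

    /* Read all inputs for each output row, keeping at most one
     * descriptor open at a time; this trades repeated open/close
     * overhead for a bounded descriptor count. */
    static void process_rows(char **names, char **mapsets, int nmaps)
    {
        int nrows = Rast_window_rows();
        DCELL *buf = Rast_allocate_d_buf();
        int row, i;

        for (row = 0; row < nrows; row++) {
            for (i = 0; i < nmaps; i++) {
                int fd = Rast_open_old(names[i], mapsets[i]);

                Rast_get_d_row(fd, buf, row);
                /* ... aggregate buf into the output row ... */
                Rast_close(fd);
            }
            /* ... write the aggregated output row ... */
        }
        G_free(buf);
    }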

> >> Memory footprint is about 330MB of RAM. But this looks like a
> >> memory leak to me, because the memory consumption rises linearly
> >> with the processed rows of the output map. All the memory
> >> allocation in r.series is done before the row processing ... ???
> >
> > I suspect that this will be due to re-opening the maps for each row.
> > Normally, an overhead on each call to Rast_open_old() would be
> > considered a per-map overhead, and we wouldn't worry about a few kB
> > per map.
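
To make the scaling explicit: with per-row reopening, any per-open
cost is paid nmaps * nrows times rather than nmaps times, which is
enough to turn a few hundred bytes into hundreds of MB (the per-open
figure below is assumed for illustration, not measured):

    #include <stdio.h>

    int main(void)
    {
        long nmaps = 1000, nrows = 1000;  /* illustrative sizes */
        long per_open = 330;              /* bytes retained per open (assumed) */
        long total = nmaps * nrows * per_open;

        printf("%.0f MB\n", total / 1e6); /* ~330 MB, the order observed */
        return 0;
    }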

> So we have two options to solve the memory leak?
> 1.) Correct memory management while closing maps
> 2.) Modification of the raster map identification
> 
> Is it worth the effort to correct the memory management while closing
> maps, or should we try to change the raster map identification?

I'm not sure what you mean by "identification".

> Maybe we can provide additional functions which only initialize the
> fileinfo structure but do not keep file descriptors open?

That's what I was suggesting; e.g. Rast_suspend() and Rast_resume().

> And would a
> call to Rast_open_old() then only open a file descriptor when the
> fileinfo is already set up?

I think that Rast_open_old() should always create a new fileinfo
structure. There may be modules which will get confused if two
Rast_open_old() calls return the same file descriptor.
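
A rough sketch of what I have in mind (proposed names only; nothing
like this exists in the library yet):

    #include <grass/gis.h>
    #include <grass/raster.h>

    /* Proposed, not an existing API: suspend releases the OS file
     * descriptor but keeps the fileinfo structure, so resume can
     * reopen the map without re-reading its metadata. */
    void Rast_suspend(int fd);
    void Rast_resume(int fd);

    /* r.series would then open every map once up front, suspend them
     * all, and hold a real descriptor only while reading a row: */
    static void read_input_row(int *fds, DCELL **bufs, int nmaps, int row)
    {
        int i;

        for (i = 0; i < nmaps; i++) {
            Rast_resume(fds[i]);
            Rast_get_d_row(fds[i], bufs[i], row);
            Rast_suspend(fds[i]);
        }
    }

That way the EMFILE limit applies only to descriptors actually in
use, while the per-map setup cost is still paid once.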

-- 
Glynn Clements <glynn at gclements.plus.com>

