[GRASS-dev] Avoiding maximum open files limit in r.series
glynn at gclements.plus.com
Wed Oct 5 18:49:08 EDT 2011
Sören Gebbert wrote:
> Dear devs,
> just for your information, I have added the support of input files
> with newline separated map names to r.series.
> r.series now supports two input methods, file and input. Using option
> <file> is slower but avoids the open file descriptor limit.
I've made some changes (mostly just clean-up) to this; can you test
Make opening/reading/closing maps for each row a separate feature
(-x flag). This has a significant performance impact, may be
unnecessary ("ulimit -n" is 1024 by default, but this can be changed
if you have sufficient privilege; 100k open files is quite possible),
and may be necessary even if map names are specified on the command
line (via input=).

Only read the file once, reallocating the array dynamically.

Can't use G_check_input_output_name, as parm.output->multiple=YES.

Don't use C99-specific features (specifically, variable declarations
intermingled with statements).

Move variables from function scope to block scope where possible.
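
As an illustration of the "ulimit -n" point above, here is a minimal sketch of how a program could check the descriptor limit and try to raise its soft limit before deciding whether per-row opening is needed. It uses only the POSIX getrlimit()/setrlimit() calls; the function name ensure_nofile_limit is hypothetical and is not part of r.series or the GRASS libraries.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: try to allow at least `wanted` open descriptors.
 * Returns the soft limit in effect afterwards, or 0 if the limit
 * could not be queried. */
static rlim_t ensure_nofile_limit(rlim_t wanted)
{
    struct rlimit rl;
    rlim_t old, target;

    assert(wanted > 0);

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;

    /* Nothing to do if the soft limit is already high enough. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
        return rl.rlim_cur;

    /* An unprivileged process cannot exceed the hard limit. */
    target = wanted;
    if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
        target = rl.rlim_max;

    old = rl.rlim_cur;
    rl.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return old;             /* raising failed; old limit still applies */

    return target;
}
```

A caller could compare the returned value with the number of input maps (plus a few descriptors for stdin/stdout/stderr and the output map) and only fall back to the slower per-row mode when the limit is still too low.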
> I have tested r.series with ~6000 maps (ECA&D daily temperature data
> from 1995-2010) each ~100000 cells. Computation needs for method
> about 3 minutes on my (fast) machine.
> Memory footprint is about 330MB of RAM. But this looks like a memory
> leak to me, because the memory consumption rises linearly with the
> processed rows of the output map. All the memory allocation in
> r.series is done before the row processing ... ???
I suspect that this will be due to re-opening the maps for each row.
Normally, an overhead on each call to Rast_open_old() would be
considered a per-map overhead, and we wouldn't worry about a few kB
Opening a map is quite an expensive operation, as it has to find which
mapset contains the map, determine its type (CELL/FCELL/DCELL), read
its cellhd (and possibly other files, e.g. reclass table), set up the
column mapping, etc.
For this particular case (and anything else like it), the process
could be accelerated significantly by keeping the fileinfo structure
around and just closing and re-opening (and re-positioning) the
descriptors (one for the raster data, one for the null bitmap).
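
The idea can be sketched roughly as follows, under heavy simplifying assumptions: a single file with fixed-size, uncompressed rows, and no null bitmap. The struct lazy_map and the function lazy_read_row are hypothetical names made up for this example; the real fileinfo structure holds far more state than this. The point is only that the per-map bookkeeping survives while the descriptor itself is opened, repositioned and closed for each row.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-map state that is kept between rows. */
struct lazy_map {
    const char *path;       /* file to re-open for every row */
    size_t row_bytes;       /* size of one row (fixed in this sketch) */
    off_t next_offset;      /* where the next row starts */
};

/* Open the file, seek to the saved position, read one row, close.
 * Returns 0 on success, -1 on error or end of file. */
static int lazy_read_row(struct lazy_map *m, void *buf)
{
    size_t done = 0;
    int fd;

    assert(m != NULL && buf != NULL);

    fd = open(m->path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (lseek(fd, m->next_offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    /* read() may return fewer bytes than requested, so loop. */
    while (done < m->row_bytes) {
        ssize_t n = read(fd, (char *)buf + done, m->row_bytes - done);

        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Remember the position so the next call can resume here. */
    m->next_offset += (off_t)m->row_bytes;
    close(fd);
    return 0;
}
```

Only one descriptor is in use at any time, and none of the expensive per-map setup is repeated; the cost per row is reduced to an open(), an lseek() and a close().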
One significant problem with doing this, however, is that raster maps
are identified by the file descriptor for their data: the "fd"
parameter to Rast_get_row() etc, and the index into the R__.fileinfo
array, is the actual file descriptor.
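
To make the coupling concrete, here is a small sketch of the alternative: a table indexed by a small handle, with the actual descriptor stored as a field. All names here (map_slot, slot_alloc, slot_rebind, slot_fd) are hypothetical and do not exist in the raster library; the sketch only shows that a handle can stay stable while the descriptor behind it is closed and replaced.

```c
#include <assert.h>

#define MAX_MAPS 8

/* Hypothetical table entry: the handle is the array index,
 * the descriptor is just a field that may change over time. */
struct map_slot {
    int in_use;
    int os_fd;              /* actual descriptor, or -1 while closed */
};

static struct map_slot slots[MAX_MAPS];

/* Reserve a slot for a newly opened map; returns the handle or -1. */
static int slot_alloc(int os_fd)
{
    int i;

    for (i = 0; i < MAX_MAPS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].os_fd = os_fd;
            return i;
        }
    }
    return -1;
}

/* Attach a different descriptor (or -1 for "closed") to a handle. */
static void slot_rebind(int handle, int os_fd)
{
    assert(handle >= 0 && handle < MAX_MAPS && slots[handle].in_use);
    slots[handle].os_fd = os_fd;
}

/* Look up the descriptor currently behind a handle. */
static int slot_fd(int handle)
{
    assert(handle >= 0 && handle < MAX_MAPS && slots[handle].in_use);
    return slots[handle].os_fd;
}
```

With this layout the caller holds on to the handle, while the descriptor behind it can be closed and re-opened without the caller noticing.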
It wouldn't be a great deal of work to change this, so that the "fd"
parameter was just the index into the R__.fileinfo array, and the
fileinfo structure contained the actual fd. However, we would need to
make sure that we catch every case where "fd" needs to be changed to
Glynn Clements <glynn at gclements.plus.com>