[GRASS-dev] [GRASS GIS] #438: v.distance -a uses too much memory
GRASS GIS
trac at osgeo.org
Fri Jan 16 09:20:52 EST 2009
#438: v.distance -a uses too much memory
------------------------------------------+---------------------------------
Reporter:   mlennert                      | Owner:      grass-dev at lists.osgeo.org
Type:       defect                        | Status:     new
Priority:   major                         | Milestone:  7.0.0
Component:  Vector                        | Version:    svn-trunk
Keywords:   v.distance memory allocation  | Platform:   Unspecified
Cpu:        Unspecified                   |
------------------------------------------+---------------------------------
I'm not sure whether this should be considered a bug or a wish for
enhancement... choosing bug for now, as it makes the module useless with
large files.
When trying to calculate a distance matrix between 20 000 points with
v.distance -a, I get:
ERROR:G_realloc: unable to allocate 1985321728 bytes at main.c:568
As the machine only has 1 GB of RAM, this is normal, but v.distance should
be rewritten to not keep everything in memory, at least when dealing with
the -a flag, and to only allocate memory for data really requested.
Currently, it allocates memory for a large number of NEAR structures
(3 x int + 10 x double, i.e., for example, 3x4 + 10x8 = 92 bytes for each
point), which contain space for all the potential upload options (lines
447-8 of vector/v.distance/main.c):
{{{
anear = 2 * nfrom;
Near = (NEAR *) G_calloc(anear, sizeof(NEAR));
}}}
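To get a feeling for the sizes involved, here is a minimal sketch of the
arithmetic. It assumes that -a on 20 000 points produces one record per
from/to pair (20 000 x 20 000) and uses the 92-byte figure estimated above.
The names near_like, packed_record_size and matrix_bytes are made up for
illustration; only the field counts (3 ints, 10 doubles) come from the real
NEAR structure, and on platforms that pad the struct, sizeof may be larger
than the packed size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NEAR: only the field counts (3 ints,
 * 10 doubles) are taken from the ticket; the names are invented. */
typedef struct {
    int ids[3];
    double values[10];
} near_like;

/* Size of one record if its fields were packed without padding. */
size_t packed_record_size(void)
{
    return 3 * sizeof(int) + 10 * sizeof(double);
}

/* Bytes needed to hold one record per from/to pair in memory at once. */
uint64_t matrix_bytes(uint64_t nfrom, uint64_t nto, uint64_t record_size)
{
    uint64_t pairs;

    /* Guard against 64-bit overflow before multiplying. */
    assert(nto == 0 || nfrom <= UINT64_MAX / nto);
    pairs = nfrom * nto;
    assert(record_size == 0 || pairs <= UINT64_MAX / record_size);
    return pairs * record_size;
}
```

Under these assumptions, matrix_bytes(20000, 20000, 92) gives
36 800 000 000 bytes, i.e. about 36.8 GB for the complete matrix, far beyond
the 1 GB of the machine in question.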
It then goes on to add, if necessary, memory space for the entire From x To
matrix (lines 566-8 of vector/v.distance/main.c) in the loop over the to
objects (count = total number of distances calculated after each loop):
{{{
if (anear <= count) {
    anear += 10 + nfrom / 10;
    Near = (NEAR *) G_realloc(Near, anear * sizeof(NEAR));
}
}}}
I'm not sure I completely understand this last part, as it seems to create
huge jumps in allocation, i.e. when the count of distances goes beyond
nfrom*2 (or later values of anear), it reallocates the array to hold the
new, larger anear NEAR structures. In my case, when count>40000,
anear=40000+10+20000/10=42010, i.e. adding space for 2010 new NEAR
structures, without knowing (AFAICT) how many are actually still to come...
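The growth rule quoted above can be simulated in isolation. The sketch below
is not code from v.distance: grow_fixed, steps_fixed and steps_doubling are
hypothetical helpers, and the doubling variant is only shown as a common
alternative for comparison. It counts how many times the array would have to
grow before its capacity exceeds a given number of distances.

```c
#include <assert.h>
#include <stdint.h>

/* One growth step of the rule quoted in the ticket. */
uint64_t grow_fixed(uint64_t anear, uint64_t nfrom)
{
    return anear + 10 + nfrom / 10;
}

/* Growth steps needed until capacity exceeds `target`, starting from
 * 2 * nfrom and using the fixed increment above. */
uint64_t steps_fixed(uint64_t nfrom, uint64_t target)
{
    uint64_t anear = 2 * nfrom;
    uint64_t steps = 0;

    while (anear <= target) {   /* same test as "if (anear <= count)" */
        anear = grow_fixed(anear, nfrom);
        steps++;
    }
    return steps;
}

/* Same count for a doubling strategy (an alternative, not what
 * v.distance does). */
uint64_t steps_doubling(uint64_t nfrom, uint64_t target)
{
    uint64_t anear = 2 * nfrom;
    uint64_t steps = 0;

    assert(nfrom > 0);                  /* capacity must be able to grow */
    assert(target <= UINT64_MAX / 2);   /* doubling cannot overflow */
    while (anear <= target) {
        anear *= 2;
        steps++;
    }
    return steps;
}
```

With nfrom = 20000 and a target of 400 000 000 distances (assuming a full
20 000 x 20 000 matrix), steps_fixed returns 198986 and steps_doubling
returns 14. So each step is in fact small (2010 records), but there are very
many of them. Neither strategy reduces the total amount of memory finally
required; that needs a different approach.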
But, as I said, I don't understand the code well enough to make a definite
judgement. It would seem, however, that it might be better to calculate
each distance and update the table immediately, or maybe write the
necessary queries to a temp file to be able to launch the query at the end
in one run, but without keeping everything in memory.
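A minimal sketch of the temp-file idea, assuming plain coordinate arrays
instead of GRASS vector maps. Everything here is hypothetical (pair_rec,
stream_pairs, the use of array positions as categories) and no GRASS
library call is used. The squared distance is stored only to keep the
example free of libm; the point is that one small record at a time is
written out, so memory use no longer depends on nfrom x nto.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical compact record: only what one distance row needs. */
typedef struct {
    int from_cat;
    int to_cat;
    double dist2;   /* squared distance between the two points */
} pair_rec;

/* Write one record per from/to pair to `out`, holding a single record
 * in memory. Returns the number of records written; a short count
 * means a write error occurred. */
size_t stream_pairs(FILE *out,
                    const double *fx, const double *fy, size_t nfrom,
                    const double *tx, const double *ty, size_t nto)
{
    size_t written = 0;

    assert(out != NULL);
    for (size_t i = 0; i < nfrom; i++) {
        for (size_t j = 0; j < nto; j++) {
            double dx = fx[i] - tx[j];
            double dy = fy[i] - ty[j];
            /* Categories are simply 1-based positions in this sketch. */
            pair_rec rec = { (int)(i + 1), (int)(j + 1), dx * dx + dy * dy };

            if (fwrite(&rec, sizeof rec, 1, out) != 1)
                return written;
            written++;
        }
    }
    return written;
}
```

The output could be a file obtained from the standard tmpfile() function;
after rewinding it, the records can be read back one by one to update the
table or to build the queries in a single run at the end.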
--
Ticket URL: <http://trac.osgeo.org/grass/ticket/438>
GRASS GIS <http://grass.osgeo.org>