[GRASS-user] v.clean process killed itself!?

Markus Neteler neteler at osgeo.org
Sat Jan 10 03:49:30 EST 2009


On Sat, Jan 10, 2009 at 9:07 AM, Glynn Clements
<glynn at gclements.plus.com> wrote:
> Markus Neteler wrote:
> Possibly. But with a map this large, you don't need a leak. The raw
> data will barely fit into memory, and any per-vertex, per-edge etc
> data could easily push it over the limit.
>
> AFAICT from the output and the code, it's dying in Vect_snap_lines().
>
> Looking into it more, I don't think that it's a leak; I just think
> that it's trying to store an "expanded" (i.e. bloated) version of a
> 2.7GiB map in RAM on a system which only has 4GiB.
>
> E.g. for each line vertex, it stores a bounding rectangle (actually a
> cube, 6 doubles, 48 bytes). If there are 122 million vertices and only
> ~2 million are centroids, that could be 120 million line segments,
> which would be ~5.4GiB.
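As a back-of-envelope check of the arithmetic above, the following sketch (using the hypothetical figures from this thread: 122 million vertices, ~2 million centroids, 48 bytes per bounding box) reproduces the ~5.4 GiB estimate:

```python
# Rough check of the bounding-box overhead described above.
# The counts are assumptions taken from this thread, not measured values.

def bbox_overhead_gib(n_vertices, n_centroids, bbox_bytes=48):
    """Memory for one bounding box per line segment, in GiB."""
    n_segments = n_vertices - n_centroids  # centroids are single points
    return n_segments * bbox_bytes / 2**30

print(round(bbox_overhead_gib(122_000_000, 2_000_000), 1))  # -> 5.4
```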

Related:
Remove bounding box from support structures (?)
http://trac.osgeo.org/grass/browser/grass/trunk/doc/vector/TODO#L89

> Then there's the vertices themselves, and it's storing a significant
> fraction of those at 2*8+4 = 20 bytes each, which could consume
> anything up to 2.4GiB (the extra 4 bytes per vertex accounts for the
> difference to the size of the "coor" file).

Would it be possible to develop a (rough) formula to estimate
the memory requirements? With Thomas Huld we did that for the new
r.sun, and it is quite useful for picking the right computer before
launching a multi-day job (in case you have a choice, of
course).
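A first rough estimator could simply combine the two per-item costs mentioned in this thread: 48 bytes per segment bounding box and 2*8+4 = 20 bytes per stored vertex. The constants and counts below are assumptions from this discussion, not figures taken from the GRASS source:

```python
# Hypothetical rough memory estimator, in the spirit suggested above.
# Constants (48-byte boxes, 20-byte vertices) come from this thread.

def estimate_memory_gib(n_vertices, n_centroids,
                        bbox_bytes=48, vertex_bytes=20):
    """Very rough lower bound on topology-build memory, in GiB."""
    segments = n_vertices - n_centroids
    total = segments * bbox_bytes + n_vertices * vertex_bytes
    return total / 2**30

# For the map in this thread: ~5.4 GiB of boxes + ~2.3 GiB of vertices.
print(round(estimate_memory_gib(122_000_000, 2_000_000), 1))  # -> 7.6
```

This ignores the extra housekeeping data Glynn mentions, so the real peak would sit above the estimate.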

> Add onto that the additional data used for e.g. the current boundary
> (which could be most of the map if it's a long, detailed stretch of
> intricate coastline), new vertices created during snapping, other
> housekeeping data etc and it could easily exceed RAM.

Is this along the lines of the suggestion to break long lines?

http://trac.osgeo.org/grass/browser/grass/trunk/doc/vector/TODO#L242
    v.in.ogr
    --------
    It would be useful to split long boundaries to smaller
    pieces. Otherwise cleaning process can become very slow because
    bounding box of long boundaries can overlap large part of the map (for
    example outline around all areas) and cleaning process is checking
    intersection with all boundaries falling in the bounding box.

I wonder how hard that is to implement (since we have the
v.split algorithm).
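The core of such a split is small. A minimal sketch (pure illustration, not GRASS code) that caps the number of vertices per piece, so each piece gets a tight bounding box, could look like:

```python
# Sketch of splitting a long boundary into shorter pieces, in the spirit
# of the v.split idea above. Consecutive pieces share one vertex so the
# geometry stays connected. Illustration only, not the GRASS algorithm.

def split_line(vertices, max_vertices=100):
    """Split a vertex list into pieces of at most max_vertices each."""
    if len(vertices) <= max_vertices:
        return [vertices]
    pieces = []
    step = max_vertices - 1  # overlap by one shared vertex
    for i in range(0, len(vertices) - 1, step):
        pieces.append(vertices[i:i + max_vertices])
    return pieces

line = [(float(x), 0.0) for x in range(250)]
pieces = split_line(line)
print(len(pieces), pieces[0][-1] == pieces[1][0])  # -> 3 True
```

Splitting by vertex count rather than length keeps the per-piece memory bounded regardless of vertex density.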

Markus
