[GRASS5] Re: [bug #3877] (grass) r.to.vect: severe memory leaks, I'm helpless

Mon Dec 5 15:52:16 EST 2005

On Mon, 5 Dec 2005, Maciek Sieczka wrote:

> On pon, 2005-12-05 at 19:11 +0100, Roger Bivand wrote:
> > On Mon, 5 Dec 2005, Maciek Sieczka wrote:
> > 
> > > On pon, 2005-12-05 at 13:38 +0100, Radim Blazek via RT wrote:
> > > > please read old mails on this problem. I don't have time to explain it
> > > > again and again. AFAIK there are no big memory leaks.
> > > 
> > > Is it acknowledged by Grass developers that a machine freeze on a 5 mln
> > > vector points file is a BUG (no matter what the reason is)?
> > > 
> > > If it is acknowledged, can we expect it to be fixed? When - soon, in a
> > > month's time, a year's time? Or is it going to be a "feature" and left as is?
> > 
> > I think this is unfair.
> 
> No. This is a question. I've got a task to accomplish which I can't in
> Grass6 currently. I'm asking what the chances are that my problem will be
> fixed soon/ever. I have about 3 months or so for my task. I need to know
> where I stand instead of "I don't have time to explain". And how do
> I even know if those "old mails" reflect the current state?
> 
> It's Radim's answer that is unfair, not my reply. Or maybe I'm doing
> something rude reporting bugs and I get what I deserve?

No, you have (your perception of) a real problem, and are trying to find a 
feasible solution. Radim's work on the vector data model has helped with 
many problems, but not this one. Getting at a problem like this is, as you 
know well, very layered. You have an input DEM, which you need to warp 
with high precision in the z-dimension to a different spatial reference 
system. 

Assuming that the input DEM is an exact representation of the
sub-vegetation surface as it was when it was surveyed, warping will
introduce some errors and interpolating will introduce others. You have
chosen to transform the raster points (about 50M) to the target spatial
reference system and interpolate, I guess because you tried warping and
found the error unacceptable. You have three months (but there is snow on
the ground now), so field surveying to establish a baseline for error in
one or several trial plots is still feasible, maybe?

But I'm not aware of surveyed or interpolated DEMs that are without 
measurement error themselves - David Unwin has a nice statement in one of 
his books about the sobering effect of comparing field survey elevations 
from leveling and DEM values. So what we are looking for is a way of 
getting from the input DEM to the output DEM without introducing 
systematic error (like the artefacts at patch/tile boundaries) and without 
adding too much to the error already present.

Given that GRASS 6 vector points for even 10% of the data set are a 
problem, is it possible to establish the relative performance of warping 
versus interpolation on - say - a number of 1% sample plots? Does r.proj 
give similar outcomes to gdalwarp? I guess you've looked at all of this, 
and I apologize for thinking aloud. I just feel that getting to the main 
question of making sure output map errors are not systematic is quite 
difficult, and not obvious. 
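
A minimal sketch of how such a comparison on a sample plot might be
summarised, assuming you have a set of check-point elevations to treat as
reference. The function name and all the numbers are hypothetical, made up
only for illustration. The mean error (bias) shows whether a method is off
in a systematic way, while the RMSE shows how much error it adds overall.

```python
import math


def error_summary(reference, candidate):
    """Return (bias, rmse) of candidate elevations against reference ones."""
    diffs = [c - r for r, c in zip(reference, candidate)]
    n = len(diffs)
    bias = sum(diffs) / n                            # mean signed error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # overall error size
    return bias, rmse


# Hypothetical check-point elevations (metres) for one sample plot.
reference = [100.0, 102.0, 105.0, 103.0]
warped = [100.5, 102.5, 105.5, 103.5]        # constant offset: systematic
interpolated = [100.5, 101.5, 105.5, 102.5]  # signs alternate: not systematic

warp_bias, warp_rmse = error_summary(reference, warped)
interp_bias, interp_rmse = error_summary(reference, interpolated)
print(warp_bias, warp_rmse)
print(interp_bias, interp_rmse)
```

In this made-up case both methods have the same RMSE, but only the first has
a non-zero bias, which is the kind of difference the sample plots would need
to reveal.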

Since the area of interest has fairly large changes of elevation over 
short distances in some parts, it might even be possible to thin out 
uninformative points (say those within some threshold of their 
neighbours) where the relief is not detailed, keeping points that are 
"needed" for interpolation. 

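One way the thinning idea might look, as a rough sketch only: a cell is
treated as uninformative when its elevation is within a threshold of the
mean of its four neighbours. The function name, the four-neighbour rule, the
threshold and the tiny grid are all assumptions for illustration, not
anything GRASS provides.

```python
def thin_grid(dem, threshold):
    """Return (row, col, z) tuples for cells worth keeping as points.

    A cell is dropped when its elevation is within `threshold` of the mean
    of its four neighbours. Border cells are always kept.
    """
    rows, cols = len(dem), len(dem[0])
    kept = []
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if not on_border:
                mean4 = (dem[r - 1][c] + dem[r + 1][c]
                         + dem[r][c - 1] + dem[r][c + 1]) / 4.0
                if abs(dem[r][c] - mean4) < threshold:
                    continue  # flat neighbourhood: point adds little
            kept.append((r, c, dem[r][c]))
    return kept


# Hypothetical 4x4 DEM (metres): flat apart from one peak.
dem = [
    [10.0, 10.0, 10.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
    [10.0, 10.0, 15.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
]
points = thin_grid(dem, threshold=0.5)
print(len(points))
```

On real data the saving would come from large flat areas; whether the
dropped points really are redundant would still have to be checked against
the error in the interpolated surface.
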
> 
> > There has been progress in GRASS 6 on this, and 
> > the vector architecture is much stronger than it was in GRASS 5
> 
> Am I saying it isn't progressing? I'm saying it is still not good enough.
> But I'm happy with any single improvement that takes place. Sorry if I
> don't express it enough. Sure it is easier to point out errors than good
> things. But anyway, this is a devel list and bugtracker - a place for
> discussing problems mainly.
> 
> > for moderate and large data sets, but not for XXL.
> 
> What's XXL? I need to reproject a detailed 5m DEM of one national park
> only. To do it properly, it has to be transformed into vector points,
> these will be reprojected, then a DEM in new projection will be
> reinterpolated. Unfortunately reprojecting a DEM as raster yields
> distortions AFAIK.
> 
> > Have you considered using GRASS 5, which has sites, a very much simpler 
> > data model for points?
> 
> I had had bad experiences with vector points in Grass 6 before, so
> the first thing I actually tried was sites in 5.4. Although I managed
> to transform my 50 mln cells DEM into sites and reproject those,
> s.surf.rst crashed on such a dataset. Since it was an 8GB P4 with plenty
> of swap, I didn't even try it at home on my 1GB RAM machine - I wonder
> if I could get at least that far.
> 
> And I didn't report the bug in s.surf.rst because Grass 5 is no longer
> maintained by the core dev crew.

Where crash means segmentation fault or complete occupation of machine 
resources? Again, I'm unsure whether all the points are essential to reach 
a result without larger and systematic errors. 

> 
> Then I tried 6.1, wondering how far I could go and hoping that when I
> encounter problems, I'm more likely to be helped, ie. the bug would be
> fixed. Or that at least my experience and the bug report will be somehow
> appreciated. That was silly I see.

Not silly, but still a difference between what you expected the 
combination of software and hardware to carry out and what other users and 
developers have seen as being their priority. 

> 
> >  Have you considered tiling your data - reading 
> > portions of your data and patching the resulting spline surfaces?
> 
> I would like to avoid it. Is it a good idea to mosaic a DEM? Won't there
> be artifacts at the connections?

Yes, but are they larger (using wide overlaps and averaging the values) 
than the errors already in the data? If yes, we are stuck; if no, there is 
a way forward. Recall that *.rst and other interpolators for this kind of 
data use a (very) small moving window over the data anyway, so tiling and 
patching is happening in any case. The key thing is the scale of the errors 
and whether they are systematic.

> 
> >  Once you 
> > have the surface, you can transfer it to GRASS 6, because as yet the 
> > raster storage data model is effectively unchanged. 
> > 
> > This is not a bug,
> 
> That's a very tolerant approach toward bug definition.
> 

A bug is when software does not do what it is designed to do, leaving 
quite a margin for interpretation in what users/developers think it is 
supposed to do. Things like deleting files when not asked to are bugs, as 
are overwriting objects in memory or seg-faults from freeing unallocated 
pointers. Those are more "objective", even though they can be very hard to 
find (valgrind helps), but differences in understanding of purpose are not 
bugs in my view.

> >  it is a mis-match of data models and intentions.
> 
> Do you mean that the fact Grass is not able to handle even 5 mln vector
> points (one tenth of my whole possible dataset) is something normal?
> What's 5 mln points? 2236x2236 points, 10x500 km GPS tracks at 1m
> interval. Something a serious GIS vector model should handle perfectly.
> 

Each of these data points is carrying information, so the question is how 
much you need to handle to deal with the problem - and this varies very 
much indeed. I have no idea which commercial GIS could interpolate your 
data or warp them perfectly, but as you've seen, perfect isn't a word I 
associate with the natural world, seems to fit virtual reality better! If 
your 5m grid data are really very accurate (and the accuracy is invariant 
across the whole area), then going by patches or warping are options in 
GRASS, but as you've demonstrated, neither interpolating in GRASS 5 (what 
did gdb say when s.surf.rst failed?) nor handling the 50M points in GRASS 
6 in one mouthful seems to work. Any solution is going to be messy: even if 
s.surf.rst had run in GRASS 5, how should we know that the same tension 
parameter should be applied across the whole map? Is it at all feasible to 
divide the map up into zones of similar roughness (more rough meaning 
more careful choice of *.rst parameters)? 
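
As a very rough sketch of what dividing the map into roughness zones could
start from, the standard deviation of elevation in fixed-size blocks is one
possible roughness measure. That choice, the function name, the block size
and the tiny grid are all assumptions for illustration; other measures may
suit the terrain better.

```python
import statistics


def block_roughness(dem, block):
    """Map each block's top-left (row, col) to the population standard
    deviation of the elevations inside that block."""
    rows, cols = len(dem), len(dem[0])
    roughness = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [dem[r][c]
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            roughness[(r0, c0)] = statistics.pstdev(cells)
    return roughness


# Hypothetical 2x4 DEM (metres): flat on the left, rugged on the right.
dem = [
    [5.0, 5.0, 0.0, 10.0],
    [5.0, 5.0, 0.0, 10.0],
]
zones = block_roughness(dem, block=2)
print(zones)
```

Blocks with similar values could then be grouped into zones, each with its
own *.rst parameters, at the cost of more seams to patch.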

> >  While 
> > accepting that freezing (meaning causing total OS failure, or rather 
> > occupation of all machine resources?
> 
> The latter.
> 
> >  - I don't think that a non-root user 
> > on a sensible OS can freeze the system so that a hard shutdown (pull 
> > power) is required) is unfortunate, it is usually caused by 100% CPU use 
> > and swapping caused by memory being fully occupied.
> 
> That was the case. As I described it - both 1GB RAM and 1GB swap were
> used, total hang, even the mouse pointer froze, had to reset.

You can run GRASS from the command line without a GUI. Even though 
response time may be very slow, you should still have access if you keep 
one virtual console (say Ctrl-Alt-F1) for you and GRASS, and another where 
you are logged in with kill -9 and the process number of what you are 
running already keyed in. Mice die, but CLI still runs, like Duracell rabbits. 

> 
> >  In well-written 
> > software, like GRASS 6 vector
> 
> That's your perception. From my point of view Grass vector model is not
> well written yet. Of course, one might say "send us a patch", and
> "limited man power". Unfortunately I'm not able to send a patch. All I
> can do for Grass is to test the code, report bugs, help in the
> bugtracker a bit, help other users when I have some time (it is cool to
> show off a little, isn't it). Which I do. Not as much as I would like
> to, but anyway, I do contribute a bit. And even if I didn't, I do
> deserve a decent reply.

I'm trying (both senses, I'm sure), but this is a serious problem (the 
error propagation is what needs controlling), so finding out how the 
errors behave for different approaches seems justified (and you have 
certainly thought of this, your earlier many constructive and positive 
contributions to the list show that you are serious about what you do). 
Please forgive the length of my reply, but I still think that you may be 
able to use the software as it is to get an acceptable result, but that 
doing it all at once is not the only option.

> 
> 
> > or R, say, there is a balance between how 
> > things are written, perceived needs, and user perceptions. The GRASS 6 
> > vector data model handles areas and lines pretty well, but because it is 
> > trying harder on these, is not well suited to XXL points data sets.
> 
> Again, is 5 mln points an XXL? If it was the raster engine in Grass to
> fail on 5 mln cells, would you still say it is XXL? Or that it was a
> serious bug in the raster engine?
> 

It is quite a lot of data, certainly far more than the people who wrote 
much of the software thought of handling. 

> 
> >  My 
> > understanding is that the authors of the *.rst programs themselves also 
> > use GRASS 5, among other things because its sites data model is very 
> > simple.
> 
> Like I said, I managed to transform to sites and reprojected them on an
> 8GB beast but s.surf.rst crashed anyway. Can't recall the error message,
> assumed it was pointless anyway since Grass 5 is no longer maintained
> AAMOF. But I could retry to reproduce the error if it would make sense
> (i.e. if there is any chance it would be fixed in 5.4).
> 

Because the *.surf.rst programs are closely related, and are associated 
with published research, I think their authors are the best people to 
comment on this. I recall that you were in touch with them about faults in 
September.

Roger

> > 
> > Best wishes,
> 
> Thanks for your interest.
> 
> Best regards,
> Maciek
> 
> 
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no