[GRASS-dev] v.surf.rst with really big files, Re: r.random

Helena Mitasova hmitaso at unity.ncsu.edu
Sat Apr 4 16:00:31 EDT 2009


Markus N.
cc to Markus M. and dev list,

that is a lot of points and a huge raster and I think it ran out of  
memory due to memory leak
and it is swapping. There are several problems that v.surf.rst has  
with large files
(we have same problem too I just did not have
time to look for the fix - I may ask Markus M for help because it is  
related to things
that he has been dealing with):

- one is the use of 32 bit file offset instead of off_t (see below,  
my expertise in these issues is zero)
- another one is highly probable memory leak that might have been  
introduced with
crossvalidation - although it manages to create a quadtree in memory
it keeps filling the memory as it runs the computation so it looks  
like cleanup after
each segment computation is missing, I have some notes about where it  
is but I don't know how difficult
it is to find it and fix it.

For now we run what I call "supersegmentation" we just divide the  
whole region
into overlapping strips, interpolate and patch together using a script.
But the above issues are in fact bugs and should be fixed - I had it  
on my list for summer
but I guess it is starting to be an obstacle for more people than me so
I may as well start looking into it now - any advice will be  
appreciated,

Helena

P.S. For removing the waves in a DEM - get a small subset where it is  
a really big
problem and test the procedure with r.random - I found it quite  
challenging to get rid
of the waves once they are introduced.
And make sure they are not real - I was in for
quite a surprise when flying to FOSS4G in Victoria over the mountains  
that we used in the
1996 erosion modeling paper and where I diligently removed the waves  
after the reviewer
suggested they are artificial - there was no vegetation and the  
hillslopes have literally steps,
I should have looked at some photos when working with those data.

------------------------------------------------------------------------ 
--------
The offending files for fseek problems are:
    output2d.c, resout2d.c, resout2dmod.c, write2d.c

The offending files where the wrong type is used for offsets are:
    interp2d.c, ressegm2d.c, segmen2d.c


GRASS 6.3.cvs (interp):~ > v.surf.rst -t in=OffShorePts4_xyz layer=0  
dmin=0.0003 elev=TestAll_z zmult=1.0 tension=1300000 smooth=1  
segmax=40 npmin=250 maskmap=InterpSouth_mask
Percent complete:
Reading lines from vector map ...
Reading nodes from vector map ...
WARNING: some points outside of region -- will ignore...
100%
100%
WARNING: strip exists with insufficient data

WARNING: there are points outside specified 2D/3D region--ignored 48760
          points
WARNING: ignoring 9075883 points -- too dense

The number of points from vector map is 10676718
The number of points outside of 2D/3D region 48760
The number of points being used is 1552075
Processing all selected output files
will require -268854816 bytes of disk space for temp files
dnorm = 0.034525, rescaled tension = 44.882578
Bitmap mask created
Percent complete:

WARNING: taking too long to find points for interpolation--please change
          the region to area where your points are. Continuing
          calculations...
Cannot fseek elev offset2=-1276198144

WARNING: taking too long to find points for interpolation--please change
          the region to area where your points are. Continuing
          calculations...
Cannot fseek elev offset2=-773086712

WARNING: taking too long to find points for interpolation--please change
          the region to area where your points are. Continuing
          calculations...
Cannot fseek elev offset2=-521417852
Cannot fseek elev offset2=-458453496
Cannot fseek elev offset2=-427071888
Cannot fseek elev offset2=-427070316

Helena,

Here are some more details regarding the 2GB file limit in the rst
code. The problem is that the current code uses 32-bit file offsets (int
and long) instead of 64-bit (off_t) offsets. The problems seem to be
limited to the src/libes/rst_gmsl/interp_float directory.  You can grep
for "offset" and "fseek" and see where the problem lines are.

The offending files for fseek problems are:
    output2d.c, resout2d.c, resout2dmod.c, write2d.c

The offending files where the wrong type is used for offsets are:
    interp2d.c, ressegm2d.c, segmen2d.c

It is my understanding that the off_t data type should be used for all
file offsets. If the following compiler flags are set, then off_t is a
64 bit offset, otherwise it is 32-bits

-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE

The off_t data type is declared in <unistd.h>

I am trying to rebuild at least interp2d.c, ressegm2d.c, and segmen2d.c
to use the off_t data type. I have added the line #include <unistd.h>,
but for some reason, the files do not recognize the off_t data type.
e.g.,

make[3]: Entering directory
`grass-5.4.0/src/libes/rst_gmsl/interp_float'

gcc -Igrass54-build/src/libes/rst_gmsl/interp_float
-Igrass-5.4.0/src/libes/rst_gmsl/interp_float -Igrass-5.4.0/src/include
-Igrass54-build/src/include -g -I../tree -I../data -g -O2
-D_FILE_OFFSET_BITS=64 -D_LARGE_FILE_SOURCE    -fPIC -c interp2d.c -o
grass54-build/src/libes/rst_gmsl/interp_float/interp2d.o
interp2d.c: In function `IL_grid_calc_2d':
interp2d.c:87: error: `off_t' undeclared (first use in this function)
interp2d.c:87: error: (Each undeclared identifier is reported only once
interp2d.c:87: error: for each function it appears in.)
interp2d.c:87: error: parse error before "offset"
interp2d.c:147: error: `offset' undeclared (first use in this function)
interp2d.c:279: error: `offset2' undeclared (first use in this function)


So my first problem is "How can I get the grass libraries in rst_gmsl to
recognize the off_t type?" I suspect that another problem will be that
fseek explicitly uses long data types for offsets. I am running on Linux
and can use the fseeko version in GNU libc 2.x which uses off_t, but
this makes the code non-portable. The real solution would be to convert
fseeks to lseeks, but lseek takes a integer file descriptor while fseek
uses FILE pointers.

Can you check with the GRASS devel list and see what is the right way to
resolve these problems?

-Andy


On Apr 4, 2009, at 1:27 PM, Markus Neteler wrote:

> Helena,
>
> an Italian colleague of mine (in cc) tries to reinterpolate a large  
> DEM
> to get rid of contour line artefacts. I suggested to her to
> random sample (r.random -b ...) and then v.surf.rst.
>
>
> random -b input=DTM_10m n=30% raster_output=DTM_random \
>               vector_output=rand_vect
>
> v.surf.rst with default parameters:
> Percentuale completa:
> Reading lines from vector map ...
> Reading nodes from vector map ...
> ATTENZIONE: strip exists with insufficient data
>
> The number of points from vector map is 27746241
> The number of points outside of 2D/3D region 0
> The number of points being used is 27746241
> Processing all selected output files
> will require -2119504096 bytes of disk space for temp files
>
> ATTENZIONE: taking too long to find points for interpolation-- 
> please change
>           the region to area where your points are. Continuing
>           calculations...
>
> -> 2GB barrier reached I guess.
>
> So, since it consumed too much RAM, I suggested to her to activate the
> segmentation. So she used:
>
> v.surf.rst input=rand_vect at sqlite layer=1 elev=elev_surf  
> slope=slope_surf
>   aspect=aspect_surf tension=10. segmax=25 npmin=120 dmin=10.000000
>   dmax=25.000000 zmult=1.0
>
> to not use too much memory (though she has 4GB installed).
>
> However, after 12 hours it did not even show 0%...
>
> Further details:
> GRASS 6.5.svn (Arno):~ > g.region -p
> projection: 1 (UTM)
> zone:       32
> datum:      wgs84
> ellipsoid:  wgs84
> north:      4905887.5471429
> south:      4756307.5471429
> west:       592474.77809005
> east:       768334.77809005
> nsres:      10
> ewres:      10
> rows:       14958
> cols:       17586
> cells:      263051388
>
>
> GRASS 6.5.svn (Arno):~ > uname -a
> Linux margherita 2.6.24-23-generic #1 SMP Thu Feb 5 14:30:38 UTC 2009
> x86_64 GNU/Linux
>
>
> Do you have a pointer for us?
>
> thanks
> Markus



More information about the grass-dev mailing list