[GRASS-dev] v.surf.rst with really big files, Re: r.random
Helena Mitasova
hmitaso at unity.ncsu.edu
Sat Apr 4 16:00:31 EDT 2009
Markus N.
cc to Markus M. and dev list,
that is a lot of points and a huge raster and I think it ran out of
memory due to memory leak
and it is swapping. There are several problems that v.surf.rst has
with large files
(we have same problem too I just did not have
time to look for the fix - I may ask Markus M for help because it is
related to things
that he has been dealing with):
- one is the use of 32 bit file offset instead of off_t (see below,
my expertise in these issues is zero)
- another one is highly probable memory leak that might have been
introduced with
crossvalidation - although it manages to create a quadtree in memory
it keeps filling the memory as it runs the computation so it looks
like cleanup after
each segment computation is missing, I have some notes about where it
is but I don't know how difficult
it is to find it and fix it.
For now we run what I call "supersegmentation" we just divide the
whole region
into overlapping strips, interpolate and patch together using a script.
But the above issues are in fact bugs and should be fixed - I had it
on my list for summer
but I guess it is starting to be an obstacle for more people than me so
I may as well start looking into it now - any advice will be
appreciated,
Helena
P.S. For removing the waves in a DEM - get a small subset where it is
a really big
problem and test the procedure with r.random - I found it quite
challenging to get rid
of the waves once they are introduced.
And make sure they are not real - I was in for
quite a surprise when flying to FOSS4G in Victoria over the mountains
that we used in the
1996 erosion modeling paper and where I diligently removed the waves
after the reviewer
suggested they are artificial - there was no vegetation and the
hillslopes have literally steps,
I should have looked at some photos when working with those data.
------------------------------------------------------------------------
--------
The offending files for fseek problems are:
output2d.c, resout2d.c, resout2dmod.c, write2d.c
The offending files where the wrong type is used for offsets are:
interp2d.c, ressegm2d.c, segmen2d.c
GRASS 6.3.cvs (interp):~ > v.surf.rst -t in=OffShorePts4_xyz layer=0
dmin=0.0003 elev=TestAll_z zmult=1.0 tension=1300000 smooth=1
segmax=40 npmin=250 maskmap=InterpSouth_mask
Percent complete:
Reading lines from vector map ...
Reading nodes from vector map ...
WARNING: some points outside of region -- will ignore...
100%
100%
WARNING: strip exists with insufficient data
WARNING: there are points outside specified 2D/3D region--ignored 48760
points
WARNING: ignoring 9075883 points -- too dense
The number of points from vector map is 10676718
The number of points outside of 2D/3D region 48760
The number of points being used is 1552075
Processing all selected output files
will require -268854816 bytes of disk space for temp files
dnorm = 0.034525, rescaled tension = 44.882578
Bitmap mask created
Percent complete:
WARNING: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...
Cannot fseek elev offset2=-1276198144
WARNING: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...
Cannot fseek elev offset2=-773086712
WARNING: taking too long to find points for interpolation--please change
the region to area where your points are. Continuing
calculations...
Cannot fseek elev offset2=-521417852
Cannot fseek elev offset2=-458453496
Cannot fseek elev offset2=-427071888
Cannot fseek elev offset2=-427070316
Helena,
Here are some more details regarding the 2GB file limit in the rst
code. The problem is that the current code uses 32-bit file offsets (int
and long) instead of 64-bit (off_t) offsets. The problems seem to be
limited to the src/libes/rst_gmsl/interp_float directory. You can grep
for "offset" and "fseek" and see where the problem lines are.
The offending files for fseek problems are:
output2d.c, resout2d.c, resout2dmod.c, write2d.c
The offending files where the wrong type is used for offsets are:
interp2d.c, ressegm2d.c, segmen2d.c
It is my understanding that the off_t data type should be used for all
file offsets. If the following compiler flags are set, then off_t is a
64 bit offset, otherwise it is 32-bits
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE
The off_t data type is declared in <unistd.h>
I am trying to rebuild at least interp2d.c, ressegm2d.c, and segmen2d.c
to use the off_t data type. I have added the line #include <unistd.h>,
but for some reason, the files do not recognize the off_t data type.
e.g.,
make[3]: Entering directory
`grass-5.4.0/src/libes/rst_gmsl/interp_float'
gcc -Igrass54-build/src/libes/rst_gmsl/interp_float
-Igrass-5.4.0/src/libes/rst_gmsl/interp_float -Igrass-5.4.0/src/include
-Igrass54-build/src/include -g -I../tree -I../data -g -O2
-D_FILE_OFFSET_BITS=64 -D_LARGE_FILE_SOURCE -fPIC -c interp2d.c -o
grass54-build/src/libes/rst_gmsl/interp_float/interp2d.o
interp2d.c: In function `IL_grid_calc_2d':
interp2d.c:87: error: `off_t' undeclared (first use in this function)
interp2d.c:87: error: (Each undeclared identifier is reported only once
interp2d.c:87: error: for each function it appears in.)
interp2d.c:87: error: parse error before "offset"
interp2d.c:147: error: `offset' undeclared (first use in this function)
interp2d.c:279: error: `offset2' undeclared (first use in this function)
So my first problem is "How can I get the grass libraries in rst_gmsl to
recognize the off_t type?" I suspect that another problem will be that
fseek explicitly uses long data types for offsets. I am running on Linux
and can use the fseeko version in GNU libc 2.x which uses off_t, but
this makes the code non-portable. The real solution would be to convert
fseeks to lseeks, but lseek takes a integer file descriptor while fseek
uses FILE pointers.
Can you check with the GRASS devel list and see what is the right way to
resolve these problems?
-Andy
On Apr 4, 2009, at 1:27 PM, Markus Neteler wrote:
> Helena,
>
> an Italian colleague of mine (in cc) tries to reinterpolate a large
> DEM
> to get rid of contour line artefacts. I suggested to her to
> random sample (r.random -b ...) and then v.surf.rst.
>
>
> random -b input=DTM_10m n=30% raster_output=DTM_random \
> vector_output=rand_vect
>
> v.surf.rst with default parameters:
> Percentuale completa:
> Reading lines from vector map ...
> Reading nodes from vector map ...
> ATTENZIONE: strip exists with insufficient data
>
> The number of points from vector map is 27746241
> The number of points outside of 2D/3D region 0
> The number of points being used is 27746241
> Processing all selected output files
> will require -2119504096 bytes of disk space for temp files
>
> ATTENZIONE: taking too long to find points for interpolation--
> please change
> the region to area where your points are. Continuing
> calculations...
>
> -> 2GB barrier reached I guess.
>
> So, since it consumed too much RAM, I suggested to her to activate the
> segmentation. So she used:
>
> v.surf.rst input=rand_vect at sqlite layer=1 elev=elev_surf
> slope=slope_surf
> aspect=aspect_surf tension=10. segmax=25 npmin=120 dmin=10.000000
> dmax=25.000000 zmult=1.0
>
> to not use too much memory (though she has 4GB installed).
>
> However, after 12 hours it did not even show 0%...
>
> Further details:
> GRASS 6.5.svn (Arno):~ > g.region -p
> projection: 1 (UTM)
> zone: 32
> datum: wgs84
> ellipsoid: wgs84
> north: 4905887.5471429
> south: 4756307.5471429
> west: 592474.77809005
> east: 768334.77809005
> nsres: 10
> ewres: 10
> rows: 14958
> cols: 17586
> cells: 263051388
>
>
> GRASS 6.5.svn (Arno):~ > uname -a
> Linux margherita 2.6.24-23-generic #1 SMP Thu Feb 5 14:30:38 UTC 2009
> x86_64 GNU/Linux
>
>
> Do you have a pointer for us?
>
> thanks
> Markus
More information about the grass-dev
mailing list