[GRASS-user] r.watershed speed-up

Markus Metz markus_metz at gmx.de
Fri Aug 1 04:23:49 EDT 2008


Hello list,

There is now a new version of r.watershed.fast whose results are even 
closer to those of r.watershed. They are still not 100% identical to 
r.watershed, and I can't get them any closer, but the new version comes 
with a further speed increase. I repeated Moritz's test with the same 
commands on GRASS 6.4.svn; results are below.

Moritz Lennert wrote:
>
> First test in North Carolina demo data set:
>
> g.region rast=elevation
>
> time r.watershed elevation=elevation@PERMANENT accumulation=old_accum 
> drainage=old_dir basin=old_sheds stream=old_streams thresh=500
>
> real    19m2.744s
> user    18m41.318s
> sys     0m1.884s
>
> time r.watershed.fast elevation=elevation@PERMANENT 
> accumulation=fast_accum drainage=fast_dir basin=fast_sheds 
> stream=fast_streams thresh=500
>
> real    0m18.034s
> user    0m17.833s
> sys     0m0.196s
Absolute times are not really comparable between systems, but relative 
differences in time should be similar. The following ratios are 
calculated from the "real" (wall-clock) times. In the test Moritz did, 
r.watershed took 63x as long as r.watershed.fast, i.e. r.watershed.fast 
needed only 1.6% of the time of r.watershed.
New version: r.watershed took 127x as long as r.watershed.fast, i.e. 
r.watershed.fast needed only 0.8% of the time of r.watershed.
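
For anyone who wants to recompute these ratios from their own `time` 
output, here is a minimal Python sketch. The function names 
time_to_seconds and speedup are made up for illustration, and the parser 
only assumes the "19m2.744s" style shown in the quoted output (with an 
optional hours part); it is not part of r.watershed.fast.

```python
import re


def time_to_seconds(stamp):
    """Convert a shell `time` value such as '19m2.744s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def speedup(slow, fast):
    """Return (how many times longer `slow` took, `fast` as a percentage of `slow`)."""
    slow_s = time_to_seconds(slow)
    fast_s = time_to_seconds(fast)
    return slow_s / fast_s, 100.0 * fast_s / slow_s


# "real" times from the test Moritz ran (quoted above)
ratio, percent = speedup("19m2.744s", "0m18.034s")
print(f"r.watershed took {ratio:.0f}x as long; "
      f"r.watershed.fast needed {percent:.1f}% of the time")
```

With the quoted times this gives 63x and 1.6%, matching the figures 
above.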
> Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 
> 98%. Those which have different directions are overwhelmingly low 
> slope cells.
New version: 2004480 cells, i.e. 99% of all cells show the same flow 
direction.
> 1833907 cells have the same accumulation value, i.e. 90%, but I guess 
> this is to be expected.
New version: 1921510 cells, i.e. 95% of all cells show the same 
accumulation value.
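
The percentages come from counting the cells of a 0/1 difference map with 
r.stats -c. A minimal Python sketch of that last step follows, assuming 
the usual "category count" lines of r.stats -c output, with 0 meaning 
"same" and 1 meaning "different". The function name agreement is 
hypothetical, and the sample text is reconstructed from the cell counts 
given here, not copied from a real run.

```python
def agreement(r_stats_output):
    """Return (same_cells, total_cells, percent_same) for a 0/1 difference map."""
    counts = {}
    for line in r_stats_output.splitlines():
        parts = line.split()
        # Skip blank lines and the '*' (NULL cells) line, if present.
        if len(parts) != 2 or parts[0] == "*":
            continue
        counts[int(parts[0])] = int(parts[1])
    same = counts.get(0, 0)
    total = same + counts.get(1, 0)
    if total == 0:
        raise ValueError("no non-NULL cells found in r.stats output")
    return same, total, 100.0 * same / total


# Reconstructed sample: 2004480 identical cells out of 2025000 in the region.
sample = "0 2004480\n1 20520\n"
same, total, percent = agreement(sample)
print(f"{same} of {total} cells agree ({percent:.1f}%)")
```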

The idea is that a faster r.watershed can also be used for massive 
grids, where GRASS users have frequently given up on r.watershed because 
it would have taken hours or even days. I resampled "elevation" in the 
North Carolina demo data set from 10m to 3m resolution with r.resamp.rst, 
using default values (following the GRASS book, Section 5.3.3, paragraph 
"Regularized spline with tension (RST) interpolation"), to generate a 
fairly large map, and ran the same test on the resampled map.

cells in region : 22,500,000

The results:

Speed:
r.watershed took about 540x as long as r.watershed.fast, i.e. 
r.watershed.fast needed only about 0.2% of the time of r.watershed (here 
10h2m55s vs. 1m7s, 10 hours versus 1 minute...).

Flow direction differences:
22288539 cells, i.e. 99% of all cells show the same flow direction.

Flow accumulation differences:
20963653 cells, i.e. 93% of all cells show the same accumulation value.

Memory usage of both r.watershed and r.watershed.fast peaked at about 
940MB. I don't understand why memory usage increases after <SECTION 1a: 
Initiating Memory> is completed.
Assuming that time is no longer the constraint, only memory (although 
<SECTION 4: Watershed Determination> can still take some time on large 
maps with a large threshold value), the largest region sizes that 
r.watershed.fast can process in RAM would be *roughly*:
1GB RAM:  14,000,000 cells
2GB RAM:  38,000,000 cells
4GB RAM:  86,000,000 cells
8GB RAM: 181,000,000 cells
This is after putting 400MB aside for the system and other open 
applications. The estimate is based on 64-bit Linux.
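
As a rough cross-check of this kind of estimate, here is a Python sketch 
of a purely linear extrapolation from the one measurement above (about 
940MB peak for 22,500,000 cells). The function names are made up, 1MB is 
assumed to mean 1024*1024 bytes, and memory use is assumed to grow 
linearly with the number of cells, which is my simplification. It lands 
somewhat above the numbers listed (just under 15 million cells for 1GB), 
so treat both as order-of-magnitude guidance only.

```python
MB = 1024 * 1024  # assumption: 1MB = 1024*1024 bytes


def bytes_per_cell(peak_mb, cells):
    """Average memory cost per cell, from one observed peak."""
    return peak_mb * MB / cells


def max_cells(ram_gb, reserved_mb, per_cell):
    """Cells that fit in RAM after reserving memory for the system."""
    usable_bytes = ram_gb * 1024 * MB - reserved_mb * MB
    return int(usable_bytes / per_cell)


# Observed on the 3m map: about 940MB peak for 22,500,000 cells.
per_cell = bytes_per_cell(940, 22_500_000)
print(f"about {per_cell:.1f} bytes per cell")

for ram_gb in (1, 2, 4, 8):
    cells = max_cells(ram_gb, reserved_mb=400, per_cell=per_cell)
    print(f"{ram_gb}GB RAM: roughly {cells:,} cells")
```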

If you want to repeat and analyse the tests with the North Carolina demo 
data set, the new r.watershed.fast is here 
http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz 
and the test script is below.

Regards,

Markus


test script:
g.region rast=elevation
time r.watershed elevation=elevation@PERMANENT accumulation=nc_accum_old \
  drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
time r.watershed.fast elevation=elevation@PERMANENT \
  accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast \
  stream=nc_streams_fast thresh=500
r.mapcalc nc_dir_dif='if(("nc_dir_old" - "nc_dir_fast" != 0),1,0)'
r.mapcalc nc_accum_dif='if(("nc_accum_old" - "nc_accum_fast" != 0),1,0)'
r.stats -c input=nc_dir_dif@PERMANENT
r.stats -c input=nc_accum_dif@PERMANENT

r.resamp.rst input=elevation@PERMANENT ew_res=3 ns_res=3 \
  elev=elevation_rst overlap=3 zmult=1.0 tension=40.
g.region rast=elevation_rst
time r.watershed elevation=elevation_rst@PERMANENT \
  accumulation=nc_rst_accum_old drainage=nc_rst_dir_old \
  basin=nc_rst_sheds_old stream=nc_rst_streams_old thresh=500
time r.watershed.fast elevation=elevation_rst@PERMANENT \
  accumulation=nc_rst_accum_fast drainage=nc_rst_dir_fast \
  basin=nc_rst_sheds_fast stream=nc_rst_streams_fast thresh=500
r.mapcalc nc_rst_dir_dif='if(("nc_rst_dir_old" - "nc_rst_dir_fast" != 0),1,0)'
r.mapcalc nc_rst_accum_dif='if(("nc_rst_accum_old" - "nc_rst_accum_fast" != 0),1,0)'
r.stats -c input=nc_rst_dir_dif@PERMANENT
r.stats -c input=nc_rst_accum_dif@PERMANENT


