comparing r.cost and r.terracost [was: [GRASS-dev] Re: grass-dev Digest, Vol 43, Issue 8]

Markus Metz markus.metz.giswork at googlemail.com
Sat Nov 14 06:51:42 EST 2009


Hi Laura,

Laura Toma wrote:
>
> my experience is that, if you want to see how an application would
> behave with 500 MB of RAM, you have to physically reboot the machine 
> with 500 MB of RAM (it's very easy to do this on a Mac, and relatively 
> easy on Linux. on windows, i don't know).
>
> if the machine has more than 500MB RAM, even if you restrict the 
> application to use less, the system gives it all it can. in your 
> setup, it is almost as if r.cost would run fully in memory, because 
> even if it places the segments on disk, the system file cache fits all
> segments in memory. the same is true for terracost, its streams fit in 
> memory. but using tiles has a big CPU overhead, which is why it is 
> slower.
I haven't rebooted my Linux box with less RAM, but I set up a test 
region with about 312 million cells (details below). I think we can 
agree that this is a pretty large region by current standards, though 
maybe not by future ones. Your argument still holds: r.cost may have 
some advantage because its temp files are much smaller than the temp 
files of r.terracost, so a larger proportion can be cached by the 
system (beyond the control of the module). I could, however, see a lot 
of disk IO with both modules.
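
As an aside, for anyone who wants to reproduce Laura's reboot test on 
Linux without physically removing RAM: a commonly used approach (not 
something I tried for these tests) is to limit the memory the kernel 
sees with the mem= boot parameter, e.g. by appending to the kernel line 
in the boot loader:

mem=512M

After a reboot, free -m should report only about 512 MB, file cache 
included, so the system cache can no longer hide the disk IO of the 
temp files.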

For 312 million cells, r.cost needed 51 min and r.terracost needed 24 h 
22 min; both were given 2 GB of memory.

Now that sounds like really bad news for r.terracost, but it is not the 
whole truth. First, I had to tweak r.cost a little bit to make it that 
fast, and I still have to come up with a way to do that tweaking within 
the module. Second, r.cost may suffer more than r.terracost from a 
reduction of the memory option (as opposed to a reduction of physical 
RAM): reducing the percent_memory option already slows the module down 
considerably. But the same holds for r.terracost, where the bottleneck 
seems to be the INTERTILE DIJKSTRA step, which took well over 12 hours 
with heavy disk IO and full memory consumption. Third, r.cost performs 
better with fewer start points when the region settings are kept 
constant. I'm not sure whether this applies to r.terracost as well.
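
If someone wants to quantify the effect of the percent_memory option, a 
simple timing loop along these lines should do (just a sketch, reusing 
the map names from the test setup below; the output names are made up):

for pm in 10 20 40 80 ; do
    echo "percent_memory=$pm"
    time r.cost input=cost start_rast=start_points_10000 \
        output=dist_pm_$pm percent_memory=$pm --o
done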

In summary, I think that on even larger regions, say >1 billion cells, 
and with many small separate start points (>100 000), r.terracost should 
outperform r.cost, but I would not bet on it ;-) For what I guess is 
current everyday use (<100 million cells), r.cost in grass7 might 
outperform r.terracost with numtiles>1 most of the time, sometimes 
considerably, as in my tests. The speed of r.cost is variable and 
depends on the combination of region size, number and distribution of 
start points, and the amount of memory it is allowed to use. There may 
still be some scope for improvement in r.cost; I just did a quick job 
there, no in-depth code analysis (yet). The extraordinarily large temp 
files of r.terracost (64 GB in total, the largest single file was about 
56 GB, no typo) could be a handicap when processing such large regions. 
Finally, the results of my tests are valid for my test system only; they 
will differ on other systems.

>
> when i did some preliminary testing, i rebooted the machine with 512MB 
> RAM, and ran r.cost on grids of 50M-100M cells. it was slow, 
> completely IO bound, and took several hours or more. or if you use 1GB 
> of RAM, you may need to go to larger grids.
Please test r.cost in grass7 yourself, and maybe share your test 
commands so that others can run the same tests and compare.

Here is my test region:

The 312 million cells test region was created in the North Carolina 
sample dataset with
g.region rast=elev_state_500m@PERMANENT res=40
Then I created a cost layer with
r.mapcalc "cost = 1"
You wanted many start points, so I generated 10000 start points with
v.random output=start_points_10000 n=10000
and converted this vector to raster with
v.to.rast input=start_points_10000 use=val value=1 output=start_points_10000 --o
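
As a sanity check (not part of my original runs), g.region -p prints the 
current rows and cols, and their product should come out at roughly 312 
million cells:

g.region -p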

The test command for r.cost was
time r.cost input=cost start_rast=start_points_10000 output=dist_random_10000 percent_memory=40 --o
This setting was equivalent to 2 GB of memory.
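(Back-of-the-envelope, under the assumption that the r.cost segment 
structures need on the order of 16 bytes per cell: 312 million cells * 
16 bytes is about 5 GB, and 40% of that gives the 2 GB quoted above.)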
time:
real 51m18.172s
user 34m4.067s
sys 0m45.100s

For r.terracost, I again used a temp directory on a separate hard 
drive, faster than the one that r.cost used, so let's say
tmpdir="/path/to/some/fast/dir"
and the test command for r.terracost was
time r.terracost in=cost start_rast=start_points_10000 out=dist_random_10000_terracost STREAM_DIR=$tmpdir VTMPDIR=$tmpdir memory=2000 numtiles=20788 --o
I got numtiles=20788 from r.terracost -i (the flag prints the suggested 
number of tiles).
time:
real 1453m37.022s
user 513m56.549s
sys 43m38.519s
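
For the record, that is 1453m37s versus 51m18s wall-clock, i.e. r.cost 
was roughly 28 times faster on this particular test.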

Sorry for that long post!

Markus M
