comparing r.cost and r.terracost [was: [GRASS-dev] Re: grass-dev Digest, Vol 43, Issue 8]

Wed Nov 18 19:30:42 EST 2009

Hi Markus,

Your conclusions are based on the hypothesis that you can model the
performance of r.cost in the presence of low memory by tweaking the
memory limit in the code and using a machine with a large physical
memory. I don't think that this hypothesis is true, and here is the
evidence so far:

r.cost  on a machine with 8GB physical memory:    1h
r.cost  on a machine with 2.5GB physical memory:  4h
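
To put the run times reported in this thread side by side, here is a
minimal sketch that converts the "real" values printed by `time` into
seconds and expresses each run relative to the 8GB r.cost run. The helper
name parse_real_time and the dictionary labels are made up for
illustration; the timings are the ones quoted in this thread (the 2.5GB
run was only reported as "4 hours 10 min").

```python
import re

def parse_real_time(text):
    """Convert a `time` 'real' value such as '51m18.172s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Timings as reported in the thread (labels are illustrative only).
runs = {
    "r.cost, 8GB RAM": parse_real_time("51m18.172s"),
    "r.cost, 2.5GB RAM": (4 * 60 + 10) * 60.0,  # reported as 4 h 10 min
    "r.terracost, 8GB RAM": parse_real_time("1453m37.022s"),
}

baseline = runs["r.cost, 8GB RAM"]
for name, secs in runs.items():
    # Print elapsed hours and the slowdown relative to the 8GB r.cost run.
    print(f"{name}: {secs / 3600:.2f} h, {secs / baseline:.1f}x baseline")
```

The ratios only summarise these specific runs on one test system; they
say nothing about how either module behaves with 1GB or 512MB of RAM.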

If you reboot the machine with 1GB RAM, you will see the running time
go up (by a lot). Afterwards, try rebooting with 512MB RAM. I have
run similar tests in the past, and r.cost did not finish in 40
hours. It may be better now, and you are the best person to re-try
these tests as you know how to tweak it.

I'll get back about what terracost is doing and why it has such large  
files after we see these new numbers.
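
As a rough sanity check on the sizes quoted below, the following sketch
relates the reported temp file totals to the cell count. It assumes
that "GB" means 10**9 bytes and that the region has exactly 312 million
cells; both are approximations, and the function names are made up for
illustration. It does not describe what either module actually stores
per cell.

```python
CELLS = 312_000_000  # approximate cell count quoted in the thread
GB = 10**9           # assumption: GB means 10**9 bytes here

def raw_size_gb(cells, bytes_per_cell):
    """Size in GB of a grid holding bytes_per_cell bytes for every cell."""
    return cells * bytes_per_cell / GB

def implied_bytes_per_cell(total_gb, cells=CELLS):
    """Average number of bytes per cell implied by a total size in GB."""
    return total_gb * GB / cells

# Temp file totals as reported in the thread.
reported_temp_gb = {"r.cost": 4, "r.terracost": 64}

print(f"8 bytes/cell: {raw_size_gb(CELLS, 8):.2f} GB")
for module, size_gb in reported_temp_gb.items():
    print(f"{module}: {size_gb} GB of temp files = "
          f"{implied_bytes_per_cell(size_gb):.0f} bytes per cell")
```

The implied per-cell figures are only averages over the whole region,
but they make the difference in temp file size easier to compare.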

-Laura

On Nov 17, 2009, at 3:51 PM, Markus Metz wrote:

> Hi Laura,
>
> Laura Toma wrote:
>> Hi Markus,
>>
>> Processing a grid of 312 M cells takes about 8 x 312M = 2GB of RAM,
> That is only true for r.terracost with numtiles=1, because r.terracost
> stores costs as float. Is it possible that there is a bug in
> r.terracost when using numtiles > 1, because creating 64GB of
> temporary files seems a bit inordinate for 2GB of data? And if
> r.terracost used double for costs, it would be about 130GB of
> temporary files? OK, disk space is nearly free nowadays.
> r.cost stores costs as double, so the size of temporary files is about
> 4GB. Additionally, 2GB were used for processing, i.e. at least 6GB of
> system RAM are required to also keep cached files in RAM.
>> so on a machine with 8GB of RAM it will not use virtual memory at all,
>> irrespective of how you tweak it.
> Right, but it still uses the disk IO algorithm and reads from/writes to
> disk.
>>
>> With 8GB of RAM, the correct comparison is between r.cost and
>> r.terracost with numtiles=1
> I don't think so because r.cost still uses its disk IO algorithm while
> r.terracost doesn't. That's like comparing r.watershed in ram mode with
> r.terraflow. A module not using a disk IO algorithm should always be
> faster than a corresponding module using a disk IO algorithm, as long as
> intermediate data fit into RAM.
>>
>> In other words, if you tweak r.cost, you also need to tweak
>> r.terracost, which means you run with numtiles=1 for as long as data
>> fits in real memory.
> I tweaked the disk IO algorithm to be faster, not to use less disk
> space. I can also do serious tweaking and write a true all-in-memory
> version of r.cost and compare that to r.terracost numtiles=1, but I'm
> interested in the performance of r.cost with the disk IO algorithm and
> thus compare it to r.terracost with its disk IO algorithm (requires
> numtiles > 1).
>>
>> If you want any real numbers on how r.cost behaves with low memory you
>> need to reboot the machine with 1GB or better 512MB of RAM. There is
>> no way around it. Just try it, it is easy to do. I run experiments
>> like this all the time.
> OK, would you mind running experiments with r.cost in grass7 and
> r.terracost numtiles>1 so you can see for yourself?
>
> I rebooted with 2500MB of RAM in order to run the same test command as
> before on the 312 million cells region, giving about 2000MB of RAM to
> r.cost, same as before. I used the same region and start points as
> before because I think these settings are challenging for r.cost. My
> test system went into swap space, all memory was used up (system file
> cache was in swap anyway, OS needs some RAM too), and r.cost took, as
> expected, longer, namely 4 hours 10 min.
>
> Still much less than the 24 hours 22 min of r.terracost with memory=2000
> and 8GB of system RAM...
>
> The latest version of r.cost (r39749) needs 2 hours 30 min with 2500MB
> of RAM and 2000MB of RAM assigned to it, remainder used by OS.
>
> From a user's perspective, one reason for, or side-effect of, modules
> with disk IO algorithms is IMO that you do not need to use up all
> available system memory and can still do other things in parallel, so I
> would always assign max 75% of RAM to these modules and can still do
> other work, potentially preventing the system from caching files.
>
> BTW, there was a typo in my g.region command, must be res=30 in order to
> get 312 million cells, sorry!
>
> Markus M
>>
>>
>> -Laura
>>
>>
>> On Nov 14, 2009, at 6:51 AM, Markus Metz wrote:
>>
>>> Hi Laura,
>>>
>>> Laura Toma wrote:
>>>>
>>>> my experience is that, if you want to see how an application would
>>>> behave with 500 MB of RAM, you have to physically reboot the machine
>>>> with 500 MB of RAM (it's very easy to do this on a Mac, and relatively
>>>> easy on Linux. on windows, i don't know).
>>>>
>>>> if the machine has more than 500MB RAM, even if you restrict the
>>>> application to use less, the system gives it all it can. in your
>>>> setup, it is almost as if r.cost would run fully in memory, because
>>>> even if it places the segments on disk, the system file cache fits all
>>>> segments in memory. the same is true for terracost, its streams fit in
>>>> memory. but using tiles has a big CPU overhead, which is why it is
>>>> slower.
>>> I haven't rebooted my Linux box with less RAM, but I set up a test
>>> region with about 312 million cells (details below), I think we can
>>> agree that this is for current standards a pretty large region, maybe
>>> not in the future. Your argument still holds true that r.cost may have
>>> some advantage because its temp files are much smaller than the temp
>>> files of r.terracost and therefore a larger proportion can be cached by
>>> the system (beyond the control of the module). I could however see a lot
>>> of disk IO on both modules.
>>>
>>> For 312 million cells, r.cost needed 51 min, r.terracost needed
>>> 24 h 22 min, both got 2GB memory.
>>>
>>> Now that sounds like really bad news for r.terracost. But this is not
>>> the whole truth. First, I had to tweak r.cost a little bit in order to
>>> be so fast, still have to come up with a solution to do that tweaking in
>>> the module. Second, r.cost may suffer more from memory reduction, not
>>> physical RAM reduction, than r.terracost. Reducing the percent_memory
>>> option already slows the module down considerably. But that is also true
>>> for r.terracost, there the bottleneck seems to be INTERTILE DIJKSTRA
>>> which took well over 12 hours with heavy disk IO and full memory
>>> consumption. Third, r.cost performs better with less start points
>>> keeping region settings constant. I'm not sure if this applies as well
>>> to r.terracost.
>>>
>>> In summary, I think that on even larger regions, say >1 billion cells,
>>> and many small separate start points (>100 000), r.terracost should
>>> outperform r.cost, but I would not bet on it ;-) For what I guess is
>>> current everyday use (< 100 million cells), r.cost in grass7 might most
>>> of the time outperform r.terracost with numtiles>1, sometimes
>>> considerably as in my tests. Speed performance of r.cost is variable and
>>> dependent on the combination of region size, number and distribution of
>>> start points, and the amount of memory it is allowed to use. There may
>>> still be some scope for improvement in r.cost, I just did a quick job
>>> there, no in-depth code analysis (yet). The extraordinarily large temp
>>> files of r.terracost (total 64GB, largest single file was about 56GB, no
>>> typo) could be a handicap when processing such large regions. Finally,
>>> the results of the tests I did are valid for my test system only, they
>>> will be different on other systems.
>>>
>>>>
>>>> when i did some preliminary testing, i rebooted the machine with 512MB
>>>> RAM, and ran r.cost on grids of 50M-100M cells. it was slow,
>>>> completely IO bound, and took several hours or more. or if you use 1GB
>>>> of RAM, you may need to go to larger grids.
>>> Please test r.cost in grass7 yourself, and maybe share your test
>>> commands, then others can run the tests too and compare.
>>>
>>> Here is my test region:
>>>
>>> The 312 million cells test region was created in the North Carolina
>>> sample dataset with
>>> g.region rast=elev_state_500m@PERMANENT res=40
>>> Then I created a cost layer with
>>> r.mapcalc "cost = 1"
>>> You wanted many start points, so I generated 10000 start points with
>>> v.random output=start_points_10000 n=10000
>>> and converted this vector to raster with
>>> v.to.rast start_points_10000 use=val val=1 out=start_points_10000 --o
>>>
>>> The test command for r.cost was
>>> time r.cost input=cost start_rast=start_points_10000 output=dist_random_10000 percent_memory=40 --o
>>> This setting was equivalent to 2 GB of memory.
>>> time:
>>> real 51m18.172s
>>> user 34m4.067s
>>> sys 0m45.100s
>>>
>>> For r.terracost, I used as temp dir again a directory on a separate hard
>>> drive, faster than the one that r.cost used, so let's say
>>> tmpdir="/path/to/some/fast/dir"
>>> and the test command for r.terracost was
>>> time r.terracost in=cost start_rast=start_points_10000 out=dist_random_10000_terracost STREAM_DIR=$tmpdir VTMPDIR=$tmpdir memory=2000 numtiles=20788 --o
>>> numtiles=20788 I got with r.terracost -i
>>> time:
>>> real 1453m37.022s
>>> user 513m56.549s
>>> sys 43m38.519s
>>>
>>> Sorry for that long post!
>>>
>>> Markus M
>>>
>>
>>