[GRASS-dev] Re: parallelizing GRASS modules

Markus Metz markus.metz.giswork at googlemail.com
Mon Dec 5 09:05:34 EST 2011


On Mon, Dec 5, 2011 at 11:30 AM, Hamish <hamish_b at yahoo.com> wrote:
> Hamish:
>> > wrt r.walk: I would think to start with parallelizing r.cost, then
>> > porting the method over to r.walk. Both modules use the segment
>> > library, so coding it to process each segment in its own thread
>> > seems like the way to do that.
>
> [I have since studied r.cost and the segment library more closely, and
> now realize that the segment library is a custom page-swap virtual array
> to keep RAM use low; as opposed to a more straightforward quad-tree-like,
> rows=4096, or 'r.in.xyz percent=' style divide-and-conquer method]
>
> Markus M:
>> The segment library in trunk is mostly IO bound; CPU overhead is very
>> low. Parallelizing the segment library could actually slow down the
>> system because several parts of a file would be accessed in parallel
>> that cannot be cached by the system. That's why the iostream lib does
>> strictly sequential reading. The segment library, although designed to
>> allow random access, should be used such that access is not too random.
>
> fwiw, I ran trunk's r.cost through valgrind+kcachegrind profiling:
>  (see grass mediawiki Bugs page for method)
> 50% of the module time was taken in segment_get()
> 25% in segment_address()
> 18% in segment_address_fast()
> 16% in get_lowest()
> 8% in segment_pagein()
> 7% in memcpy()
>

What exactly did you want to profile? If you want to profile the parts
responsible for loading the input map(s) and writing the output map(s)
rather than the actual search part, you should use a cost surface with
lots of NULLs; otherwise, use a cost surface without NULLs. The current
region should have at least a few million cells, otherwise I doubt that
the results are meaningful. It is also important here to separate disk
IO (time spent in read and write) from CPU load.

r.cost, like other modules, uses the segment library because the input
raster cannot be processed row by row: any part of the input and output
rasters must be accessible at any time while processing. One possibility
is to load everything into memory, but that is not so user-friendly and
goes against the fundamentals of GRASS coding, which has its origins in
the 70s and 80s. That is, the more crucial performance tests should use
region settings that would exceed the available memory (see the info
printed at the beginning of r.cost --v) if everything were loaded into
memory. IOW, there may be other modules that are easier to optimize
through parallelization than the ones using the segment library.
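For readers unfamiliar with the idea, a toy page-swap virtual array can
be sketched in a few lines of C. This is NOT the GRASS segment library
API; the names (seg_open, seg_get, seg_put) and the single-resident-tile
cache are invented for illustration (the real library keeps several
segments in memory), but it shows why random access patterns cause
expensive page-ins:

```c
/* Toy "segment" cache: a 2-D double array kept in a temp file, with
 * only one TILE x TILE tile resident in RAM at a time.  Illustrative
 * only; error checks omitted. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TILE 64                   /* tile edge length in cells */

typedef struct {
    FILE *fp;                     /* backing store (temp file) */
    int nrows, ncols;             /* array size in cells */
    int tcols;                    /* number of tile columns */
    double buf[TILE * TILE];      /* the single resident tile */
    long cur;                     /* index of resident tile, -1 = none */
    int dirty;                    /* resident tile modified? */
} SEG;

static long tile_of(const SEG *s, int row, int col)
{
    return (long)(row / TILE) * s->tcols + col / TILE;
}

static void flush_tile(SEG *s)    /* page out the resident tile */
{
    if (s->cur >= 0 && s->dirty) {
        fseek(s->fp, s->cur * (long)sizeof s->buf, SEEK_SET);
        fwrite(s->buf, sizeof s->buf, 1, s->fp);
        s->dirty = 0;
    }
}

static void load_tile(SEG *s, long t)
{
    if (t == s->cur)
        return;                   /* cache hit: no IO at all */
    flush_tile(s);                /* page out old tile if modified */
    fseek(s->fp, t * (long)sizeof s->buf, SEEK_SET);
    if (fread(s->buf, sizeof s->buf, 1, s->fp) != 1)
        memset(s->buf, 0, sizeof s->buf); /* fresh tile: zero-fill */
    s->cur = t;
}

SEG *seg_open(int nrows, int ncols)
{
    SEG *s = calloc(1, sizeof *s);
    s->fp = tmpfile();
    s->nrows = nrows;
    s->ncols = ncols;
    s->tcols = (ncols + TILE - 1) / TILE;
    s->cur = -1;
    return s;
}

void seg_put(SEG *s, int row, int col, double val)
{
    load_tile(s, tile_of(s, row, col));
    s->buf[(row % TILE) * TILE + col % TILE] = val;
    s->dirty = 1;
}

double seg_get(SEG *s, int row, int col)
{
    load_tile(s, tile_of(s, row, col));
    return s->buf[(row % TILE) * TILE + col % TILE];
}

void seg_close(SEG *s)
{
    fclose(s->fp);
    free(s);
}
```

Two accesses within the same tile cost nothing beyond the array lookup;
two accesses in distant tiles force a write-back plus a read, which is
why scattered access from parallel threads can make things slower, not
faster.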

Attachment FYI

Markus M
-------------- next part --------------
A non-text attachment was scrubbed...
Name: r.cost.png
Type: image/png
Size: 32909 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/grass-dev/attachments/20111205/f0976094/r.cost-0001.png

