[GRASS-user] "Parallelization" of Python Script

Daniel Lee lee at isi-solutions.org
Tue Aug 7 02:35:48 PDT 2012


Hi Johannes,

Let's say you have three different tasks that need to be performed for each
map. This is just a short example that you'd need to adapt further, but you
could do something like this:

running_jobs = 0
# Loop over jobs
for i in range(jobs):
    # Check what position you're on in your array of jobs
    position = i % 3
    # Do a job accordingly
    if position == 0:
        do_something()
    elif position == 1:
        do_something_else()
    elif position == 2:
        and_yet_something_else()
    # Increase count of running jobs
    running_jobs += 1
    # If you don't have any available workers, wait until they're all done
    if running_jobs % workers == 0:
        for j in range(workers):
            wait_for_jobs_to_finish()
        # Now reset the running jobs to 0 so that you can continue to add
        # jobs to your queue
        running_jobs = 0
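
To make the batching idea concrete, here is a minimal, runnable sketch. It is
only an illustration under some assumptions: start_step() is a hypothetical
stand-in for grass.start_command() that launches a trivial Python child
process instead of a GRASS module, and run_in_batches(), wait_for_batch() and
the map ids are made-up names, not part of any GRASS API.

```python
import subprocess
import sys


def start_step(step, map_id):
    """Hypothetical stand-in for grass.start_command(): start a child
    process without waiting for it and return the Popen object."""
    code = "import time; time.sleep(0.01)"  # pretend to do some work
    return subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.DEVNULL)


def wait_for_batch(running, exit_codes):
    """Wait for every process in the batch and record its exit code."""
    for map_id, proc in running.items():
        exit_codes[map_id] = proc.wait()


def run_in_batches(step, map_ids, workers):
    """Start `step` for every map, with at most `workers` running at once."""
    exit_codes = {}
    running = {}
    for map_id in map_ids:
        running[map_id] = start_step(step, map_id)
        # No free workers left: wait until the whole batch is done
        if len(running) == workers:
            wait_for_batch(running, exit_codes)
            running = {}
    # Wait for the last, possibly incomplete, batch
    wait_for_batch(running, exit_codes)
    return exit_codes


maps = range(5)
# The second step only starts once the first has finished for every map
slope_codes = run_in_batches("slope", maps, workers=2)
cost_codes = run_in_batches("cost", maps, workers=2)
```

Calling run_in_batches() once per step means that the first step has finished
for all maps before the second one starts, so each step can safely read the
previous step's output.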

HTH!
Daniel

--

B.Sc. Daniel Lee
Geschäftsführung für Forschung und Entwicklung
ISIS - International Solar Information Solutions GbR
Vertreten durch: Daniel Lee, Nepomuk Reinhard und Nils Räder

Softwarecenter 3
35037 Marburg
Festnetz: +49 6421 379 6256
Mobil: +49 176 6127 7269
E-Mail: Lee at isi-solutions.org
Web: http://www.isi-solutions.org




2012/8/7 Johannes Radinger <johannesradinger at gmail.com>

> Hi,
>
> sounds promising, but somehow I don't get it (as I am not yet deeply into
> Python scripts :()
> E.g. if I try to perform a second step using the output of the first step
> (r.slope.aspect) as input to the next one (e.g. r.cost), how would you do
> that? I understand that in your example you use the modulus operator to
> check whether the last job of each "group" has been started; then the wait
> is used to finish all jobs of that group before going on to the next line.
> If I understand you correctly, I just need to insert that if-line between
> every consecutive computation step?
>
> # Loop over jobs
> for i in range(jobs):
>     # Insert job into dictionary to keep track of it
>     proc[i] = grass.start_command('r.slope.aspect',
>                                   elevation='elev_' + str(i),
>                                   slope='slope_' + str(i))
>
>     # Probably here I have to wait until slope_i is created??
>     if i % workers == 0:
>         for j in range(workers):
>             proc[i - j].wait()
>     # How would you do that? In my case these are a dozen
>     # consecutive grass commands (output1=input2...)
>     # and some thousand "jobs" which I want to loop over
>
>     proc[i] = grass.start_command('r.cost',
>                                   input='slope_' + str(i),
>                                   output='costraster_' + str(i),
>                                   coordinate="123,123")
>     if i % workers == 0:
>         for j in range(workers):
>             proc[i - j].wait()
>
> # Make sure all workers are finished.
> for i in range(jobs):
>     if proc[i].wait() != 0:
>         grass.fatal(_('Problem running analysis on elev_' + str(i) + '.'))
>
>
> /johannes
>
>
>
> On Tue, Aug 7, 2012 at 10:21 AM, Daniel Lee <lee at isi-solutions.org> wrote:
>
>> Hi Johannes,
>>
>> Certainly it's possible, it'd just be a question of how you make your
>> loop. Inside the loop of your jobs you could evaluate whether it's an odd
>> or even job and check if the previous job has been finished, then start the
>> next one. I'm not sure what your code looks like, but I've condensed
>> Hamish's script for my own reference into an example gist so that I can
>> remember some of the cool tricks. I don't know if it would help you to look
>> at it since you've already seen the original script, but if you want to
>> take a look feel free to:
>>
>> https://gist.github.com/3282580
>>
>> I think you'll basically have to work with the modulo operator a bit. If
>> you use that, you can possibly reduce a lot of nested for loops into a for
>> loop with an if-elif-else query.
>>
>>
>> Best,
>> Daniel
>>
>> 2012/8/7 Johannes Radinger <johannesradinger at gmail.com>
>>
>>> Hi,
>>>
>>>> for an example of grass.start_command() for parallelizing a bunch
>>>> of r.cost runs, see v.surf.icw(.py) in grass7 addons:
>>>>
>>>> https://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.surf.icw/v.surf.icw.py
>>>
>>> Thank you for that example. I think it explains very well how to
>>> assign multiple r.cost runs to individual processes with
>>> grass.start_command. I am just wondering how it is done when there are
>>> multiple consecutive processes in the for loop. In your example
>>> (v.surf.icw.py) a separate for loop is started for each step (e.g.
>>> r.cost (line 271), r.mapcalc (298)). Is there a way to combine the
>>> steps in a function (e.g. a combination of r.cost and r.mapcalc) and
>>> launch that function, in a way like grass.start_command, in a single
>>> loop? If possible, that would probably save lines of code and might be
>>> a little clearer (at least to me). I am asking because one of my
>>> scripts, which is still in "serial mode", involves lots of steps
>>> inside the for loop. Run in parallel, this would need at least a dozen
>>> for loops, which might become very unclear.
>>>
>>> Anyway, I think parallelization can really save computation time for
>>> my scripts... :)
>>>
>>> /Johannes
>>>
>>
>>
>