[GRASS-user] "Parallelization" of Python Script

Johannes Radinger johannesradinger at gmail.com
Tue Aug 7 04:33:48 PDT 2012


Hi Daniel,

> Let's say you have three different tasks that need to be performed for each
> map. This is just a short example, you'd need to adapt it further, but you
> could do something like this:
>
> running_jobs = 0
>
> # Loop over jobs
> for i in range(jobs):
>     # Check what position you're on in your array of jobs
>     position = i % 3
>     # Do a job accordingly
>     if position == 0:
>         do_something()
>     elif position == 1:
>         do_something_else()
>     elif position == 2:
>         and_yet_something_else()
>     # Increase count of running jobs
>     running_jobs += 1
>     # If you don't have any available workers, wait until they're all done
>     if running_jobs % workers == 0:
>         for j in range(workers):
>             wait_for_jobs_to_finish()
>         # Now reset the running jobs to 0 so that you can continue to add
>         # jobs to your queue
>         running_jobs = 0
>
>
In your loop, what do you mean by "job" and by the number of jobs? Is
that e.g. 3 tasks x 200 maps = 600 jobs?
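
To make the question concrete, here is a minimal sketch of how I currently
read your loop, assuming that every (map, task) pair counts as one job. The
map and task names and the start_job() helper are placeholders of mine;
start_job() only spawns an idle Python child process where the real script
would call grass.start_command(), which also returns a process object that
can be waited on.

```python
import subprocess
import sys

maps = ["map_%d" % n for n in range(4)]   # hypothetical map names (200 in my case)
tasks = ["task_a", "task_b", "task_c"]    # hypothetical task names
workers = 3                               # jobs allowed to run at the same time

# One job per (map, task) pair: 4 maps x 3 tasks = 12 jobs here
jobs = len(maps) * len(tasks)


def start_job(map_name, task_name):
    """Placeholder for grass.start_command(): start a child process and
    return immediately without waiting for it."""
    return subprocess.Popen([sys.executable, "-c", "pass"])


running = []    # jobs of the current batch
finished = []   # (map, task, return code) of completed jobs

for i in range(jobs):
    # Translate the flat job number into a map and a task
    map_index, position = divmod(i, len(tasks))
    map_name, task_name = maps[map_index], tasks[position]
    running.append((map_name, task_name, start_job(map_name, task_name)))

    # Batch is full: wait for all of its jobs before starting more
    if len(running) == workers:
        for name, task, process in running:
            finished.append((name, task, process.wait()))
        running = []

# Wait for an incomplete last batch, if there is one
for name, task, process in running:
    finished.append((name, task, process.wait()))
```

With workers equal to the number of tasks, the three tasks of one map are
started together, so this reading would only work if the tasks do not depend
on each other's output.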

Attached is a small Python script that applies a focal filter to a raster
map, but based on a user-supplied r.cost distance from each cell instead of
a moving window. The script therefore loops over every single raster cell
(transferred to a temporary point vector) and runs r.cost, r.mapcalc and
r.univar for each one, which takes quite a lot of time. The script works so
far and uses g.parser to generate the GUI etc.
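
What I have in mind for a parallel version is roughly the following: bundle
the steps for one cell in a function and let a small pool of threads work
through the cells. This is only a sketch under assumptions: run_step() is a
hypothetical stand-in that just records the call where the real script would
use grass.run_command(), and the map names are illustrative, not taken from
the attached script.

```python
from multiprocessing.pool import ThreadPool


def run_step(module, reads, writes):
    """Hypothetical stand-in for a GRASS module call.
    It only describes the call and does not run GRASS."""
    return "%s: %s -> %s" % (module, reads, writes)


def process_cell(cell_id):
    """Run the consecutive steps for one cell. Each step uses the output of
    the previous one, so they stay in serial order inside the function."""
    point = "tmp_point_%d" % cell_id     # illustrative temporary map names
    cost_map = "tmp_cost_%d" % cell_id
    zone_map = "tmp_zone_%d" % cell_id
    steps = [
        run_step("r.cost", point, cost_map),
        run_step("r.mapcalc", cost_map, zone_map),
        run_step("r.univar", zone_map, "statistics"),
    ]
    return cell_id, steps


workers = 3            # number of cells processed at the same time
cell_ids = range(6)    # stand-in for the real list of raster cells

# Each thread handles one cell at a time; results keep the input order
pool = ThreadPool(workers)
results = pool.map(process_cell, cell_ids)
pool.close()
pool.join()
```

Threads should be enough here because the real work would happen in the
GRASS modules, which run as separate processes. I have not checked whether
modules running at the same time interfere with each other in one mapset
(e.g. via the region settings).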

I think I can upload this script to the addons/wiki, but before that I'd
like to make it faster (e.g. parallelize the code).

Thank you for your help so far...

/Johannes





> HTH!
>
> Daniel
>
> --
>
> B.Sc. Daniel Lee
> Geschäftsführung für Forschung und Entwicklung
> ISIS - International Solar Information Solutions GbR
> Vertreten durch: Daniel Lee, Nepomuk Reinhard und Nils Räder
>
> Softwarecenter 3
> 35037 Marburg
> Festnetz: +49 6421 379 6256
> Mobil: +49 176 6127 7269
> E-Mail: Lee at isi-solutions.org
> Web: http://www.isi-solutions.org
>
>
>
>
> 2012/8/7 Johannes Radinger <johannesradinger at gmail.com>
>
>> Hi,
>>
>> sounds promising, but somehow I don't get it (as I am not yet deeply into
>> Python scripts :()
>> E.g. if I try to perform a second step using the output of the first step
>> (r.slope.aspect) as input in the next one (e.g. r.cost) ... how would you
>> do that? I understand that in your example you use the modulus operator
>> to check whether the last job of each "group" has been started ... then
>> the wait is used to finish all jobs of that group before going to the
>> next line. If I understand you correctly, I just need to insert that
>> if-line between every consecutive computation step?
>>
>> # Loop over jobs
>> for i in range(jobs):
>>     # Insert job into dictionary to keep track of it
>>     proc[i] = grass.start_command('r.slope.aspect',
>>                                   elevation='elev_' + str(i),
>>                                   slope='slope_' + str(i))
>>
>>     # Probably here I have to wait until slope_i is created??
>>     if i % workers == 0:
>>         for j in range(workers):
>>             proc[i - j].wait()
>>     # How would you do that? In my case these are a dozen
>>     # consecutive grass commands (output1=input2...)
>>     # and some thousand "jobs" which I want to loop over
>>
>>     proc[i] = grass.start_command('r.cost',
>>                                   input='slope_' + str(i),
>>                                   output='costraster_' + str(i),
>>                                   coordinate="123,123")
>>     if i % workers == 0:
>>         for j in range(workers):
>>             proc[i - j].wait()
>>
>> # Make sure all workers are finished.
>> for i in range(jobs):
>>     if proc[i].wait() != 0:
>>         grass.fatal(_('Problem running analysis on elev_' + str(i) + '.'))
>>
>>
>> /johannes
>>
>>
>>
>> On Tue, Aug 7, 2012 at 10:21 AM, Daniel Lee <lee at isi-solutions.org> wrote:
>>
>>> Hi Johannes,
>>>
>>> Certainly it's possible, it'd just be a question of how you make your
>>> loop. Inside the loop of your jobs you could evaluate whether it's an odd
>>> or even job and check if the previous job has been finished, then start the
>>> next one. I'm not sure what your code looks like, but I've condensed
>>> Hamish's script for my own reference into an example gist so that I can
>>> remember some of the cool tricks. I don't know if it would help you to look
>>> at it since you've already seen the original script, but if you want to
>>> take a look feel free to:
>>>
>>> https://gist.github.com/3282580
>>>
>>> I think you'll basically have to work with the modulo operator a bit. If
>>> you use that, you can possibly reduce a lot of nested for loops into a for
>>> loop with an if-elif-else query.
>>>
>>>
>>> Best,
>>> Daniel
>>>
>>>
>>> 2012/8/7 Johannes Radinger <johannesradinger at gmail.com>
>>>
>>>> Hi,
>>>>
>>>>> for an example of grass.start_command() for parallelizing a bunch
>>>>> of r.cost runs, see v.surf.icw(.py) in grass7 addons:
>>>>>
>>>>>
>>>>> https://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.surf.icw/v.surf.icw.py
>>>>>
>>>> thank you for that example. I think it explains very well how
>>>> assigning multiple r.cost runs to single processes with
>>>> grass.start_command works.
>>>> I am just wondering how it is done when there are multiple consecutive
>>>> processes in the for loop. In your example (v.surf.icw.py), a separate
>>>> for loop is started for each step (e.g. r.cost (line 271), r.mapcalc
>>>> (298)) ... Is there a way to combine the steps in a function (e.g. a
>>>> combination of r.cost and mapcalc) and launch that function in a way
>>>> like grass.start_command in a single loop?
>>>> If possible, that would probably save lines of code and might be a
>>>> little clearer (at least to me).
>>>> I am just asking because one of my scripts, which is still in "serial
>>>> mode", involves lots of steps inside the for loop.
>>>> Parallelized, this would create at least a dozen for loops, which might
>>>> become very unclear.
>>>>
>>>> Anyway, I think the parallelization can really save computation time
>>>> for my scripts... :)
>>>>
>>>> /Johannes
>>>>
>>>
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: distance_filter2.py
Type: application/octet-stream
Size: 4482 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/grass-user/attachments/20120807/a5e6ebce/attachment-0001.obj>