[GRASS-user] "Parallelization" of Python Script

Hamish hamish_b at yahoo.com
Tue Aug 7 01:37:10 PDT 2012


Hamish wrote:
> for an example of grass.start_command() for parallelizing a bunch
> of r.cost runs, see v.surf.icw(.py) in grass7 addons:
> https://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.surf.icw/v.surf.icw.py
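The grass.start_command() pattern mentioned above can be sketched roughly like this. It is a minimal, standard-library-only sketch: jobs are launched a few at a time, and each batch is waited on before the next one starts. The subprocess.Popen call stands in for grass.script.start_command(), which in GRASS 7 returns a Popen-based object, so the same .wait() logic applies. run_batched and its workers parameter are hypothetical names, not taken from v.surf.icw.py.

```python
import subprocess
import sys


def run_batched(commands, workers=4):
    """Launch argv lists `workers` at a time and wait for each batch.

    Hypothetical helper: in a GRASS script, subprocess.Popen(argv) would be
    replaced by grass.script.start_command('r.cost', ...), whose return
    value also has a .wait() method.
    """
    returncodes = []
    running = []
    for argv in commands:
        running.append(subprocess.Popen(argv))
        # Once the batch is full, block until every member has finished.
        if len(running) == workers:
            returncodes.extend(p.wait() for p in running)
            running = []
    # Wait for whatever is left over from a final, partial batch.
    returncodes.extend(p.wait() for p in running)
    return returncodes


if __name__ == "__main__":
    # Stand-in jobs: tiny Python one-liners instead of r.cost runs.
    jobs = [[sys.executable, "-c", f"import sys; sys.exit({i % 2})"] for i in range(5)]
    print(run_batched(jobs, workers=2))
```

Return codes come back in submission order because each batch is waited on in the order it was started.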

Johannes:
> thank you for that example. I think it explains very well how to
> assign multiple r.cost runs to individual processes with
> grass.start_command. I am just wondering how it is done when there are
> multiple consecutive processes in the for loop. In your example
> (v.surf.icw.py) a separate for loop is started for each step (e.g.
> r.cost (line 271), r.mapcalc (298)). Is there a way to combine the
> steps in a function (e.g. r.cost followed by r.mapcalc) and launch
> that function in a single loop, the way grass.start_command is used?
> If possible, that would probably save lines of code and might be a
> little clearer (at least to me).
>
> I am just asking because one of my scripts, which is still in "serial
> mode", involves lots of steps inside the for loop.
>
> Parallelizing it that way would produce at least a dozen separate for
> loops, which might be quite hard to follow.

OK, in s.surf.icw(.sh) for GRASS 5 and v.surf.icw(.sh) for GRASS 6 I had
it as one big loop, but for the GRASS 7 Python version I split it into
a series of small loops to (a) use the simpler grass.start_command()
single-command method, and (b) get rid of the temporary maps as soon as
possible: that module creates a lot of them, and disk I/O adds a lot of
lag if they get flushed to the hard drive before they are removed. In
the icw case, r.cost took most of the run time; the renaming and
preprocessing steps of the (former) big loop were minor by comparison.
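The "series of small loops" structure can be sketched as follows. This is a hypothetical stand-in, not the v.surf.icw.py code: plain files replace GRASS raster maps, small Python one-liners replace r.cost and r.mapcalc, and run_stage and staged_pipeline are made-up names. Each stage finishes completely before the next one starts, and the stage-1 temporaries are deleted as soon as stage 2 has consumed them.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_stage(commands, workers):
    """Run every command of one stage concurrently (at most `workers` at a
    time) and return only when all of them have finished."""
    def run(argv):
        return subprocess.run(argv, check=True).returncode

    # Threads are sufficient here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


def read_int(path):
    with open(path) as f:
        return int(f.read())


def staged_pipeline(values, workdir, workers=2):
    """Hypothetical two-stage job: stage 1 writes temporary files (standing in
    for r.cost output maps), stage 2 derives results from them (standing in
    for r.mapcalc), and the temporaries are removed right after stage 2."""
    py = sys.executable
    tmp = [os.path.join(workdir, f"tmp_{i}.txt") for i in range(len(values))]
    out = [os.path.join(workdir, f"out_{i}.txt") for i in range(len(values))]

    # Stage 1: one small loop, all jobs in parallel.
    run_stage([[py, "-c", f"open({t!r}, 'w').write('{v}')"]
               for t, v in zip(tmp, values)], workers)

    # Stage 2: a second small loop that consumes the stage-1 files.
    run_stage([[py, "-c", f"open({o!r}, 'w').write(str(int(open({t!r}).read()) * 2))"]
               for t, o in zip(tmp, out)], workers)

    # Remove the temporaries as soon as nothing needs them any more.
    for t in tmp:
        os.remove(t)

    return [read_int(o) for o in out]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(staged_pipeline([1, 2, 3], d))
```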


For parallelizing an entire Python function, as you want, have a look at
the method in GRASS 7's i.landsat.rgb(.py), which uses mp.Process (i.e.
multiprocessing.Process). It's a bit more work, since you have to
manually ensure that the I/O pipes get closed.
  https://trac.osgeo.org/grass/browser/grass/trunk/scripts/i.landsat.rgb/i.landsat.rgb.py
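A minimal sketch of running a whole multi-step function per input with multiprocessing.Process, assuming one child process per input and a one-way multiprocessing.Pipe to return each result. It is not the i.landsat.rgb.py code; process_item is a hypothetical placeholder for something like r.cost followed by r.mapcalc, and the explicit close() calls show one way to make sure both ends of every pipe get shut.

```python
import multiprocessing as mp


def process_item(value):
    """Hypothetical multi-step job for one input. In a GRASS script each step
    could be a module call (e.g. r.cost, then r.mapcalc on its output)."""
    step1 = value + 1   # stand-in for the first step
    step2 = step1 * 10  # stand-in for a second step that depends on the first
    return step2


def _worker(value, conn):
    """Child-process entry point: run the whole function, send the result back."""
    try:
        conn.send(process_item(value))
    finally:
        conn.close()  # always close the child's (sending) end of the pipe


def run_parallel(values):
    jobs = []
    for v in values:
        # duplex=False: recv_conn can only receive, send_conn can only send.
        recv_conn, send_conn = mp.Pipe(duplex=False)
        proc = mp.Process(target=_worker, args=(v, send_conn))
        proc.start()
        send_conn.close()  # the parent keeps only the receiving end
        jobs.append((proc, recv_conn))

    results = []
    for proc, recv_conn in jobs:
        results.append(recv_conn.recv())  # blocks until that child has sent
        recv_conn.close()
        proc.join()
    return results


if __name__ == "__main__":
    # The __main__ guard is required where child processes are spawned, not forked.
    print(run_parallel([1, 2, 3]))
```

Because the per-item work lives in process_item and the Process/Pipe plumbing lives in _worker and run_parallel, the same function can still be called in an ordinary serial loop. For many inputs you would also want to cap how many children run at once.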

Note that the above script keeps the serial execution method intact (to
make the imagery method easier to learn), so it has roughly twice as much
code as it actually needs. But I think using the extra wrapper function
makes the real guts of the imagery algorithm easier to read, understand,
and maintain, so keeping all the ugly parallelization plumbing separate
is a good thing.


Hamish

