[GRASS-user] region issues using python multiprocessing.Pool()

Hamish hamish_b at yahoo.com
Wed Jul 10 16:38:36 PDT 2013


Eric wrote:

>>  If I run the script without multiprocessing enabled the script
>>  executes properly; If I run it with multiprocessing enabled,

Just my 2c: I don't remember the exact details, but when we were
working on the multi-processing functions in core.py we went with
Python's subprocess library instead of the multiprocessing one, as
it seemed to be a better fit for the collection of modular programs
used by GRASS in the usual way. Since pygrass is a different way of
thinking about the structure, maybe the lower-level multiprocessing is
more appropriate to use with that (or not), I'm not sure.


>> Can the use_temp_region() function  be used as it is below to execute
>> multiple mapcalc jobs in parallel that have different extents?
>> Once I have this working I'd like to make  a more generic tool a-la
>> the i.landsat/toar tools.

See also the mapcalc_start() Python function in lib/python/script/raster.py,
and start_command() in core.py, and how they are used by e.g. i.landsat.rgb.py,
i.pansharpen.py, i.oif.py, and r3.in.xyz.py.

(Those use grass.script; the best way to do it with pygrass may be different.)
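
The pattern those scripts follow can be sketched with the standard library
alone: start every job without waiting, keep the process handles, then wait
on all of them. This is only an illustration, assuming plain
subprocess.Popen in place of the grass.script wrappers. The job names and the
child one-liner are made-up stand-ins for real GRASS modules.

```python
import subprocess
import sys

# Hypothetical job names; in a real script these would be map names or tiles.
jobs = ["tile_a", "tile_b", "tile_c"]

# Stand-in child program that just echoes its argument. A real script would
# launch a GRASS module here instead.
child_code = "import sys; print('done ' + sys.argv[1])"

# Start every job without waiting, keeping the Popen handle of each.
procs = [
    subprocess.Popen(
        [sys.executable, "-c", child_code, job],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    for job in jobs
]

# Only now block: collect the output and exit status of each job in turn.
results = []
for proc in procs:
    out, _ = proc.communicate()
    results.append((proc.returncode, out.strip()))

print(results)
```

All children run concurrently between the two loops; the second loop is the
only place where the parent blocks.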


Pietro:
> I don't think that the function use_temp_region, can solve your problem,
> If you look at the code [0], the function set the environmental variable, 
> therefore in a multiprocessing environment you have several process that
> set the same variable at the same time...

Environment variables are isolated within a process as soon as the new OS
process is launched; it is impossible for a child process to share or export
environment variables back to its parent or among siblings. A process inherits
a copy of the environment as it was at launch, and any changes made to that
process's environment at run time evaporate with it when the process is
reaped. As long as WIND_OVERRIDE (i.e. use_temp_region()) is always used by modules
which want to change the region, and only g.region from the top level is allowed
to change the real WIND file, then all should be ok.
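
A quick way to check this, sketched with only the standard library: the
parent hands each child its own copy of the environment with a different
WIND_OVERRIDE, each child then changes that variable, and the parent's own
environment is left untouched. The region names are invented, and the
children are plain Python one-liners, not GRASS modules.

```python
import os
import subprocess
import sys

# Make sure the parent starts without the variable.
os.environ.pop("WIND_OVERRIDE", None)

# Child: report what it inherited, then change its own copy.
child_code = (
    "import os;"
    "print(os.environ['WIND_OVERRIDE']);"
    "os.environ['WIND_OVERRIDE'] = 'changed_by_child'"
)

seen = []
for region in ["tmp_region_1", "tmp_region_2"]:  # hypothetical region names
    env = os.environ.copy()        # private copy for this child only
    env["WIND_OVERRIDE"] = region
    out = subprocess.check_output(
        [sys.executable, "-c", child_code], env=env, universal_newlines=True
    )
    seen.append(out.strip())

print(seen)                            # what each child inherited
print("WIND_OVERRIDE" in os.environ)   # whether anything leaked to the parent
```

Each child sees only the value it was launched with, and its later change
never reaches the parent or the other child.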

Since Python's GIL keeps threads from running Python code on more than one
core at a time, multiprocessing spawns multiple processes instead, and so
as long as there are multiple processes there will be a separate environment
for each. (Again, I'm not a pygrass expert, so please correct me if my
assumptions are bad.)


> myenv = os.environ.copy()
> # now change the variables in the dictionary
> myenv['GISRC'] = 'mygisrc'  # etc.
> run_command('r.something', bla, bla, env=myenv)

I would be very surprised if changing GISRC was the correct approach
for anything other than working on a large cluster with a job manager
and e.g. NFS issues to deal with.


regards,
Hamish


