[GRASS-user] region issues using python multiprocessing.Pool()

Eric Goddard egoddard1010 at gmail.com
Wed Jul 10 18:13:07 PDT 2013


Thanks for the responses. I'm looking forward to trying out your
suggestion, Pietro. Pygrass looks interesting, but I'm a little confused
about its relationship with grass.script.

One of the reasons I'm trying to use multiprocessing is that, according to
the documentation (http://docs.python.org/2/library/multiprocessing.html),
it sidesteps the GIL issue by using separate processes instead of threads.
I didn't think grass.start_command() would use all the available CPUs.
I've used multiprocessing with the GDAL Python API and it made use of all
my CPU cores.
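A minimal sketch of that point, assuming a POSIX system and Python 3:
multiprocessing.Pool runs each task in a separate worker process, each with
its own interpreter and its own GIL. busy_sum() and run_parallel() are
hypothetical names; busy_sum() is a CPU-bound stand-in for a real per-region
job.

```python
import multiprocessing
import os


def busy_sum(n):
    """Hypothetical CPU-bound stand-in for one per-region job.
    Returns the PID of the process that ran it and the result."""
    total = 0
    for i in range(n):
        total += i * i
    return os.getpid(), total


def run_parallel(sizes, processes=None):
    # Each Pool worker is a separate OS process with its own interpreter
    # and its own GIL, so CPU-bound tasks can occupy several cores at once.
    # processes=None means one worker per CPU reported by os.cpu_count().
    # "fork" is POSIX-only; it is requested explicitly so the sketch also
    # runs from an interactive session.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(busy_sum, sizes)


if __name__ == "__main__":
    results = run_parallel([200_000] * 8)
    pids = {pid for pid, _ in results}
    print(f"{len(results)} tasks ran in {len(pids)} worker process(es)")
    print("parent PID:", os.getpid())
```

Whether a real GRASS or GDAL job scales the same way depends on how much of
its time is spent in CPU work rather than disk I/O.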

Eric
Eric wrote:

>>  If I run the script without multiprocessing enabled the script
>>  executes properly; If I run it with multiprocessing enabled,

Just my 2c: I don't remember the exact details, but when we were
working on the multi-processing functions in core.py we went with
the Python subprocess library instead of the multiprocessing one, as
it seemed a better fit for the collection of modular programs used
by GRASS in the usual way. Since pygrass is a different way of
thinking about the structure, maybe the lower-level multiprocessing
module is more appropriate to use with it (or not); I'm not sure.


>> Can the use_temp_region() function be used as it is below to execute
>> multiple mapcalc jobs in parallel that have different extents?
>> Once I have this working I'd like to make a more generic tool à la
>> the i.landsat/toar tools.

See also the mapcalc_start() Python function in
lib/python/script/raster.py and start_command() in core.py, and how
they are used by e.g. i.landsat.rgb.py, i.pansharpen.py, i.oif.py,
and r3.in.xyz.py.

(That uses grass.script; the best way to do it with pygrass may be
different.)
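As far as I can tell, start_command() launches a module and returns the
started process (a subprocess.Popen object) without waiting, mapcalc_start()
likewise returns the started r.mapcalc process, and scripts such as
i.pansharpen.py start several jobs and then wait() on each. A sketch of that
start-then-wait pattern using only the standard library; start_job() and
run_all() are hypothetical names, and the children are Python one-liners
standing in for GRASS modules such as r.mapcalc.

```python
import subprocess
import sys


def start_job(code):
    """Hypothetical stand-in for start_command(): launch one child
    process and return its Popen handle immediately, without waiting."""
    return subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)


def run_all(snippets):
    # Start every job first so that they all run concurrently ...
    procs = [start_job(code) for code in snippets]
    # ... then wait for each one and collect its exit status and output.
    results = []
    for p in procs:
        out, _ = p.communicate()
        results.append((p.returncode, out.strip()))
    return results


if __name__ == "__main__":
    jobs = [f"print(sum(range({n})))" for n in (10, 100, 1000)]
    for rc, out in run_all(jobs):
        print(rc, out)
```

Because every child is its own OS process, this gets the same multi-core
benefit as multiprocessing without involving the GIL at all.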


Pietro:
> I don't think the function use_temp_region() can solve your problem.
> If you look at the code [0], the function sets the environment variable,
> so in a multiprocessing environment you have several processes that
> set the same variable at the same time...

The environment variables are isolated within a process as soon as the
new OS process is launched; it is impossible for a child process to share
or export environment variables back to its parent or among its siblings.
A process inherits the environment as it was when the process was
launched, and any changes made to that process's environment at run time
evaporate with it when the process is reaped. As long as WIND_OVERRIDE
(i.e. use_temp_region()) is always used by modules which want to change
the region, and only g.region run from the top level is allowed to change
the real WIND file, all should be OK.
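A small check of that isolation argument, assuming a POSIX system: each
pool worker sets WIND_OVERRIDE (the variable use_temp_region() sets) to its
own hypothetical region name, pauses so sibling workers overlap, and reads
the value back. set_and_read() and check_isolation() are illustrative names.

```python
import multiprocessing
import os
import time


def set_and_read(region_name):
    """Runs in a worker process: mimic use_temp_region() by setting
    WIND_OVERRIDE, wait a moment, then report what this process sees."""
    os.environ["WIND_OVERRIDE"] = region_name  # changes this process only
    time.sleep(0.2)  # let sibling workers set their own values meanwhile
    return region_name, os.environ["WIND_OVERRIDE"]


def check_isolation(names):
    before = os.environ.get("WIND_OVERRIDE")
    # "fork" is POSIX-only; it is requested explicitly so the sketch also
    # runs from an interactive session.
    ctx = multiprocessing.get_context("fork")
    # A pool with one worker per name, so the tasks can run concurrently.
    with ctx.Pool(processes=len(names)) as pool:
        results = pool.map(set_and_read, names)
    after = os.environ.get("WIND_OVERRIDE")
    return results, before, after


if __name__ == "__main__":
    results, before, after = check_isolation(["tile_a", "tile_b", "tile_c"])
    for set_name, seen in results:
        print(f"set {set_name!r}, saw {seen!r}")
    print("parent WIND_OVERRIDE before/after:", before, after)
```

One caveat: Pool workers are reused, so a value set by one task is still
present for the next task handled by the same worker. Setting
WIND_OVERRIDE at the start of every task avoids surprises.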

Since Python threads can't run Python code on more than one core at a
time (thanks to the GIL), the multiprocessing module spawns multiple
processes instead, and as long as there are multiple processes there will
be a separate environment for each. (Again, I'm not a pygrass expert, so
please correct me if my assumptions are wrong.)


> myenv = os.environ.copy()
> # now change the variables in the dictionary
> myenv['GISRC'] = 'mygisrc'  # etc.
> run_command('r.something', bla, bla, env=myenv)

I would be very surprised if changing GISRC was the correct approach
for anything other than working on a large cluster with a job manager
and e.g. NFS issues to deal with.
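The env= idea from Pietro's snippet also works without touching GISRC:
copy os.environ, change only WIND_OVERRIDE in the copy, and hand the copy
to the child. A minimal standard-library sketch; run_with_region() is a
hypothetical helper, and the child is a Python one-liner standing in for
run_command('r.something', ..., env=myenv).

```python
import os
import subprocess
import sys


def run_with_region(region_name):
    """Run a child with WIND_OVERRIDE set in a private copy of the
    environment; GISRC (if set) is passed through unchanged."""
    env = os.environ.copy()          # os.environ itself is not modified
    env["WIND_OVERRIDE"] = region_name
    # Stand-in for a GRASS module call: the child prints the two
    # variables it sees, one per line.
    code = ("import os; print(os.environ.get('WIND_OVERRIDE')); "
            "print(os.environ.get('GISRC'))")
    out = subprocess.run([sys.executable, "-c", code], env=env,
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()


if __name__ == "__main__":
    print(run_with_region("tile_3"))
    print("parent WIND_OVERRIDE:", os.environ.get("WIND_OVERRIDE"))
```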


regards,
Hamish

_______________________________________________
grass-user mailing list
grass-user at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user