[GRASS-user] [GRASS-dev] Parallelize a job using multiprocess python library without destroying environmental variable

Javier Martínez-López javi.martinez.lopez at gmail.com
Mon Jun 30 11:17:05 PDT 2014


Hi Annalisa,

I still need to learn a lot about this and have not tested Vaclav's
advice yet, which is probably the best way to go, but you can take a
look at some scripts I wrote for doing this:

https://github.com/javimarlop/eHabpy/blob/master/pas/tmp/parallel_segmentation_pca.py

https://github.com/javimarlop/eHabpy/blob/master/pas/parallel_grass_example.py

They are working for me, but as Markus Metz once mentioned to me, if
you are not using a cluster and there is a lot of reading from and
writing to the same hard disk, you will probably not speed up the
processing considerably. In any case, I am also very interested in
further developing this script, so any ideas are welcome!

Cheers,

Javier


On Mon, Jun 30, 2014 at 4:05 PM, Vaclav Petras <wenzeslaus at gmail.com> wrote:
>
>
>
> On Mon, Jun 30, 2014 at 5:21 AM, Annalisa Minelli <annagrass6 at gmail.com>
> wrote:
>>
>> Hi all,
>> I'm attempting to parallelize a job in a Python script using the
>> multiprocessing library in GRASS 7.0.
>> I had a look at the following links:
>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
>> and http://grasswiki.osgeo.org/wiki/Parallelizing_Scripts.
>>
>> I would like to work in the same location but in different mapsets because
>> my jobs touch the region settings, but I don't know how to set a separate
>> mapset for each job.
>>
>> So far I have discovered that these processes, if run in the same mapset,
>> clear all the environment variables (GISDBASE, LOCATION, MAPSET), so
>> GRASS does not start anymore and I have to restore the .grass70/rc file.
>>
>> Can anyone give me a hint on how to set different mapsets for different jobs?
>>
>
> First, look at the PyGRASS GridModule [1] to see whether it can help you.
>
> For the general case, there is unfortunately no API. From what I understand,
> you have to create a "gisrc" file somewhere and then do something like env =
> copy(os.environ) and change GISRC there to point to your custom "gisrc". Then
> you change the mapset and region by standard GRASS means, but you must pass
> the `env` parameter to all command/module calls (env is used by Python
> subprocess to set the environment just for one process).
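
A minimal sketch of the approach described above, in plain Python so that it runs without GRASS. The helper name `make_job_env`, the paths, and the location and mapset names are made up for illustration. The sketch assumes the usual "KEY: value" layout of a gisrc file and that the mapset already exists in the location.

```python
import os
import tempfile


def make_job_env(gisdbase, location, mapset, workdir=None):
    """Return a copy of os.environ whose GISRC points to a new gisrc file.

    Hypothetical helper for illustration; it is not part of any GRASS API.
    """
    workdir = workdir or tempfile.mkdtemp(prefix="grass_job_")
    gisrc_path = os.path.join(workdir, "gisrc")
    # a gisrc file holds the session variables, one "KEY: value" per line
    with open(gisrc_path, "w") as gisrc:
        gisrc.write("GISDBASE: %s\n" % gisdbase)
        gisrc.write("LOCATION_NAME: %s\n" % location)
        gisrc.write("MAPSET: %s\n" % mapset)
    # work on a copy so the parent session (os.environ) stays untouched
    env = os.environ.copy()
    env["GISRC"] = gisrc_path
    # drop a temporary region set in the parent, if there is one
    env.pop("WIND_OVERRIDE", None)
    return env


# made-up database path, location and mapset names
env = make_job_env("/tmp/grassdata", "my_location", "job_1")
print(env["GISRC"])
```

In a real session the returned dictionary would then be passed as `env=env` to every module call, so that only that child process sees the custom GISRC.
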
>
> Note that GISRC, GISBASE and LOCATION are (system) environment variables,
> while GISDBASE, LOCATION_NAME and MAPSET are GRASS GIS session/environment
> variables and are stored in the "gisrc" file. I have no idea what the LOCATION
> variable is for (it contains the full path to the mapset).
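
To illustrate the distinction, here is a small sketch that reads the session variables back from a gisrc file instead of from `os.environ`. The function name `read_gisrc` and the demo values are made up, and the file is assumed to contain one "KEY: value" pair per line.

```python
import os
import tempfile


def read_gisrc(path):
    """Parse a gisrc file ("KEY: value" per line) into a dict.

    Hypothetical helper for illustration only.
    """
    settings = {}
    with open(path) as gisrc:
        for line in gisrc:
            if ":" not in line:
                continue  # skip blank or malformed lines
            # split on the first colon only, so values may contain colons
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


# demonstration with a throwaway file and made-up values
demo_path = os.path.join(tempfile.mkdtemp(), "gisrc")
with open(demo_path, "w") as demo:
    demo.write("GISDBASE: /tmp/grassdata\n")
    demo.write("LOCATION_NAME: my_location\n")
    demo.write("MAPSET: PERMANENT\n")

session = read_gisrc(demo_path)
print(session["MAPSET"])
```
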
>
> I would be glad to hear what others think about this.
>
> You can of course read the source code of GridModule, the rendering in wxGUI,
> g.gui.animation, or the following snippet, but I can't promise that it will
> be easy to understand, and there might be a lot of imperfections.
>
> Vaclav
>
>     # we rely on the tmp dir having enough space for our map
>     tgt_gisdbase = tempfile.mkdtemp()
>     # this is not needed if we use mkdtemp but why not
>     tgt_location = 'r.out.png.proj_location_%s' % epsg_code
>     # because we are using PERMANENT we don't have to create the mapset
>     # explicitly
>     tgt_mapset_name = 'PERMANENT'
>
>     src_mapset = Mapset(src_mapset_name)
>
>     # get source (old) and set target (new) GISRC environment variable
>     # TODO: setting environ only for child processes could be enough and it
>     # would enable (?) parallel runs
>     src_gisrc = os.environ['GISRC']
>     tgt_gisrc = gsetup.write_gisrc(tgt_gisdbase,
>                                    tgt_location, tgt_mapset_name)
>     # we should use a copy and pass it but then it would not be possible to
>     # use create_location
>     os.environ['GISRC'] = tgt_gisrc
>     if os.environ.get('WIND_OVERRIDE'):
>         old_temp_region = os.environ['WIND_OVERRIDE']
>         del os.environ['WIND_OVERRIDE']
>     else:
>         old_temp_region = None
>     # these lines look good but anyway when developing the module
>     # switching location seemed fragile and on some errors (while running
>     # the unfinished module) the location was switched in the command line
>
>     try:
>         # the function itself is not safe for other (background) processes
>         # (e.g. GUI), however we already switched GISRC for us
>         # and child processes, so we don't influence others
>         gcore.create_location(dbase=tgt_gisdbase,
>                               location=tgt_location,
>                               epsg=epsg_code,
>                               datum=None,
>                               datum_trans=None)
>
>         # Mapset object cannot be created if the real mapset does not exist
>         tgt_mapset = Mapset(gisdbase=tgt_gisdbase, location=tgt_location,
>                             mapset=tgt_mapset_name)
>         # set the current mapset in the library
>         # we actually don't need to switch when only calling modules
>         # (right GISRC is enough for them)
>         tgt_mapset.current()
> ...
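
The TODO in the snippet above suggests that setting the environment only for child processes might be enough to allow parallel runs. A rough sketch of that idea follows. It uses `ThreadPool` from the multiprocessing package to launch the jobs, since each job is already a separate child process. The function name `run_in_mapset`, the paths and the mapset names are made up, the mapsets are assumed to exist already, and a placeholder Python command stands in for a real GRASS module so that the sketch runs without GRASS.

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing.pool import ThreadPool


def run_in_mapset(gisdbase, location, mapset, command):
    """Run one command with a private GISRC; os.environ is never modified.

    Hypothetical helper for illustration only.
    """
    gisrc_path = os.path.join(tempfile.mkdtemp(prefix="grass_job_"), "gisrc")
    with open(gisrc_path, "w") as gisrc:
        gisrc.write("GISDBASE: %s\nLOCATION_NAME: %s\nMAPSET: %s\n"
                    % (gisdbase, location, mapset))
    env = os.environ.copy()
    env["GISRC"] = gisrc_path
    env.pop("WIND_OVERRIDE", None)  # do not inherit a temporary region
    # env applies to this child process only
    output = subprocess.check_output(command, env=env)
    return mapset, output.decode().strip()


# placeholder command: prints the gisrc file that the child process sees;
# with GRASS installed this could be a module call such as ["g.region", "-p"]
command = [sys.executable, "-c",
           "import os; print(open(os.environ['GISRC']).read())"]
mapsets = ["job_1", "job_2", "job_3"]  # made-up names, assumed to exist

pool = ThreadPool(processes=3)
try:
    results = dict(pool.map(
        lambda m: run_in_mapset("/tmp/grassdata", "my_location", m, command),
        mapsets))
finally:
    pool.close()
    pool.join()

for name in mapsets:
    print(name, "->", results[name].splitlines()[-1])
```

Each job gets its own gisrc file, so the jobs neither share a current mapset nor touch the rc file of the interactive session.
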
>
>
>
> [1] http://grass.osgeo.org/grass71/manuals/pygrass/modules_grid.html
>
>
>>
>> All the best,
>> Annalisa
>>
>> _______________________________________________
>> grass-dev mailing list
>> grass-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/grass-dev
>
>
>

