[GRASS-dev] Parallelize a job using multiprocess python library without destroying environmental variable

Annalisa Minelli annagrass6 at gmail.com
Tue Jul 1 05:05:33 PDT 2014


Thanks to you both,
I will have a look at your advice and ideas and let you know if I can solve it!

All the best,
Annalisa


2014-06-30 20:17 GMT+02:00 Javier Martínez-López <
javi.martinez.lopez at gmail.com>:

> Hi Annalisa,
>
> I still need to learn a lot about this and have not tested Vaclav's
> advice yet, which is probably the best way to go, but you can take a
> look at some scripts I wrote for doing this:
>
>
> https://github.com/javimarlop/eHabpy/blob/master/pas/tmp/parallel_segmentation_pca.py
>
>
> https://github.com/javimarlop/eHabpy/blob/master/pas/parallel_grass_example.py
>
> They are working for me, but as Markus Metz once mentioned to me, if
> you are not using a cluster and there is a lot of writing to and reading
> from the same hard disk, you will probably not speed up the processing
> considerably. In any case, I am also very interested in developing these
> scripts further, so any ideas are welcome!
>
> Cheers,
>
> Javier
>
>
> On Mon, Jun 30, 2014 at 4:05 PM, Vaclav Petras <wenzeslaus at gmail.com>
> wrote:
> >
> >
> >
> > On Mon, Jun 30, 2014 at 5:21 AM, Annalisa Minelli <annagrass6 at gmail.com>
> > wrote:
> >>
> >> Hi all,
> >> I'm attempting to parallelize a job in a Python script using the
> >> multiprocessing library in grass70.
> >> I had a look at the following links:
> >> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
> >> and http://grasswiki.osgeo.org/wiki/Parallelizing_Scripts.
> >>
> >> I would like to work in the same location but in different mapsets
> >> because my jobs touch the region settings, but I don't know how to set
> >> a separate mapset for each job.
> >>
> >> So far I have discovered that these processes, if run in the same
> >> mapset, wipe all the environment variables (GISDBASE, LOCATION, MAPSET),
> >> so GRASS no longer starts and I have to restore the .grass70/rc file.
> >>
> >> Can anyone give me a hint on how to set different mapsets for different
> >> jobs?
> >>
> >
> > First, look at the PyGRASS GridModule [1] to see whether it can help you.
> >
> > For the general case, there is unfortunately no API. From what I
> > understand, you have to create a "gisrc" file somewhere and then do
> > something like env = os.environ.copy() and set GISRC there to your custom
> > "gisrc". Then you change the mapset and region by standard GRASS means,
> > but you must pass the `env` parameter to all command/module calls (env is
> > used by Python subprocess to set the environment for just that one
> > process).
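
A minimal sketch of the approach described above, not an official GRASS API: each job gets its own "gisrc" file and a copy of the environment whose GISRC points at it, and that copy is passed as `env` to the child process. The helper names (`write_job_gisrc`, `make_job_env`, `run_in_env`) and the example paths are hypothetical, and a child Python process stands in for a GRASS module call. The mapset named in the file is assumed to exist already.

```python
import os
import subprocess
import sys
import tempfile


def write_job_gisrc(gisdbase, location, mapset, directory=None):
    """Write a private gisrc file for one job and return its path.

    Hypothetical helper: the file holds the three GRASS session
    variables as "KEY: value" lines.
    """
    fd, path = tempfile.mkstemp(prefix="gisrc_", dir=directory)
    with os.fdopen(fd, "w") as gisrc:
        gisrc.write("GISDBASE: %s\n" % gisdbase)
        gisrc.write("LOCATION_NAME: %s\n" % location)
        gisrc.write("MAPSET: %s\n" % mapset)
    return path


def make_job_env(gisrc_path):
    """Return a copy of the current environment that uses gisrc_path.

    os.environ itself is left untouched, so the parent session keeps
    its own GISRC.
    """
    env = os.environ.copy()
    env["GISRC"] = gisrc_path
    return env


def run_in_env(command, env):
    """Run one command with the job-specific environment, return its output."""
    return subprocess.check_output(command, env=env, universal_newlines=True)


# Stand-in for a GRASS module call: a child Python process that only
# prints the GISRC it sees. The paths and names below are made up.
job_gisrc = write_job_gisrc("/data/grassdata", "my_location", "job_1")
job_env = make_job_env(job_gisrc)
seen = run_in_env(
    [sys.executable, "-c", "import os; print(os.environ['GISRC'])"], job_env
)
```

Because every job carries its own `env`, nothing is written to the shared rc file of the interactive session; the temporary gisrc files should be removed once the jobs finish.
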
> >
> > Note that GISRC, GISBASE and LOCATION are (system) environment variables,
> > while GISDBASE, LOCATION_NAME and MAPSET are GRASS GIS session/environment
> > variables stored in the "gisrc" file. I have no idea what the LOCATION
> > variable is for (it contains the full path to the mapset).
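
To make the distinction concrete, here is a small sketch that reads the session variables back from a "gisrc" file, assuming the plain `KEY: value` line layout. `read_gisrc` is a hypothetical helper, not part of GRASS, and the values are made up; inside a running session, `g.gisenv` reports the same variables.

```python
import os
import tempfile


def read_gisrc(path):
    """Parse the "KEY: value" lines of a gisrc file into a dict."""
    session = {}
    with open(path) as gisrc:
        for line in gisrc:
            if ":" not in line:
                continue  # skip blank or malformed lines
            # split on the first colon only, so values may contain colons
            key, value = line.split(":", 1)
            session[key.strip()] = value.strip()
    return session


# Example with a throwaway file; the values are made up.
fd, example_path = tempfile.mkstemp(prefix="gisrc_")
with os.fdopen(fd, "w") as gisrc:
    gisrc.write("GISDBASE: /data/grassdata\n")
    gisrc.write("LOCATION_NAME: my_location\n")
    gisrc.write("MAPSET: PERMANENT\n")

session = read_gisrc(example_path)
os.remove(example_path)
```
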
> >
> > I would be glad to hear what others think about this.
> >
> > You can of course read the source code of GridModule, rendering in wxGUI,
> > g.gui.animation, or the following snippet, but I can't promise that it will
> > be easy to understand, and there might be a lot of imperfections.
> >
> > Vaclav
> >
> >     # we rely on the tmp dir having enough space for our map
> >     tgt_gisdbase = tempfile.mkdtemp()
> >     # this is not needed if we use mkdtemp but why not
> >     tgt_location = 'r.out.png.proj_location_%s' % epsg_code
> >     # because we are using PERMANENT we don't have to create the mapset
> >     # explicitly
> >     tgt_mapset_name = 'PERMANENT'
> >
> >     src_mapset = Mapset(src_mapset_name)
> >
> >     # get source (old) and set target (new) GISRC environment variable
> >     # TODO: setting environ only for child processes could be enough and
> >     # it would enable (?) parallel runs
> >     src_gisrc = os.environ['GISRC']
> >     tgt_gisrc = gsetup.write_gisrc(tgt_gisdbase,
> >                                    tgt_location, tgt_mapset_name)
> >     # we should use a copy and pass it but then it would not be possible
> >     # to use create_location
> >     os.environ['GISRC'] = tgt_gisrc
> >     if os.environ.get('WIND_OVERRIDE'):
> >         old_temp_region = os.environ['WIND_OVERRIDE']
> >         del os.environ['WIND_OVERRIDE']
> >     else:
> >         old_temp_region = None
> >     # these lines look good, but when developing the module, switching
> >     # the location seemed fragile, and on some errors (while running the
> >     # unfinished module) the location was switched in the command line
> >
> >     try:
> >         # the function itself is not safe for other (background) processes
> >         # (e.g. GUI), however we already switched GISRC for us
> >         # and child processes, so we don't influence others
> >         gcore.create_location(dbase=tgt_gisdbase,
> >                               location=tgt_location,
> >                               epsg=epsg_code,
> >                               datum=None,
> >                               datum_trans=None)
> >
> >         # Mapset object cannot be created if the real mapset does not
> >         # exist
> >         tgt_mapset = Mapset(gisdbase=tgt_gisdbase, location=tgt_location,
> >                             mapset=tgt_mapset_name)
> >         # set the current mapset in the library
> >         # we actually don't need to switch when only calling modules
> >         # (right GISRC is enough for them)
> >         tgt_mapset.current()
> > ...
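
The snippet above changes os.environ in place and ends before any cleanup is shown. One possible way to make the switch reversible, sketched here under the assumption that only GISRC and WIND_OVERRIDE need to be restored, is a context manager. `temporary_gisrc` is a hypothetical name, not a GRASS function, and the paths are made up.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_gisrc(new_gisrc):
    """Point GISRC at new_gisrc and drop WIND_OVERRIDE, then restore both."""
    old_gisrc = os.environ.get("GISRC")
    old_temp_region = os.environ.pop("WIND_OVERRIDE", None)
    os.environ["GISRC"] = new_gisrc
    try:
        yield
    finally:
        # restore the previous session even if the body raised an error
        if old_gisrc is None:
            os.environ.pop("GISRC", None)
        else:
            os.environ["GISRC"] = old_gisrc
        if old_temp_region is not None:
            os.environ["WIND_OVERRIDE"] = old_temp_region


# Usage with made-up values standing in for a real session.
os.environ["GISRC"] = "/tmp/original_gisrc"
os.environ["WIND_OVERRIDE"] = "tmp_region"
with temporary_gisrc("/tmp/target_gisrc"):
    inside = (os.environ["GISRC"], os.environ.get("WIND_OVERRIDE"))
```

This still modifies the environment of the whole process while the block runs, so it does not by itself make parallel runs safe; for those, passing a copied environment to each child process remains the safer route.
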
> >
> >
> >
> > [1] http://grass.osgeo.org/grass71/manuals/pygrass/modules_grid.html
> >
> >
> >>
> >> All the best,
> >> Annalisa
> >>
> >> _______________________________________________
> >> grass-dev mailing list
> >> grass-dev at lists.osgeo.org
> >> http://lists.osgeo.org/mailman/listinfo/grass-dev
>