[GRASS-dev] grass.mapcalc() python wrapper using grass.start_command()

Hamish hamish_b at yahoo.com
Mon Jun 25 17:25:51 PDT 2012


> Hamish wrote:
> > I just added an experimental grass.mapcalc_start() fn
> > to trunk/lib/python/raster.py, any comments or better ideas
> > how to do it?

Glynn:
> The function looks okay, although using Python threads may
> be a better approach in general.

why so?  for mapcalc_start() or start_command()?
There seems to be a proliferation of python MP options right now,
it'll take some guessing as to what will remain after the field
matures.

n.b. a goal is to keep the end-user side of things as simple as
possible: mapcalc_start() + p.wait() is pretty clean to use/teach.
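
To illustrate the start + wait pattern, here is a minimal sketch that
runs outside of a GRASS session. The start_job() helper and the child
python commands are hypothetical stand-ins for grass.mapcalc_start()
and r.mapcalc; only the Popen handling is the point.

```python
import subprocess
import sys


def start_job(expr):
    """Start a child process without waiting; return its Popen handle.

    Hypothetical stand-in for grass.mapcalc_start(): the child is just
    a python interpreter that prints the value of the expression.
    """
    return subprocess.Popen(
        [sys.executable, "-c", "print(%s)" % expr],
        stdout=subprocess.PIPE,
    )


# start all three jobs first, so that they run concurrently ...
procs = [start_job(e) for e in ("1 + 1", "2 * 3", "10 - 4")]

# ... then collect them one by one; communicate() waits like p.wait()
# does, and also reads the child's stdout
results = []
for p in procs:
    out, _ = p.communicate()
    results.append((p.returncode, out.decode().strip()))
```

In a script the last loop would shrink to a plain p.wait() per handle,
since the output of interest is the new raster map rather than stdout.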

At the other end is i.landsat.rgb.py, which runs a python function
through multiprocessing instead of starting a specific command. With all the
extra message passing code needed in the end-user script side of
things it is getting a bit more complicated than I'd like to
have outside of a library.

(same getting-too-complicated issue exists for devbr6's copy of
i.landsat.rgb.sh, which resorts to tricks such as
  eval `echo "MAX=\"\\$\${i}_\$BRI_VAR_STR\""`
which works well, but is nearing levels of gobbledygook
readability and wtf abstraction usually only seen in perl or awk.
I worry that it becomes unmaintainable and not very instructive
to the casual reader..)


> Beyond that, bear in mind that running concurrent processes
> improves latency at the expense of efficiency. Overall
> performance will typically be improved if you're using cores
> which would otherwise be idle, but be reduced if the system
> is under load. So I wouldn't recommend forcing scripts to use
> concurrency.

ok, always give the user the choice, fair enough. The trick of
course is how to code that without duplicating the code all over
the place, but not making it too tricky to maintain either.
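
One way to offer the choice without two code paths is to batch the
jobs, so that workers=1 simply degenerates to serial execution. A
rough sketch follows; run_all() is a hypothetical helper, not an
existing GRASS library function, and the commands are plain argv lists.

```python
import subprocess
import sys


def run_all(commands, workers=1):
    """Run each argv list in `commands`, at most `workers` at a time.

    Returns the exit codes in the same order as the input. With
    workers=1 every batch holds one job, so the run is serial.
    """
    workers = max(1, int(workers))
    codes = []
    for i in range(0, len(commands), workers):
        # start one batch, then wait for all of it before the next
        batch = [subprocess.Popen(c) for c in commands[i:i + workers]]
        codes.extend(p.wait() for p in batch)
    return codes


# three dummy jobs which exit with status 0, 0 and 3
cmds = [[sys.executable, "-c", "import sys; sys.exit(%d)" % n]
        for n in (0, 0, 3)]
serial = run_all(cmds, workers=1)
parallel = run_all(cmds, workers=2)
```

Batching is crude, a slow job holds up the rest of its batch, but it
keeps the serial and parallel cases in the same few lines.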


another discussion topic that comes out of this is: to run in
parallel or serially by default? for r3.in.xyz.py I've set it
to default to workers=1, but for i.landsat.rgb.py I've set it
to run all three bands at once by default. There's no right
answer, but it would be good to present a consistent approach.
For a 'value no greater than what you have' approach (to become
a GRASS py lib fn which returns an int) perhaps something like:
 http://stackoverflow.com/questions/1006289/how-to-find-out-the-number-of-cpus-in-python
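
A minimal sketch of such a 'no greater than what you have' function,
using the standard multiprocessing.cpu_count(). The name
usable_workers() is made up here, and falling back to 1 when the count
is unknown is an assumption.

```python
import multiprocessing


def usable_workers(requested):
    """Return `requested` clamped to between 1 and the number of CPUs."""
    try:
        available = multiprocessing.cpu_count()
    except NotImplementedError:
        # the count cannot be determined on this platform: stay serial
        available = 1
    return max(1, min(int(requested), available))
```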


also, another consistency issue it would be good to discuss
before we go much further:  I've been respecting/borrowing the
namespace of the "WORKERS" enviro variable in various places if
it was set, as having three enviro vars (WORKERS, GRASS_WORKERS,
and OMP_NUM_THREADS) seems needlessly redundant.
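
Respecting WORKERS when it is set could look roughly like this.
workers_from_env() is a hypothetical helper, and silently ignoring a
value which is not a positive integer is an assumption, not settled
behaviour.

```python
import os


def workers_from_env(default=1, environ=None):
    """Return the WORKERS enviro variable as an int, else `default`."""
    environ = os.environ if environ is None else environ
    try:
        n = int(environ.get("WORKERS", ""))
    except ValueError:
        # unset, empty, or not a number
        return default
    return n if n > 0 else default
```

The environ argument is only there to make the function easy to try
out without touching the real environment.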


Hamish

