[GRASS-dev] matplotlib example script

Thu Jul 24 21:03:42 EDT 2008

Michael Barton wrote:

> >> This script replicates the functionality of
> >> d.histogram, but makes a much nicer looking plot. It also has more
> >> options, although I've included only a very few of the formatting
> >> possibilities here. The script sends the output from r.stats into 9
> >> lines of code to produce the histogram with formatting options. Most
> >> of it is done by 4 lines. This can be run from the command line and
> >> does not require a GUI to be installed. It writes to a PNG file, but
> >> could easily be set to create a TclTk or wxPython interactive window
> >> as a user-selectable option.
> >
> > Before we start using MatPlotLib extensively, we need to figure out
> > how to handle output format selection so that all script can work with
> > any backend, and the scripts can be used alongside d.* commands
> > without the caller needing to distingiush between them.
> >
> > Or, at least, we need to figure out an interface that will allow us to
> > do that later without having to modify large numbers of scripts. IOW,
> > somthing like a grass.plot module which hides the details behind
> > begin() and end() methods, so the scripts themselves don't reference
> > specific format types.
> 
> I agree. One possibility is to use the ps backend if you are indeed  
> planning to switch the output to postscript. In this example, I just  
> wanted to show that TclTk, wxPython, Qt or other GUI is not required.

Well, the general idea is to support as wide a range of backends as
possible. Essentially, we want something similar to the d.* commands,
which will output to a variety of backends, even those which are yet
to be invented.

Also, if we can't find any better way to integrate d.* commands and
matplotlib, we may end up adding a d.graph backend to matplotlib.

> >>            for i in range(cells):
> >>                plotlist.append(val)
> >
> > Ouch. This is going to cause problems (i.e. fail) for large maps (even
> > for moderately-sized maps, it will be slow). As it stands, this script
> > *isn't* a substitute for d.histogram.
> 
> Well, I was worried about that too. However, even with the 10m DEM in  
> spearfish, the main time is spent in running r.stats. The rest of the  
> code takes no appreciable time to run. I should try it on a big Terra/ 
> ASTER or Landsat ETM file and see how long it takes.

Here:

$ g.region rast=elevation.10m
$ time r.stats -Acn elevation.10m >/dev/null
r.stats complete.

real	0m1.233s
user	0m1.220s
sys	0m0.013s
time stuff/histogram_mpldemo.py input=elevation.10m
r.stats complete.

real	0m5.058s
user	0m4.886s
sys	0m0.163s

Note that the time taken to get to the "r.stats complete" point
includes the above loop to insert "cells" copies of "val" in
"plotlist".

Also, r.stats alone is only using one core, while the script will be
using both (one running the script, another running r.stats
concurrently). With a single core (or if the other core(s) are busy),
it would be worse.

> > As axes.hist() just calls numpy.histogram() then uses axes.bar() (or
> > axes.barh()) to plot the result, you may as well just use axes.bar()
> > directly on the value,count pairs emitted by r.stats.
> >
> > If you specifically want to bin the data, it would be better to either
> > add an option to r.stats, or to bin the data in the script as it is
> > being read, rather than read an entire raster map into a Python list.
> 
> I agree about r.stats. Wouldn't binning in the script take as long as  
> the current method?

Even if it takes just as long, you're less likely to have it fail
because "plotlist" consumed all available RAM. As it stands, plotlist
will have one entry for every non-null cell in the raster.

When processing "bulk" data, anything that uses a fixed amount of
memory (e.g. one integer per bin) is preferable to using memory
proportional to the size of the input.

Hence the recommendation to iterate over the lines of r.stats' output
rather than read it all into a list then iterate over the list.

> >>        if len(fcolor.split(':')) == 3:
> >>            #rgb color
> >>            r = float(fcolor.split(':')[0])
> >>            g = float(fcolor.split(':')[1])
> >>            b = float(fcolor.split(':')[2])
> >>            #translate to mpl rgb format
> >>            r = r / 255
> >>            g = g / 255
> >>            b = b / 255
> >>            fcolor = (r,g,b)
> >
> > This should probably be made into a library function; it would also
> > need the named colours (lib/gis/named_colr.c).
> 
> Yes. And it might be one in MatPlotLib. It has some color function,  
> but I haven't explored them yet. Do the grass color names have html  
> equivalents? If so, we don't need /lib/gis/named_colr.c.

I suspect that we want our own code here. Eventually there may be
Python scripts which don't use matplotlib (i.e. non-d.* scripts) but
which still need to parse colours.

I've added this to grass.py:

	def parse_color(val, dflt = None):
	    if val in named_colors:
	        return named_colors[val]

	    vals = val.split(':')
	    if len(vals) == 3:
	        return tuple(float(v) / 255 for v in vals)

	    return dflt

[along with the named_colors dictionary.]

-- 
Glynn Clements <glynn at gclements.plus.com>