[GRASS-dev] matplotlib example script

Thu Jul 24 22:01:32 EDT 2008

On Jul 24, 2008, at 6:03 PM, Glynn Clements wrote:

>
> Michael Barton wrote:
>
>>>> This script replicates the functionality of d.histogram, but makes a
>>>> much nicer looking plot. It also has more options, although I've
>>>> included only a very few of the formatting possibilities here. The
>>>> script sends the output from r.stats into 9 lines of code to produce
>>>> the histogram with formatting options. Most of it is done by 4 lines.
>>>> This can be run from the command line and does not require a GUI to
>>>> be installed. It writes to a PNG file, but could easily be set to
>>>> create a TclTk or wxPython interactive window as a user-selectable
>>>> option.
>>>
>>> Before we start using MatPlotLib extensively, we need to figure out
>>> how to handle output format selection so that all scripts can work
>>> with any backend, and the scripts can be used alongside d.* commands
>>> without the caller needing to distinguish between them.
>>>
>>> Or, at least, we need to figure out an interface that will allow us
>>> to do that later without having to modify large numbers of scripts.
>>> IOW, something like a grass.plot module which hides the details
>>> behind begin() and end() methods, so the scripts themselves don't
>>> reference specific format types.
>>
>> I agree. One possibility is to use the ps backend if you are indeed
>> planning to switch the output to postscript. In this example, I just
>> wanted to show that TclTk, wxPython, Qt or other GUI is not required.
>
> Well, the general idea is to support as wide a range of backends as
> possible. Essentially, we want something similar to the d.* commands,
> which will output to a variety of backends, even those which are yet
> to be invented.
>
> Also, if we can't find any better way to integrate d.* commands and
> matplotlib, we may end up adding a d.graph backend to matplotlib.
>
>>>>           for i in range(cells):
>>>>               plotlist.append(val)
>>>
>>> Ouch. This is going to cause problems (i.e. fail) for large maps
>>> (even for moderately-sized maps, it will be slow). As it stands, this
>>> script *isn't* a substitute for d.histogram.
>>
>> Well, I was worried about that too. However, even with the 10m DEM in
>> spearfish, the main time is spent in running r.stats. The rest of the
>> code takes no appreciable time to run. I should try it on a big
>> Terra/ASTER or Landsat ETM file and see how long it takes.
>
> Here:
>
> $ g.region rast=elevation.10m
> $ time r.stats -Acn elevation.10m >/dev/null
> r.stats complete.
>
> real	0m1.233s
> user	0m1.220s
> sys	0m0.013s
> $ time stuff/histogram_mpldemo.py input=elevation.10m
> r.stats complete.
>
> real	0m5.058s
> user	0m4.886s
> sys	0m0.163s
>
> Note that the time taken to get to the "r.stats complete" point
> includes the above loop to insert "cells" copies of "val" in
> "plotlist".
>
> Also, r.stats alone is only using one core, while the script will be
> using both (one running the script, another running r.stats
> concurrently). With a single core (or if the other core(s) are busy),
> it would be worse.
>
>>> As axes.hist() just calls numpy.histogram() then uses axes.bar() (or
>>> axes.barh()) to plot the result, you may as well just use axes.bar()
>>> directly on the value,count pairs emitted by r.stats.
>>>
>>> If you specifically want to bin the data, it would be better to
>>> either add an option to r.stats, or to bin the data in the script as
>>> it is being read, rather than read an entire raster map into a Python
>>> list.
>>
>> I agree about r.stats. Wouldn't binning in the script take as long as
>> the current method?
>
> Even if it takes just as long, you're less likely to have it fail
> because "plotlist" consumed all available RAM. As it stands, plotlist
> will have one entry for every non-null cell in the raster.
>
> When processing "bulk" data, anything that uses a fixed amount of
> memory (e.g. one integer per bin) is preferable to using memory
> proportional to the size of the input.
>
> Hence the recommendation to iterate over the lines of r.stats' output
> rather than read it all into a list then iterate over the list.

I'll give that a try.
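
Roughly, I think the change would look like the sketch below: read the
r.stats output one line at a time and add each cell count to a fixed
number of bins, so memory use depends on the number of bins rather than
the number of cells. This is untested against a real map. The names
bin_counts() and rstats_lines(), and the lo/hi/nbins parameters, are
made up for illustration, and I'm assuming each output line of
"r.stats -Acn" is a value and a cell count separated by whitespace.

```python
import subprocess

def bin_counts(lines, lo, hi, nbins):
    """Add 'value count' lines into nbins equal-width bins over [lo, hi].

    Hypothetical helper: memory use is one integer per bin, however
    many cells the map has.
    """
    counts = [0] * nbins
    width = (hi - lo) / float(nbins)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        val, cells = float(fields[0]), int(fields[1])
        if val < lo or val > hi:
            continue  # outside the requested range
        # A value equal to hi goes into the last bin.
        idx = min(int((val - lo) / width), nbins - 1)
        counts[idx] += cells
    return counts

def rstats_lines(mapname):
    """Yield r.stats output line by line (needs a running GRASS session)."""
    proc = subprocess.Popen(["r.stats", "-Acn", mapname],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    for line in proc.stdout:
        yield line
    proc.wait()
```

Something like bin_counts(rstats_lines("elevation.10m"), lo, hi, 50)
would then give the bar heights, which can go straight to axes.bar()
as you suggested instead of through axes.hist(). The range (lo, hi)
would have to come from somewhere else first.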

>
>
>>>>       if len(fcolor.split(':')) == 3:
>>>>           #rgb color
>>>>           r = float(fcolor.split(':')[0])
>>>>           g = float(fcolor.split(':')[1])
>>>>           b = float(fcolor.split(':')[2])
>>>>           #translate to mpl rgb format
>>>>           r = r / 255
>>>>           g = g / 255
>>>>           b = b / 255
>>>>           fcolor = (r,g,b)
>>>
>>> This should probably be made into a library function; it would also
>>> need the named colours (lib/gis/named_colr.c).
>>
>> Yes. And it might be one in MatPlotLib. It has some color functions,
>> but I haven't explored them yet. Do the grass color names have html
>> equivalents? If so, we don't need /lib/gis/named_colr.c.
>
> I suspect that we want our own code here. Eventually there may be
> Python scripts which don't use matplotlib (i.e. non-d.* scripts) but
> which still need to parse colours.
>
> I've added this to grass.py:
>
> 	def parse_color(val, dflt = None):
> 	    if val in named_colors:
> 	        return named_colors[val]
> 	
> 	    vals = val.split(':')
> 	    if len(vals) == 3:
> 	        return tuple(float(v) / 255 for v in vals)
> 	
> 	    return dflt
>
> [along with the named_colors dictionary.]

I'd added it as a method in the script, but can dispense with that now.
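
For the archive, this is roughly how I expect the script to use it. The
function body is copied from your message so the sketch runs on its
own; the three-entry named_colors dictionary here is only a stand-in I
made up for the real one in grass.py.

```python
# Stand-in for the named_colors dictionary in grass.py (illustrative
# entries only, not the real table).
named_colors = {
    "white": (1.0, 1.0, 1.0),
    "black": (0.0, 0.0, 0.0),
    "red": (1.0, 0.0, 0.0),
}

def parse_color(val, dflt=None):
    """Return an (r, g, b) tuple of floats in 0..1, or dflt if unparsable."""
    if val in named_colors:
        return named_colors[val]

    vals = val.split(':')
    if len(vals) == 3:
        # GRASS-style "R:G:B" with components in 0..255
        return tuple(float(v) / 255 for v in vals)

    return dflt

# Both forms give a tuple that matplotlib accepts as a colour.
fcolor = parse_color("255:0:51", dflt=(0.0, 0.0, 0.0))
```

That replaces the whole fcolor.split(':') block in the script with one
call.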

Michael