[GRASS-dev] matplotlib example script

Sat Jul 26 02:30:10 EDT 2008

On Jul 25, 2008, at 6:56 PM, Glynn Clements wrote:

>
> Michael Barton wrote:
>
>>> ax.hist() and numpy.histogram() are broken by design. They should
>>> accept an iterator as an argument. Requiring the entire data to be
>>> passed as a list makes them useless for large amounts of data.
>>
>> Well. For *really* large amounts of data I suppose. And indeed
>> sometimes people have Gb maps to work with. However, 15 sec. for
>> histogramming a 30 million cell, 23Mb ASTER file isn't too bad. As you
>> say, it would be faster to use C modules in GRASS for binning if we had
>> them.

Your comments bring out a number of important points that we need to
consider, both in this particular case and more generally. Here
are a few responses.

>>
>
> Things can be written if there is a need for them. But doing things
> "right" is more important than doing things "right now".

I agree. This is a proposal accompanied by examples so that others can  
try it out. Sort of like Cairo.

>
>
> Many of the current problems with GRASS are due to cases where
> expediency was allowed to win out over good design (or *any* design).
>
> [As to the specific task of binning, you could probably get a suitable
> tool within 5 minutes by scavenging from r.quantile.]

This would be good to add to r.stats, which already has an interface  
for this but which only works with float maps.
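
To make the binning idea concrete, here is a minimal sketch of one-pass
binning over an iterator, so the whole map never has to be held in a list.
It is only an illustration in plain Python, not how r.stats or r.quantile
work internally. The names stream_histogram and cell_values are
hypothetical, and cell_values merely stands in for whatever would actually
read the raster cells.

```python
from typing import Iterable, Iterator, List


def stream_histogram(values: Iterable[float], lo: float, hi: float,
                     nbins: int) -> List[int]:
    """Count values into nbins equal-width bins over [lo, hi] in one pass."""
    if nbins < 1 or hi <= lo:
        raise ValueError("need nbins >= 1 and hi > lo")
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if v < lo or v > hi:
            continue  # values outside the range are skipped
        # A value equal to hi belongs to the last bin, hence the min().
        index = min(int((v - lo) / width), nbins - 1)
        counts[index] += 1
    return counts


def cell_values(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a raster reader: yields n synthetic cells."""
    for k in range(n):
        yield (k * 7) % 256


# Only the bin counts are kept in memory, never the cell values themselves.
counts = stream_histogram(cell_values(100000), 0, 256, 8)
```

The resulting list of counts is small enough to hand to any plotting
routine as a bar chart, regardless of how many cells were read.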

>
>
>>> Sure, we can provide "basic" functionality, but where do you draw the
>>> line? Which features of a "real" statistics package *wouldn't* be
>>> useful in GRASS? I'm really quite worried about the potential for
>>> feature creep in this area.
>>
>> I guess I'd leave it up to the user/developer base to decide what
>> kinds of functionality we need. If it is something regularly
>> necessary, someone will probably craft a script for it. If this is
>> really useful, it might get translated into C code. Visualization and
>> spatial analysis is an important part of GIS functionality to me. In
>> fact, a lot of graphing programs and stat packages would have
>> difficulty in dealing with 20, 50, or 100 million points. MatPlotLib
>> (or something like it) seems like a good tool to have available for
>> development. It's also an encouragement for people to begin to develop
>> scripts in Python.
>
> It's also far too low-level. If you start writing lots of scripts
> which call matplotlib directly, you will quickly end up with a
> situation where one script lets you set the axis colour and the label
> font but not the symbols, another lets you control the symbols but not
> the font, etc.

It's not nearly as low level as d.graph or bash. Using pyplot or pylab  
dispenses with even more code. But it's important to see how this  
would work in a GRASS environment rather than just for creating  
general purpose graphs. This means it's up to the script/module/GUI  
developer to decide how much control is appropriate to pass on to users.

>
>
> [I'm using stylistic properties here for simplicity; there are
> probably other properties which are far more important, e.g. being
> able to zoom, select log/linear scaling, etc.]
>
>
> If you can't find or aren't happy with existing graphing libraries,
> then write one. But don't require each script to include its own
> written-from-scratch-in-one-hour graphing library.

My thought was that this could potentially serve as a 'standard'  
graphic library for Python scripting in GRASS. Depending on the  
graphic requirements, it could also serve for some of the 'built in'  
graphing functions in a wxPython GUI environment--that is, things  
built in to the GUI or modules that ship with GRASS. Obviously, we'd  
need to try it out and maybe look at other possible alternatives. This  
library is widely used and well-maintained, and has a lot of  
functionality. So it seems like a good candidate for something like  
this.

>
>
> What is required is something at a level where the script generates
> some data then indicates that the data should be displayed as a
> scatter plot. The library should take care of the rest (scales, sizes,
> colours, ...)

I agree, especially for 'built-in' graphing like histogramming.

> without the script having to explicitly read options
> from the user and pass them to matplotlib.

The idea is not that we should attempt to create general-purpose  
graphing applications, but to provide a flexible set of tools that  
GRASS developers and sophisticated users can use.

>
>
> Are you going to hard-code the fonts, colours, line width, symbols,
> etc, or allow the user to set them?

I guess it depends on the application. Sometimes this is superfluous  
and other times it's very useful.

> Assuming the latter, are you going
> to force them to specify font= linecolor= textcolor= etc for every
> command that they type, or allow them to set defaults? Assuming the
> latter, how will the script obtain this information?

The examples that I did gave the user virtually no choice over graph
formatting or the type of graph displayed, although it is easy enough
to build such choices into a script GUI. The existence of a library that
allows a programmer to produce diverse, publication-quality graphs doesn't
mean that it is necessary to push all of the options onto users. Doing
so would amount to creating a general-purpose graphing application. And
as you point out, other applications can fill that role.
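
On the question of where defaults could come from, one possibility is
sketched below: built-in defaults, optionally overridden by environment
variables, optionally overridden again by options given on the command
line. Everything here is an assumption made for illustration. The
GRAPH_STYLE_ prefix, the option names, and the resolve_style function are
hypothetical and do not correspond to anything that exists in GRASS.

```python
import os

# Hypothetical built-in defaults for a handful of style options.
DEFAULT_STYLE = {
    "font": "sans-serif",
    "linecolor": "black",
    "textcolor": "black",
    "linewidth": "1",
}

# Hypothetical prefix for environment variables holding user defaults.
ENV_PREFIX = "GRAPH_STYLE_"


def resolve_style(overrides=None, environ=None):
    """Merge built-in defaults, environment defaults and per-call options."""
    environ = os.environ if environ is None else environ
    style = dict(DEFAULT_STYLE)
    # User-wide defaults, e.g. GRAPH_STYLE_LINECOLOR=red
    for key in DEFAULT_STYLE:
        value = environ.get(ENV_PREFIX + key.upper())
        if value:
            style[key] = value
    # Options given explicitly for this one command win over everything.
    for key, value in (overrides or {}).items():
        if key not in DEFAULT_STYLE:
            raise KeyError("unknown style option: %s" % key)
        if value is not None:
            style[key] = value
    return style


# A script would pass only the options the user actually typed.
style = resolve_style({"linecolor": "blue"}, environ={})
```

With something along these lines, a script never has to ask for font=
or linecolor= unless the user wants to change them for a single command.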

Something like MatPlotLib does give us a way to create an important  
class of visualization that is largely lacking in GRASS, which is  
otherwise rich in analytical tools. I realize that people vary in how  
they respond to information presented in different ways. I am someone  
who usually much prefers to see a graph than a table of numbers. So  
having something along the lines of MatPlotLib available seems like a  
good thing to me.

>
> Also, are these scripts going to be usable from within the GUI? (I.e.
> display the graphics in a window created by the GUI, not just dump a
> PNG file to the disk).

Currently some scripts are usable in the GUI simply because they are  
called from the menu. There is no particular integration beyond that.  
Some scripts (and C modules) produce dense numerical output that could  
benefit by optionally having it drawn to a graph in a file, rather  
than only being written to a text file. In other cases, there may be  
functions, like interactive profiling, that need to be wrapped into  
the GUI to be fully useable. The current interactive profiling module  
uses a wxPython library. This is fine, but is difficult to use outside  
of the wxPython GUI. Because there is considerable demand (at least  
among the developer community) for maintaining the possibility inside  
and outside the GUI, I've been looking for something that would work  
in both environments. That is one reason I like MatPlotLib, although  
there may be better alternatives I'm not yet aware of. It has a  
wxPython backend that allows it to be wrapped completely in the GUI  
and display to a canvas. Or it can create its own display environment.  
Or it can output to a file. And it is as easy to insert into a
stand-alone script as it is to embed it into the wxPython GUI environment.
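
As a rough sketch of what I mean, the code below builds a figure with
MatPlotLib's object interface and renders it to PNG without any GUI. The
make_graph function and its arguments are hypothetical names for
illustration, not part of MatPlotLib or GRASS. The same Figure object
could in principle be attached to a wxPython canvas instead
(FigureCanvasWxAgg in matplotlib.backends.backend_wxagg), but that path
is not exercised here.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def make_graph(kind, x, y=None, title="", xlabel="", ylabel=""):
    """Build a figure of the requested kind; the caller decides where it goes."""
    fig = Figure(figsize=(6, 4))
    FigureCanvasAgg(fig)  # off-screen canvas, no display needed
    ax = fig.add_subplot(111)
    if kind == "hist":
        ax.hist(x, bins=20)
    elif kind == "scatter":
        ax.scatter(x, y)
    elif kind == "line":
        ax.plot(x, y)
    else:
        raise ValueError("unsupported graph kind: %s" % kind)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


# The script says what to display; scales and layout are left to the library.
fig = make_graph("scatter", [1, 2, 3, 4], [2.0, 4.1, 5.9, 8.2],
                 title="Example", xlabel="x", ylabel="y")

# Render to an in-memory PNG; passing a filename to savefig() works as well.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
```

The point of returning the Figure rather than drawing it is that the
calling code, whether a script or the GUI, chooses the destination.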

> How are the file format, dimensions, filename
> etc communicated between the two?
>
>
> Will the script be able to communicate the set of available display
> attributes to the GUI (in such a way that it can distinguish "what to
> display" from "how to display it")?

Currently no independent, stand-alone modules communicate with the GUI  
display. Jachym and Martin have created some hooks to make it possible  
to send the output from a display command (e.g. d.rast) to the  
wxPython canvas. I'm not sure if this is still active and I'm not sure  
that it should be a high priority, though I know that others might  
disagree.

Since MatPlotLib is a Python library with a wxPython backend, it
shouldn't be too hard to have a graph created by this
library display in the mapdisplay canvas. But I'm not sure that is a
good idea. The mapdisplay canvas, with its toolbar, is pretty  
specialized for displaying maps and map-like imagery. Usually, I'd  
think a user would prefer to have a graph display in a separate window  
and a different kind of window. This is easy enough to do with  
MatPlotLib in a wxPython (or other) environment.

If this seems after testing to be a potentially valuable addition to  
the GRASS system, it would probably be good to build some standardized  
convenience libraries to easily create graphs with a standard look,  
manage data flows from grass modules to MatPlotLib/Numpy, create a  
standard graph display window (the toolbar that comes with MatPlotLib  
is only partly useful IMHO), etc. Perhaps this is what you meant
above. It's also the kind of thing I did in the examples: axis labels,
title, and scaling are all automatic.
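
One small piece of such a convenience library might look like the sketch
below, which turns text output from a module into lists ready for
plotting. It assumes two whitespace-separated columns per line (a value
and a cell count), roughly the shape of a category/count listing. The
exact format any given GRASS module prints is an assumption here, and
parse_counts is a hypothetical name.

```python
def parse_counts(text):
    """Parse 'value count' lines into two parallel lists.

    Lines that do not hold exactly one number and one integer count
    (for example a null marker such as '*') are skipped.
    """
    values, counts = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            value = float(fields[0])
            count = int(fields[1])
        except ValueError:
            continue  # non-numeric entry, e.g. a null marker
        values.append(value)
        counts.append(count)
    return values, counts


# Made-up sample text standing in for captured module output.
sample = "1 120\n2 340\n* 15\n3 95\n"
values, counts = parse_counts(sample)
```

The two lists can then be passed directly to a bar or line plot, so each
script does not need its own parsing code.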

>
> This, in a nutshell, is the difference between "software engineering"
> and "coding". I've worked with too many "coders"; and by "worked
> with", I mean "cleaned up after".

I agree very much. I have far less experience in this than you, but  
over the past several years, I've had to sort through a lot of poorly  
designed, accretionary code. On the other hand, almost none of us on
the development team are programmers and development has been a
long-term accretionary process. But I suppose that is all the more reason
to try to work out the concepts better.

>
> That isn't to say that you shouldn't start to write anything until you
> have a complete architectural design. Just that you should expect the
> first dozen or so attempts to serve as learning exercises rather than
> something which will eventually be used. And expect the first few
> attempts to tell you more about what *won't* work than what will.
>
> In many regards, trial-and-error can produce a better design than
> trying to operate from a purely theoretical perspective. Hindsight
> tends to be more accurate than foresight, albeit with the drawback
> that it takes rather more effort to obtain.

And some parts of GRASS very much need more in the way of
well-thought-out design concepts, while others can be more evolutionary. In my
mind, that is where we are with a graphing library at the moment. I,  
at least, think it would be very good to have one. Someone could  
certainly program one in C, but that seems a lot of work if we can  
just use one that is already built. That leaves us more resources to  
build the pieces that are NOT out there to use. I'm simply proposing  
MatPlotLib as a potential candidate to use for creating graphs in a  
Python-enabled GRASS. We need to work with it some to see what its  
potential and limitations are.

Michael