[GRASS-dev] matplotlib example script

Fri Jul 25 21:56:01 EDT 2008

Michael Barton wrote:

> > ax.hist() and numpy.histogram() are broken by design. They should
> > accept an iterator as an argument. Requiring the entire data to be
> > passed as a list makes them useless for large amounts of data.
> 
> Well. For *really* large amounts of data I suppose. And indeed
> sometimes people have GB-sized maps to work with. However, 15 sec. for
> histogramming a 30 million cell, 23 MB ASTER file isn't too bad. As you
> say, it would be faster to use C modules in GRASS for binning if we had
> them.

Things can be written if there is a need for them. But doing things
"right" is more important than doing things "right now".

Many of the current problems with GRASS are due to cases where
expediency was allowed to win out over good design (or *any* design).

[As to the specific task of binning, you could probably get a suitable
tool within 5 minutes by scavenging from r.quantile.]
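
For illustration, binning from an iterator needs only a fixed-size
array of counters, so memory use does not depend on the number of
cells. The following is a minimal Python sketch, not the r.quantile
code; the function name `stream_histogram` and its parameters are
hypothetical, and it assumes the value range is known in advance:

```python
from typing import Iterable, List


def stream_histogram(values: Iterable[float], lo: float, hi: float,
                     nbins: int) -> List[int]:
    """Count values into nbins equal-width bins over [lo, hi].

    Consumes `values` one item at a time, so it works with a generator
    that reads cells row by row; only the counters are held in memory.
    Values outside [lo, hi] (and NaNs) are ignored.
    """
    if nbins <= 0 or hi <= lo:
        raise ValueError("need nbins > 0 and hi > lo")
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        # A NaN fails this comparison too, so it is skipped as well.
        if not (lo <= v <= hi):
            continue
        # The top edge is counted in the last bin rather than dropped.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    return counts


# Usage: any iterable works, including a generator that never
# materialises the whole data set.
cells = (x % 10 for x in range(1000))
print(stream_histogram(cells, 0, 10, 5))
```

If the range is not known beforehand, a first pass over the data to
find the minimum and maximum would be needed, which is still cheaper
in memory than loading everything into a list.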

> > Sure, we can provide "basic" functionality, but where do you draw the
> > line? Which features of a "real" statistics package *wouldn't* be
> > useful in GRASS? I'm really quite worried about the potential for
> > feature creep in this area.
> 
> I guess I'd leave it up to the user/developer base to decide what  
> kinds of functionality we need. If it is something regularly  
> necessary, someone will probably craft a script for it. If this is  
> really useful, it might get translated into C code. Visualization and
> spatial analysis are an important part of GIS functionality to me. In
> fact, a lot of graphing programs and stat packages would have  
> difficulty in dealing with 20, 50, or 100 million points. MatPlotLib  
> (or something like it) seems like a good tool to have available for  
> development. It's also an encouragement for people to begin to develop  
> scripts in Python.

It's also far too low-level. If you start writing lots of scripts
which call matplotlib directly, you will quickly end up with a
situation where one script lets you set the axis colour and the label
font but not the symbols, another lets you control the symbols but not
the font, etc.

[I'm using stylistic properties here for simplicity; there are
probably other properties which are far more important, e.g. being
able to zoom, select log/linear scaling, etc.]

If you can't find or aren't happy with existing graphing libraries,
then write one. But don't require each script to include its own
written-from-scratch-in-one-hour graphing library.

What is required is something at a level where the script generates
some data then indicates that the data should be displayed as a
scatter plot. The library should take care of the rest (scales, sizes,
colours, ...) without the script having to explicitly read options
from the user and pass them to matplotlib.
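
One possible shape for such a layer, as a rough sketch only: the
script builds a description of what to plot, and a single shared
routine decides how to draw it. The names `PlotRequest`,
`DEFAULT_STYLE` and `render` are hypothetical and are not part of
GRASS or matplotlib; the matplotlib calls used (`subplots`, `scatter`,
`set_xscale`, `savefig`) are the ordinary ones.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

# Presentation defaults owned by the library, not by individual scripts.
DEFAULT_STYLE: Dict[str, object] = {
    "color": "black", "marker": "o", "size": 9,
    "xscale": "linear", "yscale": "linear", "figsize": (6.0, 4.0),
}


@dataclass
class PlotRequest:
    """What to display: the data and its labels, with no styling."""
    kind: str                 # only "scatter" is handled in this sketch
    x: Sequence[float]
    y: Sequence[float]
    title: str = ""
    xlabel: str = ""
    ylabel: str = ""


def render(req: PlotRequest, path: str,
           style: Optional[Dict[str, object]] = None) -> None:
    """How to display it: the one place that talks to matplotlib."""
    import matplotlib
    matplotlib.use("Agg")     # file output only; a GUI would pick another backend
    import matplotlib.pyplot as plt

    # Caller-supplied style entries override the library defaults.
    st = {**DEFAULT_STYLE, **(style or {})}
    if req.kind != "scatter":
        raise ValueError("unsupported plot kind: %s" % req.kind)
    fig, ax = plt.subplots(figsize=st["figsize"])
    ax.scatter(req.x, req.y, c=st["color"], marker=st["marker"], s=st["size"])
    ax.set_xscale(st["xscale"])
    ax.set_yscale(st["yscale"])
    ax.set_title(req.title)
    ax.set_xlabel(req.xlabel)
    ax.set_ylabel(req.ylabel)
    fig.savefig(path)
    plt.close(fig)


# A script only states what it wants shown:
request = PlotRequest("scatter", [1, 2, 3], [2.0, 4.1, 5.9],
                      title="y against x")
# render(request, "scatter.png")   # requires matplotlib to be installed
```

With this split, adding a new stylistic option means changing
`render` once, rather than adding an option to every script.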

Are you going to hard-code the fonts, colours, line width, symbols,
etc., or allow the user to set them? Assuming the latter, are you going
to force them to specify font=, linecolor=, textcolor=, etc. for every
command that they type, or allow them to set defaults? Assuming the
latter, how will the script obtain this information?
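
One conventional answer, sketched here under assumptions, is layered
defaults: built-in values, overridden by a per-user settings file,
overridden in turn by options typed on the command line. The JSON file
format and the names `BUILTIN_STYLE`, `load_user_style` and
`resolve_style` are hypothetical.

```python
import json
from collections import ChainMap
from typing import Dict, Mapping, Optional

# Lowest-priority layer: used when nobody has expressed a preference.
BUILTIN_STYLE: Dict[str, str] = {
    "font": "sans", "linecolor": "black", "textcolor": "black",
}


def load_user_style(path: str) -> Dict[str, str]:
    """Read per-user defaults from a JSON file; a missing file means none."""
    try:
        with open(path) as f:
            return dict(json.load(f))
    except FileNotFoundError:
        return {}


def resolve_style(command_options: Mapping[str, Optional[str]],
                  user_style: Mapping[str, str]) -> Dict[str, str]:
    """Merge the layers; earlier maps in the ChainMap take precedence."""
    # Options the user did not type arrive as None and must not mask defaults.
    given = {k: v for k, v in command_options.items() if v is not None}
    return dict(ChainMap(given, dict(user_style), BUILTIN_STYLE))


# The user typed linecolor=red, left font unset, and has font=serif
# stored as a personal default.
style = resolve_style({"linecolor": "red", "font": None},
                      {"font": "serif"})
print(style)
```

The point of the sketch is that the lookup lives in one shared
function, so no script has to know where the defaults are stored.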

Also, are these scripts going to be usable from within the GUI (i.e.
display the graphics in a window created by the GUI, not just dump a
PNG file to the disk)? How are the file format, dimensions, filename,
etc. communicated between the two?

Will the script be able to communicate the set of available display
attributes to the GUI (in such a way that it can distinguish "what to
display" from "how to display it")?

This, in a nutshell, is the difference between "software engineering"
and "coding". I've worked with too many "coders"; and by "worked
with", I mean "cleaned up after".

That isn't to say that you shouldn't start to write anything until you
have a complete architectural design. Just that you should expect the
first dozen or so attempts to serve as learning exercises rather than
something which will eventually be used. And expect the first few
attempts to tell you more about what *won't* work than what will.

In many regards, trial-and-error can produce a better design than
trying to operate from a purely theoretical perspective. Hindsight
tends to be more accurate than foresight, albeit with the drawback
that it takes rather more effort to obtain.

-- 
Glynn Clements <glynn at gclements.plus.com>