[GRASS-dev] matplotlib example script

Michael Barton michael.barton at asu.edu
Fri Jul 25 18:01:34 EDT 2008


On Jul 25, 2008, at 11:59 AM, Glynn Clements wrote:

>
> Michael Barton wrote:
>
>>> Even if it takes just as long, you're less likely to have it fail
>>> because "plotlist" consumed all available RAM. As it stands, plotlist
>>> will have one entry for every non-null cell in the raster.
>>>
>>> When processing "bulk" data, anything that uses a fixed amount of
>>> memory (e.g. one integer per bin) is preferable to using memory
>>> proportional to the size of the input.
>>>
>>> Hence the recommendation to iterate over the lines of r.stats' output
>>> rather than read it all into a list then iterate over the list.
>>
>> To do a histogram, I need to send ax.hist a list of values. So I don't
>> know how I can get away without creating that list unless I use a
>> completely different algorithm (something from numpy?).
>
> Don't use axes.hist(), use axes.bar(). I.e. leave the calculations to
> GRASS, and only use matplotlib for plotting.

Fine, and probably faster. But currently r.stats will only bin FP
(floating-point) maps.
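A minimal sketch of the "let GRASS count, let matplotlib draw" approach. It assumes the input is text such as the output of `r.stats -cn` (counts, nulls suppressed), one `value count` pair per line; lines that do not parse as a single number plus an integer (for example range labels on FP maps) are simply skipped. The helper names `read_counts` and `plot_counts` are hypothetical. Because lines are consumed one at a time, memory grows with the number of bins, not the number of cells.

```python
from typing import Iterable, Iterator, List, Tuple


def read_counts(lines: Iterable[str]) -> Iterator[Tuple[float, int]]:
    """Yield (value, count) pairs from 'value count' text lines, one at a time."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or unexpected line (assumption: exactly two columns)
        try:
            value, count = float(fields[0]), int(fields[1])
        except ValueError:
            continue  # e.g. a '*' null marker or a range label this sketch ignores
        yield value, count


def plot_counts(ax, pairs: Iterable[Tuple[float, int]], width: float = 1.0) -> None:
    """Draw precomputed bin counts with ax.bar() instead of ax.hist()."""
    values: List[float] = []
    counts: List[int] = []
    for value, count in pairs:  # one entry per bin, not per cell
        values.append(value)
        counts.append(count)
    ax.bar(values, counts, width=width, align="center")
    ax.set_xlabel("value")
    ax.set_ylabel("cell count")
```

In use, `lines` could be the text-mode `stdout` of a `subprocess.Popen` running r.stats on some map, and `ax` the axes returned by `matplotlib.pyplot.subplots()`; the map name and flags are placeholders to adapt.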

>
>
> ax.hist() and numpy.histogram() are broken by design. They should
> accept an iterator as an argument. Requiring the entire data to be
> passed as a list makes them useless for large amounts of data.

Well, for *really* large amounts of data, I suppose. And indeed,
sometimes people have GB-sized maps to work with. However, 15 sec. for
histogramming a 30-million-cell, 23 MB ASTER file isn't too bad. As you
say, it would be faster to use C modules in GRASS for binning if we had
them.
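To illustrate what "accepting an iterator" would mean, here is a small pure-Python sketch of a fixed-bin histogram that works on any iterable, including a generator. Unlike numpy.histogram, which can infer the range only because it sees all the data at once, this sketch assumes the caller supplies the range up front; the function name is hypothetical. Memory use is one integer per bin no matter how many values stream through.

```python
import math
from typing import Iterable, List


def stream_histogram(values: Iterable[float], lo: float, hi: float, nbins: int) -> List[int]:
    """Count values into nbins equal-width bins spanning [lo, hi].

    Values outside [lo, hi] and NaNs are ignored; hi itself is counted
    in the last bin.
    """
    if nbins < 1 or not hi > lo:
        raise ValueError("need nbins >= 1 and hi > lo")
    counts = [0] * nbins
    scale = nbins / (hi - lo)
    for v in values:
        if math.isnan(v) or v < lo or v > hi:
            continue
        i = int((v - lo) * scale)
        counts[min(i, nbins - 1)] += 1  # v == hi would otherwise index past the end
    return counts
```

The resulting counts can go straight to a bar plot, so the list of raw cell values is never built.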

>
>
> If we were to impose a requirement that maps must fit into memory,
> writing GRASS modules would be significantly simpler. GRASS would also
> be significantly less useful.
>
>> On the other hand, in spite of recent improvements, d.hist is still
>> pretty ugly, with formatting issues like varying font sizes on a
>> single axis. And it is not very flexible. It would be nice to see
>> where standard deviations lie, customize axis formatting, etc. For
>> this module in particular, it is probably better to bin the data in
>> another way than I have, and input it into one of the matplotlib plot
>> methods.
>
> Actually, for statistical information, the most important feature is
> the ability to output data in formats that are useful to *real*
> statistical software. There must be packages out there (even free
> ones) which will do a far better job than anything we are going to
> provide.
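As one minimal illustration of handing data over to an external statistics package, binned value/count pairs could be written as CSV, which R, spreadsheets and most statistics programs read directly. The helper name and column headers below are hypothetical choices, not an existing GRASS output format.

```python
import csv
from typing import Iterable, TextIO, Tuple


def write_counts_csv(pairs: Iterable[Tuple[float, int]], out: TextIO) -> int:
    """Write (value, count) pairs as CSV with a header row; return the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(["value", "count"])
    rows = 0
    for value, count in pairs:  # streamed, so no full copy of the data is kept
        writer.writerow([value, count])
        rows += 1
    return rows
```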

Certainly. But there is a convenience factor too; r.univar is an
example. If I want to do general-purpose stats, I dump the data to a
stats program. But some stats may be used regularly enough that it makes
sense to do them internally. And it is useful to be able to obtain some
stats that can be piped into other scripts or modules.
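Simple summary statistics such as the standard deviation can also be computed from binned counts in a single pass with constant memory. The sketch below uses a weighted variant of Welford's running update, which avoids the cancellation problems of the naive sum and sum-of-squares formula. It assumes (value, count) pairs as input, for instance from integer-map r.stats counts; the function name is hypothetical, and it returns the population standard deviation (divide by n rather than n - 1).

```python
import math
from typing import Iterable, Tuple


def weighted_mean_sd(pairs: Iterable[Tuple[float, int]]) -> Tuple[int, float, float]:
    """Return (n, mean, population standard deviation) from (value, count) pairs."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of count * squared deviation from the current mean
    for value, count in pairs:
        if count <= 0:
            continue
        n_new = n + count
        delta = value - mean
        mean += delta * count / n_new
        m2 += count * delta * (value - mean)  # uses the updated mean
        n = n_new
    if n == 0:
        raise ValueError("no data")
    return n, mean, math.sqrt(m2 / n)
```

For the pairs (1, 2), (2, 3), (4, 1), i.e. the data 1, 1, 2, 2, 2, 4, this gives n = 6, mean 2 and standard deviation 1.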

>
>
> Sure, we can provide "basic" functionality, but where do you draw the
> line? Which features of a "real" statistics package *wouldn't* be
> useful in GRASS? I'm really quite worried about the potential for
> feature creep in this area.

I guess I'd leave it up to the user/developer base to decide what
kinds of functionality we need. If something is regularly necessary,
someone will probably craft a script for it. If it is really useful,
it might get translated into C code. Visualization and spatial analysis
are an important part of GIS functionality to me. In fact, many
graphing programs and stats packages would have difficulty dealing
with 20, 50, or 100 million points. Matplotlib (or something like it)
seems like a good tool to have available for development. It's also an
encouragement for people to begin developing scripts in Python.

>
>
> And a library of plotting functions isn't a statistics package. You want
> something where GRASS hands over the data and forgets about it. We
> shouldn't be responsible for communicating from the user to the
> software details such as what type of graph to draw, or the colours or
> symbols or whatever.

For some things, like histogramming, I think you're right. There really
isn't much point in even being able to change the color. But it should
look nice enough to publish. For other graphing, there may be use for
more user control. The profile graphing module in the wxGUI probably
has more user options than needed. I admit to using it to see what
was possible.
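As a sketch of a histogram that is "nice enough to publish" and shows where standard deviations lie, the helper below draws precomputed bins as bars, marks the mean and ±1 and ±2 standard deviations, and uses one font size for all tick and axis labels. It assumes `ax` is a matplotlib Axes (for example from `matplotlib.pyplot.subplots()`) and that the mean and standard deviation were computed elsewhere; the function name, colours and line styles are arbitrary choices.

```python
from typing import Sequence


def style_histogram(ax, centers: Sequence[float], counts: Sequence[int],
                    mean: float, sd: float, width: float = 1.0,
                    fontsize: int = 10) -> None:
    """Bar histogram with mean and +/-1, +/-2 SD markers and uniform font sizes."""
    ax.bar(centers, counts, width=width, color="0.7", edgecolor="black")
    ax.axvline(mean, color="black", linestyle="-")  # mean
    for k in (1, 2):
        for sign in (-1, 1):
            # dashed lines at +/-1 SD, dotted lines at +/-2 SD
            ax.axvline(mean + sign * k * sd, color="black",
                       linestyle="--" if k == 1 else ":")
    ax.tick_params(axis="both", labelsize=fontsize)  # same size on every tick label
    ax.set_xlabel("value", fontsize=fontsize)
    ax.set_ylabel("cell count", fontsize=fontsize)
```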

However, it's nice if there are easy programming options for creating
nice graphs where these are useful. And GRASS currently lacks such
tools AFAICT. d.graph is really primitive. It seems better to simply
use a nice, pre-existing graphing library (as we do with gdal and
proj) than to continue trying to create our own. Maybe there are
better graphing libraries for GRASS than matplotlib. It doesn't do
everything, but it does contain sufficient functionality to cover most
of the graphing needs of a high-end GIS. And it's pretty easy to work
with.

Michael