[GRASS-dev] matplotlib example script
Michael Barton
michael.barton at asu.edu
Fri Jul 25 18:01:34 EDT 2008
On Jul 25, 2008, at 11:59 AM, Glynn Clements wrote:
>
> Michael Barton wrote:
>
>>> Even if it takes just as long, you're less likely to have it fail
>>> because "plotlist" consumed all available RAM. As it stands, plotlist
>>> will have one entry for every non-null cell in the raster.
>>>
>>> When processing "bulk" data, anything that uses a fixed amount of
>>> memory (e.g. one integer per bin) is preferable to using memory
>>> proportional to the size of the input.
>>>
>>> Hence the recommendation to iterate over the lines of r.stats' output
>>> rather than read it all into a list then iterate over the list.
>>
>> To do a histogram, I need to send ax.hist a list of values. So I don't
>> know how I can get away without creating that list unless I use a
>> completely different algorithm (something from numpy?).
>
> Don't use axes.hist(), use axes.bar(). I.e. leave the calculations to
> GRASS, and only use matplotlib for plotting.
Fine, and probably faster. But currently r.stats will only bin
floating-point (fp) maps.
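
A rough sketch of the "let GRASS count, let matplotlib draw" approach suggested above. It assumes an integer map and r.stats -c style output with one "value count" pair per line and "*" for the null row; the sample lines are made up, and count_lines and plot_counts are hypothetical names, not part of GRASS or matplotlib.

```python
from collections import Counter


def count_lines(lines):
    """Accumulate cell counts from an iterable of 'value count' lines.

    Memory use grows with the number of distinct values, not with the
    number of cells in the raster.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or fields[0] == "*":
            continue  # skip blank lines and the null-cell row (assumed "*")
        counts[int(fields[0])] += int(fields[1])
    return counts


def plot_counts(counts, path):
    """Draw pre-binned counts with Axes.bar() instead of Axes.hist()."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    values = sorted(counts)
    fig, ax = plt.subplots()
    ax.bar(values, [counts[v] for v in values], width=1.0)
    ax.set_xlabel("cell value")
    ax.set_ylabel("cell count")
    fig.savefig(path)


# Made-up sample standing in for lines read one at a time from a pipe.
sample = ["1 10", "2 25", "* 7", "3 5"]
counts = count_lines(iter(sample))
```

In a real script the iterable would be the stdout of the r.stats process, read line by line, so the full cell list never exists in Python.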
>
>
> ax.hist() and numpy.histogram() are broken by design. They should
> accept an iterator as an argument. Requiring the entire data to be
> passed as a list makes them useless for large amounts of data.
Well. For *really* large amounts of data I suppose. And indeed
sometimes people have GB maps to work with. However, 15 sec. for
histogramming a 30 million cell, 23 MB ASTER file isn't too bad. As you
say, it would be faster to use C modules in GRASS for binning if we had
them.
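
To illustrate the iterator point, here is a minimal sketch of binning that consumes values one at a time and keeps only one counter per bin. It is not how ax.hist() or numpy.histogram() work; stream_histogram and fake_cells are hypothetical names, and the value range is assumed to be known in advance.

```python
def stream_histogram(values, lo, hi, nbins):
    """Bin an iterable of numbers into nbins equal-width bins over [lo, hi].

    Only the nbins counters are held in memory; the values are consumed
    one by one and never stored.
    """
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if v < lo or v > hi:
            continue  # out-of-range values are ignored in this sketch
        # A value equal to hi is placed in the last bin.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    return counts


def fake_cells(n):
    """Stand-in generator for a stream of cell values (not real raster data)."""
    for k in range(n):
        yield (k % 10) + 0.5


hist = stream_histogram(fake_cells(1000), 0.0, 10.0, 5)
```

The resulting list of counts can be handed straight to a bar plot, whatever the size of the input.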
>
>
> If we were to impose a requirement that maps must fit into memory,
> writing GRASS modules would be significantly simpler. GRASS would also
> be significantly less useful.
>
>> On the other hand, in spite of recent improvements, d.hist is still
>> pretty ugly, with formatting issues like varying font sizes on a
>> single axis. And it is not very flexible. It would be nice to see
>> where standard deviations lie, customize axis formatting, etc. For
>> this module in particular, it is probably better to bin the data in
>> another way than I have, and input it into one of the matplotlib plot
>> methods.
>
> Actually, for statistical information, the most important feature is
> the ability to output data in formats that are useful to *real*
> statistical software. There must be packages out there (even free
> ones) which will do a far better job than anything we are going to
> provide.
Certainly. But there is a convenience factor too. r.univar is an
example. If I want to do general purpose stats, I do dump to a stats
program. But some stats may be used regularly enough that it makes
sense to do them internally. And it is useful to be able to obtain
some stats that can be piped into other scripts or modules.
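
As a small illustration of piping stats into another script, the sketch below parses shell-style key=value lines such as those printed by r.univar -g. The sample values are invented, and parse_key_value is a hypothetical helper, not a GRASS function.

```python
def parse_key_value(lines):
    """Turn an iterable of 'key=value' lines into a dict.

    Values that look numeric are converted to float; anything else is
    kept as a string. Lines without '=' are skipped.
    """
    stats = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value
    return stats


# Invented sample in the key=value style; not output from a real map.
sample = ["n=100", "min=1", "max=9", "mean=4.5", ""]
stats = parse_key_value(sample)
```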
>
>
> Sure, we can provide "basic" functionality, but where do you draw the
> line? Which features of a "real" statistics package *wouldn't* be
> useful in GRASS? I'm really quite worried about the potential for
> feature creep in this area.
I guess I'd leave it up to the user/developer base to decide what
kinds of functionality we need. If it is something regularly
necessary, someone will probably craft a script for it. If this is
really useful, it might get translated into C code. Visualization and
spatial analysis are important parts of GIS functionality to me. In
fact, a lot of graphing programs and stat packages would have
difficulty in dealing with 20, 50, or 100 million points. matplotlib
(or something like it) seems like a good tool to have available for
development. It's also an encouragement for people to begin to develop
scripts in Python.
>
>
> And a library of plotting functions isn't a statistics package. You want
> something where GRASS hands over the data and forgets about it. We
> shouldn't be responsible for communicating from the user to the
> software details such as what type of graph to draw, or the colours or
> symbols or whatever.
For some things like histogramming, I think you're right. There really
isn't much point in even being able to change the color. But it should
look nice enough to publish. For other graphing, there may be use for
more user control. The profile graphing module in the wxGUI probably
has more user options than needed. I admit to using it to see what
was possible.
However, it's nice if there are easy programming options for creating
nice graphs where these are useful. And GRASS currently lacks such
tools AFAICT. d.graph is really primitive. It seems better to simply
use a nice, pre-existing graphing library (like we do with gdal and
proj), rather than continue to try to create our own. Maybe there are
better graphing libraries for GRASS than matplotlib. It doesn't do
everything, but does contain sufficient functionality to cover most of
the graphing needs for a high-end GIS. And it's pretty easy to work
with.
Michael