[GRASS-dev] what is the ideal way to store spatial data

Glynn Clements glynn at gclements.plus.com
Fri Jan 4 07:24:30 EST 2008


Gerald Nelson wrote:

> I don't know enough to comment on the math issues specifically, but
> would like to relate a conversation I had with John MacDonald of
> MacDonald Detweiler (a big Canadian company that makes ground link
> stations, etc.) while serving on an advisory panel to the US national
> remotely sensed data archive (it stores much of the Landsat data). I
> was pretty naive about what actually goes on in turning raw data as
> collected by a satellite into various products that we all end up
> using. I was just interested in having a land cover/use data set and was
> arguing for the archive storing such a data set. He made two points. 
> The first was similar to the one you make, which is that any
> manipulation of raw data introduces artifacts. The second was that it
> will always be cheaper to do processing in the future than it is
> today. So the data should always be archived in raw form.
> 
> The result of his logic is that the archive does in fact store data in
> raw form, along with the operating characteristics of the satellite
> that collected it. A related recommendation he made, which has not
> been followed as far as I can tell, is that you should also archive
> the algorithms of the day (with a time stamp), so that you can
> recreate the products, which are what usually get used.

However, when to re-create and when to re-use isn't something the
software can determine. For example, if you are analysing trends, it is
important that all samples are processed consistently. If the older
samples were produced using inferior algorithms, you need to use the
same (inferior) algorithms for the newer samples.

If you have access to the original data, you could re-process it
using newer algorithms. But you might be producing data with the
expectation that others will be performing the analysis. In that
situation, you need to consider whether it's better to simply publish
new data consistent with older data, or to revise the older data. If
the user has already published results based upon the original data,
consistency may be more important.

> So getting back to GRASS, it may be too much to ask of today's (and
> tomorrow's) CPUs to do processing on the fly. But I wouldn't want
> current processing constraints to be hard-wired into new versions of
> GRASS. Or at least I would encourage the developers to consider this
> issue.

So far as CPU usage is concerned, conversion will always consume CPU
time which could have been used for something else, so you don't want
to perform unnecessary conversions. In particular, you don't want to
perform a specific conversion more than once.
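
As a rough illustration of the "do it once" point (the names below
are hypothetical, not GRASS API), the usual trick is to keep the
converted result on disk and only redo the work when the source has
changed, much as make treats its targets:

/* Minimal sketch: perform an expensive conversion at most once by
   keeping the result on disk and re-using it while it is newer than
   its source. */
#include <stdio.h>
#include <sys/stat.h>

/* hypothetical placeholder for the expensive step, e.g. reprojection */
static int convert(const char *src, const char *dst)
{
    printf("converting %s -> %s\n", src, dst);
    /* ... real work ... */
    return 0;
}

/* run convert() only if dst is missing or older than src */
static int get_converted(const char *src, const char *dst)
{
    struct stat s, d;

    if (stat(src, &s) != 0)
        return -1;      /* no source */
    if (stat(dst, &d) == 0 && d.st_mtime >= s.st_mtime)
        return 0;       /* cached result is still current */
    return convert(src, dst);
}

int main(void)
{
    return get_converted("raw.dat", "converted.dat") == 0 ? 0 : 1;
}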

Of the various processes which GRASS can perform, projection is one of
the most CPU-intensive (and also memory-intensive, as it can't
generally be done row-by-row).
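
To make the row-by-row point concrete: every output cell has to be
mapped back through the inverse projection to an arbitrary source
cell, so a single output row can pull values from source rows
scattered across the whole input. A very rough sketch, using a plain
rotation as a stand-in for a real inverse projection (none of this is
GRASS code):

/* Each output row reads from many source rows, so the whole source
   grid has to be addressable at once; only the output can be written
   row by row. */
#include <stdio.h>
#include <math.h>

/* hypothetical stand-in for an inverse projection: rotate output
   coordinates back into the source grid */
static void inverse_project(int ocol, int orow, double *scol, double *srow)
{
    const double a = 30.0 * 3.14159265358979 / 180.0;

    *scol =  ocol * cos(a) + orow * sin(a);
    *srow = -ocol * sin(a) + orow * cos(a);
}

/* nearest-neighbour reprojection of src (scols x srows) into dst */
static void reproject(const float *src, int scols, int srows,
                      float *dst, int dcols, int drows, float nodata)
{
    for (int orow = 0; orow < drows; orow++)
        for (int ocol = 0; ocol < dcols; ocol++) {
            double sc, sr;
            int ic, ir;

            inverse_project(ocol, orow, &sc, &sr);
            ic = (int)(sc + 0.5);
            ir = (int)(sr + 0.5);   /* varies along each output row */

            if (ic < 0 || ic >= scols || ir < 0 || ir >= srows)
                dst[orow * dcols + ocol] = nodata;
            else
                dst[orow * dcols + ocol] = src[ir * scols + ic];
        }
}

int main(void)
{
    float src[16] = { 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4 };
    float dst[16];

    reproject(src, 4, 4, dst, 4, 4, -9999.0f);
    for (int i = 0; i < 16; i++)
        printf("%g%c", dst[i], (i % 4 == 3) ? '\n' : ' ');
    return 0;
}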

As CPUs get faster, the extra CPU power will often get used to perform
equivalent processing on higher-resolution data, rather than on more
complex processing. In that situation, the proportion of time spent on
projection will remain constant regardless of CPU speed.

> And I guess I would argue that the more usual user situation is
> one where the user knows less than the software, or at least the gurus
> who have written the software. I can guarantee that describes me!

I wouldn't assume that this is the usual case. GRASS isn't a
word-processor or a paint program. It's targeted at users with a
certain level of skill.

In particular, I would expect most GRASS users to know a lot more
about geography and geographical sciences than I do (I dropped
geography at school at age 14; most of my knowledge has been acquired
while working on GRASS).

-- 
Glynn Clements <glynn at gclements.plus.com>

