[GRASS5] Re: [Fwd: whinging about GRASS again]

Tue Feb 1 10:27:23 EST 2005

On Tue, Feb 01, 2005 at 09:23:06PM +1300, Hamish wrote:
> Russell:
...

> > Second, when they
> > import something into $HOME/.grass, look at its bounding box.  If it's
> > 0,0 through (say) 2048,2048, then it's an xy projection.  If it's
> > 45,-75 through 44,-74, it's lat/lon.  If it's howevermany hundred
> > thousand by howevermany million, it's UTM.  Prompt them for the
> > projection, and use the inferred value as the default.  That's now the
> > projection for everything in $HOME/.grass.
> 
> So what if the user is in Europe or Asia? XY or Lat-lon?
> I'm in New Zealand, we use a couple different howevermany million
> projections here and a couple of different map datums; UTM isn't used
> much if at all.
> 
> The very important point is this: it is much better to make no choice
> at all rather than to start making incorrect assumptions. This way the
> user knows where the error is and what question has to be answered.
> It's a very important and well demonstrated point. Many disasters.
> 
> With respect to setting locations automatically from GeoTIFFs by
> default: I've got a CD here with about 50 important maps, all with
> bogus/incorrect metadata. I don't think this is so unusual; upstream
> data sources of specialist items often have less than perfect quality
> control. Just in my one case yes, but the problem exists, and a new user
> is never going to be able to know what to trust.
> I am reminded of Excel vs. Matlab in taking an average of a series of
> data points. Excel will take the average regardless of the number of
> NaN cells; Matlab will cough blood and make you explicitly tell it
> that's what you really really want to do. Ease of use vs. imposed
> correctness isn't always a bad thing.
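
To make the contrast Hamish describes concrete, here is a minimal sketch
in Python. It is not how Excel or Matlab are implemented; the function
names and the skip_nan parameter are made up for illustration. One
version silently drops NaN values, the other refuses to continue until
the caller says explicitly that dropping them is intended.

```python
import math


def lenient_mean(values):
    """Average the non-NaN values, silently skipping any NaN."""
    clean = [v for v in values if not math.isnan(v)]
    return sum(clean) / len(clean)


def strict_mean(values, skip_nan=False):
    """Refuse to average data containing NaN unless the caller opts in."""
    if any(math.isnan(v) for v in values):
        if not skip_nan:
            raise ValueError(
                "data contains NaN; pass skip_nan=True to ignore them"
            )
        # The caller asked for it explicitly, so drop the NaN values.
        values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values)


data = [1.0, float("nan"), 3.0]
print(lenient_mean(data))                 # averages 1.0 and 3.0 without comment
print(strict_mean(data, skip_nan=True))   # same result, but only on request
```

With the strict version the user gets an error that names the question
to be answered, instead of a number that may or may not mean what they
think it means.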

Just an off-topic addition from bioinformatics about what happens 
if programs decide the data structure/type/format. Have a look at
this article (full text online):

"Mistaken Identifiers: Gene name errors can be introduced
 inadvertently when using Excel in bioinformatics"

 Zeeberg et al., BMC Bioinformatics 2004, 5:80
 doi:10.1186/1471-2105-5-80
 http://www.biomedcentral.com/1471-2105/5/80

"Abstract

 Background
 When processing microarray data sets, we recently noticed that some
 gene names were being changed inadvertently to non-gene names.

 Results
 A little detective work traced the problem to default date format conversions
 and floating-point format conversions in the very useful Excel program package.
 The date conversions affect at least 30 gene names; the floating-point
 conversions affect at least 2,000 if Riken identifiers are included. These
 conversions are irreversible; the original gene names cannot be recovered.
 ...
 For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] [3]
 was being converted to '1-DEC.'
 ...
 For example, the RIKEN identifier "2310009E13" was converted irreversibly
 to the floating-point number "2.31E+13."
"
 [3] http://www.biomedcentral.com/1471-2105/5/80/figure/F1
 (more screenshots at the left of the main article page)
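
The same kind of failure is easy to reproduce outside Excel. A minimal
sketch in Python, assuming a hypothetical import routine (guess_type is
a made-up name, not part of any real package) that turns every cell
into a number whenever it happens to parse as one:

```python
def guess_type(cell):
    """Eager guess: anything that parses as a number becomes a number."""
    try:
        return float(cell)
    except ValueError:
        return cell  # not numeric, keep the text as it is


identifier = "2310009E13"        # RIKEN identifier from the abstract
guessed = guess_type(identifier)

print(type(guessed).__name__)      # float
print(str(guessed) == identifier)  # False: the original text is gone
print(guess_type("DEC1"))          # DEC1 (survives only because float() rejects it)
```

Once the identifier has been stored as a float, nothing in the value
says it was ever text. The safe default is to leave the cell alone
unless the user states what type it is.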

To me this sounds like a disaster.

So, please, let's avoid such automated rubbish in GRASS.

Markus