let's standardize sites

David Gerdes dpgerdes at zorro.cecer.army.mil
Wed Jan 26 10:56:16 EST 1994


I wrote the following yesterday morning but didn't get around to 
sending it. The conversation has moved on, but there is some 
useful stuff in it (i.e. possible function prototypes).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

mccauley at ecn.purdue.edu wrote:
>
>My purpose for calling for standardization was largely for 
>simplicity and consistency for the users, which would lead to 
>an easier job for programmers. Implied was a standard format
>for data, not necessarily a standard set of parsing rules.

I don't see a lot of difference here.  For all other GRASS data
types, the file format is not even documented.  Instead, libraries
(and programs) are provided to allow programmers (and users) to 
read and write them.  The same should probably happen if we are
to continue to support sites.  Otherwise you have the potential for
what has already happened: different programs act differently on 
the data, and we end up with multiple interpretations of the data format.

I would guess that you want it simple to use things like awk on the
files?  From a C programming perspective, a set of subroutines 
that manages all the I/O is sufficient for the desired simplicity.


Darrell:
>When asking for standardization, I was looking for something that 
>could possibly be implemented as early as 4.2.

Not unreasonable if someone just does it. "Who?" is the big question.

Darrell:
>In retrospect, shouldn't we keep the dimensionality the same for 
>all data formats and leave extensions to a dbms?

We certainly should expect the dbms interface to provide this (whenever
it is available).  I could go either way on this argument.  Either
freeze sites now (or 5 years ago, as it were) and wait for a DBMS, or
decide that multi-column sites are beneficial enough w/out a DBMS to
warrant the extra work inside GRASS.

Darrell:
>>- I like the suggestion to add the elevation field.
>>- The #category field should remain optional!
>
>then we lose consistency for all sites lists.

It's consistent, just not fixed-field.  That has always been there
and I just wouldn't suggest getting rid of it.  Otherwise, if you force
me to fill in a field that I don't need, I will just stuff it w/ zeroes,
thus wasting disk space and adding unnecessary details.


Me:
>>- We should develop a set of routines to parse the fields for the programmer 
>>    so that we get a standard interface. [G_get_site ()]
Darrell:
>Amen. If this were done (in whatever manner), it would alleviate
>a lot of the concern, at least from the programming point of view.

Right.


Darrell:
>>> In terms of storage space, this only adds another character (one byte)
>>> for each site to existing site lists to make them compatible, so this
>>> is not a big negative (<=5% increase in storage requirements).

Darrell:
>I want things to be consistent and simple. If we lose this, then
>we lose one of the merits of simple ascii files.

Well, you can still CREATE simple ascii files.  How many applications
are there in AWK that READ site files?  Typically, I think, most programs
that read site files will be written in C, and thus can take advantage
of this new site library.


Me:
>>How about a format like:
>><easting>|<northing>|[z|[d4|]...][#category_int] [attr_text OR %flt[%flt]...]

Darrell:
>I could foresee confusion and increased complexity for the user
>if multidimensions were added (not to mention that all programs
>would need a flag or parameter specifying the dimension in which
>calculations would be done). Multiple attributes ("%flt[[%flt]...]"),
>IMO, would probably best be left to a database program. I didn't
>anticipate creating one for this effort.

Most programs will be written for a specific purpose and a specific
dimensionality.  They will simply use the dim. fields they need.

(Note that as we look at Ndim GRASS, we are considering front ends that 
allow users to select which subspace to work with, say have r.mapcalc.2d
work on the YZ plane where x=5.  But this is not necessary now for a 
first pass at sites.)

How about:

easting|northing|[z|[d4|]...][#category_int] [ [@attr_text OR %flt] ... ]
to allow any number of text or numeric fields.
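
To make the proposed layout concrete, here is a minimal C sketch of a
writer that emits one record in that form.  The names format_site and
append are hypothetical and not part of any GRASS library.  The sketch
assumes coordinates are printed with %.10g, that a single space separates
the optional fields, and that text attributes contain no whitespace.

```c
#include <stdarg.h>
#include <stdio.h>

/* Append formatted text to buf at offset *used.
 * Returns 0 on success, -1 if the text does not fit. */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Hypothetical writer for one record of the proposed format:
 *   easting|northing|[z|[d4|]...][#category_int] [ [@attr_text OR %flt] ... ]
 * Returns the number of characters written, or -1 on error. */
int format_site(char *buf, size_t size,
                const double *dims, int ndims,
                int hascat, int cat,
                const double *flt, int nflt,
                const char *const *txt, int ntxt)
{
    size_t used = 0;
    int i;

    if (ndims < 2)                      /* easting and northing are required */
        return -1;
    for (i = 0; i < ndims; i++)         /* every coordinate ends with '|' */
        if (append(buf, size, &used, "%.10g|", dims[i]) < 0)
            return -1;
    if (hascat && append(buf, size, &used, "#%d", cat) < 0)
        return -1;
    for (i = 0; i < nflt; i++)          /* numeric fields carry a '%' prefix */
        if (append(buf, size, &used, " %%%.10g", flt[i]) < 0)
            return -1;
    for (i = 0; i < ntxt; i++)          /* text fields carry an '@' prefix */
        if (append(buf, size, &used, " @%s", txt[i]) < 0)
            return -1;
    return (int)used;
}
```

A plain 2d list with no category and no attributes still comes out as
"easting|northing|", so existing simple files remain valid under this
reading of the format.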

We could have a base set of functions for extracting any or all of
the data, AND we could have any number of specialized macro functions:

G_get_site_stats      (str, &hascat, &ndims, &nfltfields, &ntxtfields);

G_get_site_2d_flt     (str, &e, &n, &cat, &concentration);
G_get_site_2d_flt_fld (str, &e, &n, &cat, &concentration, field_num);
G_get_site_3d_flt     (str, &e, &n, &cat, &concentration);
G_get_site_3d_flt_fld (str, &e, &n, &cat, &concentration, field_num);
G_get_site_2d_text    (str, &e, &n, &cat, &text);

G_get_site_dims       (str, float dims[ndims]);
G_get_site_flt_fields (str, float fdata[nfltfields]);
G_get_site_txt_fields (str, char *tdata[ntxtfields]);
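
As a rough illustration of what a G_get_site_stats-style routine might
do, the sketch below counts the parts of a record without converting any
values.  The name site_stats, its int return code, and the assumption
that optional fields are separated by whitespace are illustrative only;
none of this is an existing GRASS API.

```c
#include <string.h>

/* Hypothetical stand-in for the proposed G_get_site_stats().
 * Counts coordinates, the optional category, and the '%' / '@' fields.
 * Returns 0 on success, -1 if the record looks malformed. */
int site_stats(const char *str, int *hascat, int *ndims, int *nflt, int *ntxt)
{
    const char *p = str;
    size_t len;

    *hascat = *ndims = *nflt = *ntxt = 0;

    /* Coordinates: a run of plain characters terminated by '|'. */
    for (;;) {
        len = strcspn(p, "| \t\n#@%");
        if (p[len] != '|')
            break;                      /* no more coordinates */
        if (len == 0)
            return -1;                  /* empty coordinate, e.g. "||" */
        (*ndims)++;
        p += len + 1;
    }
    if (*ndims < 2)                     /* easting and northing are required */
        return -1;

    /* Optional category directly after the last '|'. */
    if (*p == '#') {
        *hascat = 1;
        p += strcspn(p, " \t\n");
    }

    /* Remaining whitespace-separated fields, classified by prefix. */
    while (*p) {
        p += strspn(p, " \t\n");
        if (*p == '%')
            (*nflt)++;
        else if (*p == '@')
            (*ntxt)++;
        else if (*p != '\0')
            return -1;                  /* unknown field prefix */
        p += strcspn(p, " \t\n");
    }
    return 0;
}
```

A caller could use the counts to size the arrays handed to the
G_get_site_dims / G_get_site_flt_fields / G_get_site_txt_fields style
functions before extracting the values.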


Or alternatively, M.Shapiro suggested a modification which fills
in a structure so that the string only needs to be parsed once.
Also, note that sites currently support a minimal header.  This 
could be extended to allow the naming of columns of data.
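
The parse-once, fill-a-structure variant could look roughly like the
following.  This is only a sketch: struct site, parse_site, and the fixed
limits (SITE_MAX_DIMS, SITE_MAX_FLDS, SITE_TXT_LEN) are assumptions made
for the example, as is the rule that text attributes contain no whitespace.

```c
#include <stdlib.h>
#include <string.h>

#define SITE_MAX_DIMS 8     /* assumed limits, chosen for the sketch */
#define SITE_MAX_FLDS 16
#define SITE_TXT_LEN  64

struct site {
    int ndims;  double dims[SITE_MAX_DIMS];
    int hascat; int cat;
    int nflt;   double flt[SITE_MAX_FLDS];
    int ntxt;   char txt[SITE_MAX_FLDS][SITE_TXT_LEN];
};

/* Hypothetical parser: fills *s from one record in a single pass.
 * Returns 0 on success, -1 on a malformed or oversized record. */
int parse_site(const char *str, struct site *s)
{
    const char *p = str;
    char *end;
    size_t len;

    memset(s, 0, sizeof *s);

    /* Coordinates: each is a number immediately followed by '|'. */
    for (;;) {
        double v = strtod(p, &end);
        if (end == p || *end != '|')
            break;
        if (s->ndims == SITE_MAX_DIMS)
            return -1;
        s->dims[s->ndims++] = v;
        p = end + 1;
    }
    if (s->ndims < 2)
        return -1;

    /* Optional "#category_int". */
    if (*p == '#') {
        long c = strtol(p + 1, &end, 10);
        if (end == p + 1)
            return -1;
        s->hascat = 1;
        s->cat = (int)c;
        p = end;
    }

    /* Whitespace-separated "%flt" and "@text" fields, in any order. */
    for (;;) {
        p += strspn(p, " \t\n");
        if (*p == '\0')
            break;
        len = strcspn(p, " \t\n");      /* token length, prefix included */
        if (*p == '%') {
            if (s->nflt == SITE_MAX_FLDS)
                return -1;
            s->flt[s->nflt] = strtod(p + 1, &end);
            if (end == p + 1 || end != p + len)
                return -1;              /* not a clean number */
            s->nflt++;
        } else if (*p == '@') {
            if (s->ntxt == SITE_MAX_FLDS || len > SITE_TXT_LEN)
                return -1;
            memcpy(s->txt[s->ntxt], p + 1, len - 1);
            s->txt[s->ntxt][len - 1] = '\0';
            s->ntxt++;
        } else {
            return -1;                  /* unknown field prefix */
        }
        p += len;
    }
    return 0;
}
```

A program written for a specific dimensionality would then read only the
members it needs, e.g. dims[0] and dims[1] for 2d work, and ignore the rest.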


As for 'time', one solution is to leave it up to the application to define
whether it should be Dim4 or one of the float fields.  Or we could
come to an agreement on adding yet another optional field to the format
for a time field.  Then we have to worry about format and units.  Any
comments?



-- 
  David Gerdes
  US Army Construction Engineering Research Lab
  Spatial Analysis & Systems Team
  dpgerdes at zorro.cecer.army.mil


