[GRASS-dev] Re: [GRASS-user] RE: Problem querying layers other than '1' in gis.m

Tue Sep 26 14:27:11 EDT 2006

On Tue, September 26, 2006 15:37, Trevor Wiens wrote:
> On Tue, 26 Sep 2006 11:32:38 +0200
> Moritz Lennert wrote:
>
>> Michael has already said most of what I wanted to say, but some small
>> additions.
>>
>> Michael Barton wrote:
>> >> What seems much more natural to me is
>> >> leaving the attribute management to the database, where more
>> >> elegant tools exist. Thus, instead of having a
>> >> layer option, grass modules need an input key and possibly an output key
>> >> option, depending on the module. If no key field is specified
>> >> (which would be an attribute in the table linked to the
>> >> vector file), then all objects in the vector file are
>> >> processed. However, if a key is used, it queries the vector file
>> >> for a list of objects to process. In the case where the
>> >> same vector object has two cats, the vector attribute tables
>> >> will have to have a one-to-many relationship from the vector
>> >> file to the attribute table. Modules could also allow a
>> >> query specification to allow for complex querying across
>> >> multiple keys and attributes, but output would probably have
>> >> to be limited to a series of key fields (most likely only
>> >> integers).
>> >
>> > Really, this is exactly what "layers" are now. AFAICT, the biggest problem
>> > is in the terminology. Each "layer" is an integer key field in database
>> > terminology. Multiple layers simply means multiple key fields, each of which
>> > can be linked with an attribute table using v.db.connect.
>>
>
> No, not really. What I describe is functionality identical to current
> layers, but the critical distinction is that all the attribute
> information can be managed in the database independently of GRASS, even
> without GRASS running. Right now the cat values for a vector
> object can only be accessed through GRASS. If GRASS built a simple
> table for each vector object with a cat (or perhaps more clearly
> named an objectid) and a user-defined key, and allowed users to
> add other keys to that table, then there would be no need to run GRASS
> to update attribute information for those objects. For example, let's say
> you have a series of weather sites, each with an incremental
> objectid and a user-defined key such as a stationid, and you want to
> occasionally interpolate precipitation surfaces from these
> sites. Since all the attribute information is accessible independently
> of GRASS (I envision using PostgreSQL in this case), you can write your
> application in whatever environment you like, access the database
> outside of GRASS, and add and edit new time data as needed. Then, when it
> is time to create your new surface, you fire up GRASS, do your
> multi-table query without any call to v.db.connect, because it is no
> longer needed, and get the result. Done.

I don't really understand this argument. Why can't you do exactly this
with GRASS today? First of all, why do you need a separate objectid and
stationid if each station is represented by one object (let's say a point)?
You could just use the cat value of each object. You would then have a
table in PostgreSQL containing these cat values (possibly in a column you
could call stationid) and all the other attributes you would like, in this
same table. If you get new data for the stations, you can add it to the table
without having to go through GRASS. Then, when you enter GRASS, this new
information is available as long as your map remains linked to that table.

The only thing you cannot do currently (if I'm not mistaken) is use
aggregate queries on that table if you have more than one row for each
station. But I don't think that this is due to the general data model of
GRASS, but rather to the fact that it is not implemented.
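For illustration, a minimal sketch of that setup, using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table and column names (`station_data`, `stationid`, `day`, `mm`) are hypothetical, and no GRASS call is involved. The final query is the kind of aggregate (one value per cat) that, per the paragraph above, GRASS modules currently cannot use directly:

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table the map
# is linked to; stationid holds the cat value of each station point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station_data (stationid INTEGER, day TEXT, mm REAL)")

# Any client can append new observations at any time, without running GRASS.
conn.executemany(
    "INSERT INTO station_data VALUES (?, ?, ?)",
    [(1, "2006-09-25", 4.0), (1, "2006-09-26", 6.0), (2, "2006-09-25", 1.5)],
)
conn.commit()


def station_means(conn):
    """Aggregate several rows per station into one value per cat."""
    return conn.execute(
        "SELECT stationid, AVG(mm) FROM station_data "
        "GROUP BY stationid ORDER BY stationid"
    ).fetchall()


print(station_means(conn))  # [(1, 5.0), (2, 1.5)]
```

The result has exactly one row per cat, which is the shape a module processing one vector object at a time would need.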

[...]

>> > Again, except for the key fields confusingly labeled as "layers" and one
>> > other legacy feature from GRASS 5, all attribute management does stay in the
>> > database.
>>
>> Unless you use it in the way I suggested in an earlier mail, i.e. cat 1
>> = coniferous, cat 2 = broadleaved, cat 10 = pine, etc.
>>
>> The way it 'should' be used, to stick with Trevor's suggestions, is
>>
>> cat 1 = tree number 1
>> cat 2 = tree number 2
>>
>> etc.
>>
>> And then have a table with columns
>>
>> treenumber, species, etc,
>>
>> with possibly another table with
>>
>> species, type
>>
>> where type = coniferous or broadleaved
>>
>> And then, if you need a map of coniferous and broadleaved trees, you
>> create a view:
>>
>> CREATE VIEW v1 AS SELECT treenumber, type FROM trees, types WHERE
>> trees.species=types.species
>>
>
> Why bother with v.db.connect at all? Just allow a query to be used to
> select the vector object keys (cats) and let the module in question
> work with that list.

I think this is potentially possible with the current model, just not
implemented. The database drivers allow any kind of query you want, and it
should not be too complicated to rewrite modules to allow more
arbitrary queries than the current 'where' option does.

Any query which returns cat values allows you to then work with these cat
values (I am currently reworking d.vect.chart to do just that), and
it should, therefore, not be too difficult to imagine modules which allow
you to define an arbitrary query and then fulfill their task on the
basis of this query.
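To illustrate, a minimal sketch using Python's sqlite3 in place of a GRASS database driver, with the trees/types tables from the example above (cat = treenumber). Any query that returns cat values, whether through the view or as a one-off join, yields the list of objects a module would process. `select_cats` is a hypothetical helper, not an existing GRASS function:

```python
import sqlite3


def select_cats(conn, query):
    """Run an arbitrary SQL query and return the distinct cat values it yields.

    Hypothetical helper: a module would then process only the vector
    objects carrying these cats (first column of the result).
    """
    return sorted({row[0] for row in conn.execute(query)})


conn = sqlite3.connect(":memory:")
# cat of each tree object == treenumber
conn.execute("CREATE TABLE trees (treenumber INTEGER PRIMARY KEY, species TEXT)")
conn.execute("CREATE TABLE types (species TEXT PRIMARY KEY, type TEXT)")
conn.executemany("INSERT INTO trees VALUES (?, ?)",
                 [(1, "pine"), (2, "oak"), (3, "spruce")])
conn.executemany("INSERT INTO types VALUES (?, ?)",
                 [("pine", "coniferous"), ("spruce", "coniferous"),
                  ("oak", "broadleaved")])
# The view from the example, convenient for repeated use.
conn.execute("CREATE VIEW v1 AS SELECT treenumber, type FROM trees, types "
             "WHERE trees.species = types.species")

# One-off query, no view needed:
cats = select_cats(conn, "SELECT treenumber FROM trees JOIN types USING (species) "
                         "WHERE type = 'coniferous'")
print(cats)  # [1, 3]
# Same selection through the view:
print(select_cats(conn, "SELECT treenumber FROM v1 WHERE type = 'coniferous'"))  # [1, 3]
```

Both routes give the same cat list; the view only saves retyping the join.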

Depending on how you use it, you can obviously run into problems if
you have more than one object with the same cat value (I have that problem
in d.vect.cat, for example), but there are ways to work around this.

> I realize that many people not familiar with SQL
> will find this difficult, but surely we could consider adding a simple
> query builder as part of upgrading the GUI front end.

It should also be possible to offer both solutions.

> A view would be convenient for ongoing use, but shouldn't be necessary
> for single time uses.

I agree totally.

[...]

>
> Well, changing the terminology would certainly help, but the fundamental
> problem was clearly defined by Moritz when he suggested that the
> problem is the mixing of database concepts with GIS concepts. Hence my
> suggestion to keep database functions in the database.

Well, GIS as such makes no sense if it is not understood as the link
between geometries and data, so you always have to mix the two in one way
or another. The question is rather how to do this in a way which is most
efficient _and_ offers the most functionality.

>>
>> v.buffer is a very special case, and I don't know how you would solve
>> the question in your system: the attribute information is lost since
>> v.buffer merges overlapping buffers into one single buffer. As
>> mentioned on the man page, there is no automatic way to know which cat
>> (or key value) to give to this single merged buffer.
>
> My solution would work in the sense that new areas created would be
> given a new objectid whereas areas (technically centroids associated
> with areas) up to the point of overlap would retain their original
> objectid and thus would have direct access to any associated attribute
> information through a simple query.

Well, it should be no problem to reprogram v.buffer to do just that. Its
current implementation works on the assumption that, since buffers could
potentially overlap, all buffers are treated as if they overlapped,
but you could obviously include a test in the code which treats buffers
differently. Again, I don't see how this is a problem of the model
rather than of the implementation of a particular module.
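As a rough illustration of such a test, here is a minimal sketch assuming circular buffers of equal radius around points. This is not v.buffer's actual code or geometry handling; the function name and the id policy are hypothetical. Overlapping buffers are grouped with a union-find; isolated buffers keep their original cat, while each merged group gets one new id:

```python
import math


def assign_buffer_ids(points, radius, next_id):
    """Decide which buffers keep their cat and which get a new id.

    points: list of (cat, x, y), cats assumed unique. Returns {cat: buffer_id}.
    Hypothetical policy: an isolated buffer keeps its own cat; all buffers
    in an overlapping group share one new id, starting at next_id.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two circles of equal radius overlap if their centres are closer than 2*radius.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            _, xi, yi = points[i]
            _, xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) < 2 * radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    result = {}
    for members in groups.values():
        if len(members) == 1:
            cat = points[members[0]][0]
            result[cat] = cat  # no overlap: original attributes stay reachable
        else:
            for m in members:
                result[points[m][0]] = next_id  # merged area: one new id
            next_id += 1
    return result


print(assign_buffer_ids([(1, 0, 0), (2, 1, 0), (3, 10, 0)], radius=1.0, next_id=100))
# {1: 100, 2: 100, 3: 3}
```

The returned mapping also records which original cats went into each merged buffer, so a link back to their attributes is not entirely lost.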

>
> It is important to note that my objectid terminology only makes sense
> if this value is singular and immutable.

This might actually be the fundamental point in the argument. Currently
GRASS doesn't enforce this as a rule (well actually, IIUC, each object has
its line number as a unique identifier; this is just not visible to the
user). The question is whether it should enforce something like this, or
whether the current model doesn't offer more flexibility by allowing you to
limit yourself to unique ids for each object, while also allowing the use
of non-unique ids, or multiple ids.
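A minimal sketch of the three cases, assuming a toy in-memory representation (not GRASS's actual data structures): each object has a hidden unique line id plus any number of (layer, cat) pairs. A module that needs a "singular and immutable" id could simply check uniqueness in the layer it uses; `cats_unique` is a hypothetical helper:

```python
from collections import Counter

# Toy model: hidden unique line id -> list of (layer, cat) pairs.
objects = {
    1: [(1, 10)],          # one cat in layer 1
    2: [(1, 10)],          # shares cat 10 with object 1 (non-unique id)
    3: [(1, 11), (2, 5)],  # cats in layers 1 and 2 (multiple ids)
}


def cats_unique(objects, layer):
    """True if no cat value in `layer` is carried by more than one object."""
    counts = Counter(
        cat
        for pairs in objects.values()
        for cat in {c for lyr, c in pairs if lyr == layer}  # count each object once
    )
    return all(n == 1 for n in counts.values())


print(cats_unique(objects, 1))  # False: cat 10 is on objects 1 and 2
print(cats_unique(objects, 2))  # True
```

Layer 2 can serve as a unique object id, while layer 1 cannot, even though both live in the same map.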

Moritz