[GRASS5] Multiple attribute support in GRASS 5.1: some considerations (long)

aaime aaime at libero.it
Sun May 13 10:58:15 EDT 2001


Over the last few days I've spent some time thinking about the new vector
capabilities in GRASS 5.1, in particular multiple attribute support and
DBMS integration. I would like to share my thoughts with you. Please forgive
my English: I'm not used to writing such long texts and I don't have enough
time to consult a dictionary and a grammar :-)

In my opinion we should first think about what functionality we want to
include in GRASS 5.1, and only then decide which data structure to adopt.
Here is a list of features that one can hope to find in a vector GIS:
A) the ability to store multiple attributes and display them by clicking
on a map, and the ability to choose which attribute to use when performing
computations on the map;
B) the ability to perform overlay operations on vector data (intersection,
erase, identity and so on, if you're familiar with ARC/INFO), which also
means joining attribute tables;
C) the ability to query maps both by spatial criteria and by attribute
values, just like in an SQL query;
D) the ability to relate attribute tables to other non-spatial information
(for example a cadastral map whose ids refer to a table describing the
owner of each parcel);
E) the ability to let concurrent users modify the same data.
There may be other requirements, of course, and I haven't considered 3D
vector data such as TINs, but I think these are enough to explain my proposals.

I think that A) and B) are essential requirements if we want to claim that
GRASS is a vector GIS.
C) is so common among vector GISs that I would be suspicious of a GIS that
doesn't support it. D) and E) are usually offered by high-end systems in
conjunction with a DBMS that provides spatial extensions, and GRASS could
offer them if OSVectDB turns into a real system (I think it is currently
only at the specification level).
Now, let's see what kind of data structure we can use to support A), B)
and C).
To support only A) and B), plain files are a good solution as long as the
amount of data involved is not large. There are many possibilities, but I
think that DBF files are a good choice. Why?
* they have a binary header and fixed-length records, so access is faster
  than with free-form ASCII files;
* they are fairly standard: almost any spreadsheet can read them and most
  DBMSs have some way to import them (at least the commercial ones);
* they don't require any growth in our code base, since we already have a
  library to access them: shapelib.
Although shapelib has limited capabilities when it comes to managing DBF
files, I think it does what is needed. So we could store only one index
together with the geometric data and keep all the attributes in the DBF file.
That's a simple solution, but it also seems effective when only requirements
A) and B) are considered.
If you also consider requirement C), DBF files are not the best choice, since
they don't support access through SQL. I think a DBMS is necessary here,
since we get the power of SQL queries for free. Berkeley DB is not a solution
because it doesn't support SQL. PostgreSQL is, and through its referential
integrity and transaction support it would also allow us to support
requirements D) and E). If we want to stick with DBF files we have to choose
whether to build minimal SQL support into GRASS by hand, or not to let the
user perform queries unless a real DBMS is used. SQL support based on DBF
files would be slow anyway, because every query would require a sequential
scan of the attribute file, whereas a DBMS can use indexes and a built-in
query optimizer.

A solution based on storing topological information in our classic files
and attributes in a database (DBF or PostgreSQL) seems a good choice to me.
But it's not enough.
When it comes to giving good support to overlays and spatial queries, you
also have to think about a fast way to perform them: spatial indexes are the
solution, and there are some ready-made libraries that can build R-trees. The
spatial index would be stored in a separate file. So: one file for the
topology, one file for the attributes and one (optional) file for the spatial
index. Since the spatial index only improves performance, it is optional: we
could add spatial index support later (say in GRASS 6) and do sequential
scans in the meantime.

Now, I would also like to make some criticisms of site data:
* access is slow, mainly because sites are kept in ASCII format and because
  the structure can vary from record to record (i.e. the site format is now
  too flexible);
* the site API is not the best part of the GIS library, in my humble opinion,
  but that is mainly due to the poor file structure.
Why treat line, polygon and point data differently? Wouldn't it be possible,
and more efficient, to store coordinates and an index in a binary file and
put all the attributes in a DBF file, or in a table inside a DBMS?

Using binary files would give us a huge performance improvement, and smaller
files. I saw this at the GRASS Day 2001 in Trento, Italy: someone had an
implementation of a site API and format that stores all data in a binary file
that is also a quadtree (a fast way to store and index point data; they
performed spatial queries really fast, it was impressive). I think he's
willing to donate that API to GRASS; he only seemed concerned about stability
and code quality.

Using DBMS tables or DBF files, every record would have the same attributes,
and we would have attribute names too; this would also lead to a cleaner
site API.
You should also consider that this way line, polygon and site management
would share some code, leading to a smaller GIS library (which also means
less code to maintain, a nice feature in the long run). It would also make
attribute management easier when polygon and site data are used together
(I'm thinking of Voronoi diagrams, but also of overlays between polygon and
site data).

What are your opinions?
Regards
Andrea Aime
