[GRASS5] Multiple attribute support in GRASS 5.1: some considerations
(long)
aaime
aaime at libero.it
Sun May 13 10:58:15 EDT 2001
Over the last few days I've spent some time thinking about the new vector
capabilities in GRASS 5.1, in particular multiple attribute support and
DBMS integration, and I would like to share my thoughts with you. Please
forgive my English: I'm not used to writing such a long text, and I don't
have enough time to consult a dictionary and a grammar :-)
In my opinion we should first think about what kind of functionality we want
to include in GRASS 5.1 before deciding what kind of data structure to adopt.
Here is a possible list of interesting features that one would hope to find
in a vector GIS:
A) ability to store multiple attributes and to have them shown by clicking
on a map, and ability to choose which attribute to use when performing
computations on the map;
B) ability to support overlay operations on vector data, which also means
joining attribute tables (overlay: intersection, erase, identity and so on,
if you're familiar with ARC/INFO);
C) ability to query maps both by spatial criteria and by attribute values,
just like in an SQL query;
D) ability to relate attribute tables to other non-spatial information
(e.g. a cadastral map with an id referring to a table describing the owners
of each parcel, and so on);
E) ability to let concurrent users modify the same data.
There may be other requests, of course, and I haven't considered 3D vector
data such as TINs, but I think these are enough to explain my proposals.
I think that A) and B) are essential requirements if we want to claim that
GRASS is a vector GIS.
C) is so common among vector GIS packages that I would look with suspicion at
a GIS that doesn't offer it. D) and E) are usually offered by high-end
systems in conjunction with a DBMS that sports spatial data extensions, and
could be offered by GRASS if OSVectDB turned into a real system (I think
that for now it is at the specification level).
Now, let's see what kind of data structure we can use in order to support
functions A), B) and C).
To support only A) and B), plain files are a good solution as long as the
amount of data involved is not large. There are many possibilities, but I
think that DBF files are a good choice. Why?
* they are binary files, so access is faster than with ASCII files;
* they are quite standard: almost any spreadsheet can read them and most DBMSs
have some way to import them (well, at least as far as commercial DBMSs are
concerned);
* they don't require any growth in our software base, because we already have
a library to access them: shapelib.
Although shapelib has limited capabilities when it comes to managing DBF
files, I think that it does what is needed. So we could store only one index
together with the geometric data and have all the attributes stored in the
DBF file. That's a simple solution, but it also seems effective when only
requirements A) and B) are considered.
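To make the "one index with the geometry, everything else in the attribute
file" idea concrete, here is a minimal sketch. It does not use shapelib or the
real DBF layout: the record format, the field names and the assumption that
categories are numbered 1..n without gaps are all hypothetical, chosen only to
show how fixed-length binary records allow a lookup with a single seek.

```python
import io
import struct

# Hypothetical fixed-length attribute record: category id, area, land use.
# "<" = little-endian, no padding; i = int32, d = float64, 16s = 16 bytes.
RECORD = struct.Struct("<id16s")

def write_table(stream, rows):
    """Write rows sorted by category so record n sits at offset (n - 1) * size.

    Assumes categories are numbered 1..n without gaps (an assumption of this
    sketch, not a property of DBF files).
    """
    for cat, area, landuse in sorted(rows):
        stream.write(RECORD.pack(cat, area, landuse.encode("ascii")))

def read_attributes(stream, cat):
    """Fetch the attributes of one category with a single seek, no scanning."""
    stream.seek((cat - 1) * RECORD.size)
    stored_cat, area, landuse = RECORD.unpack(stream.read(RECORD.size))
    return stored_cat, area, landuse.rstrip(b"\0").decode("ascii")

# The geometry side keeps only the coordinates plus the category index.
geometries = [
    {"cat": 1, "ring": [(0, 0), (4, 0), (4, 3), (0, 3)]},
    {"cat": 2, "ring": [(4, 0), (9, 0), (9, 3), (4, 3)]},
]

table = io.BytesIO()  # stands in for the attribute file on disk
write_table(table, [(2, 15.0, "forest"), (1, 12.0, "urban")])

clicked = geometries[1]  # e.g. the polygon the user clicked on
print(read_attributes(table, clicked["cat"]))
```

A real implementation would go through the DBF routines of shapelib instead;
the point is only that the geometry file never needs to know which attributes
exist.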
If you also consider requirement C), DBF files are not the best choice, since
they don't support access through SQL. I think that here a DBMS is
necessary, since we get the power of SQL queries for free. Berkeley DB is not
a solution because it doesn't support SQL. PostgreSQL is, and through its
transaction and referential integrity capabilities it would also allow us to
support requirement E). If we want to stick with DBF files we have to choose
whether to build minimal SQL support into GRASS by hand, or not to let the
user perform queries unless a real DBMS is used. SQL support based on DBF
files would be slow anyway, because one has to do a sequential scan of the
attribute files, whereas a DBMS can use indexes and a built-in query
optimizer.
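As an illustration of what an attribute query of type C) could look like once
the attributes live in a DBMS, here is a small sketch. It uses Python's
built-in sqlite3 module purely as a stand-in so that it runs without a server;
the table name, the column names and the helper function are hypothetical, and
with PostgreSQL the SQL itself would be essentially the same.

```python
import sqlite3

# sqlite3 is only a stand-in for a real DBMS here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (cat INTEGER PRIMARY KEY, landuse TEXT, area REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "urban", 12.0), (2, "forest", 15.0), (3, "forest", 4.5),
     (4, "water", 30.0)],
)
# An index gives the DBMS a way to avoid the sequential scan that a plain
# attribute file would need.
conn.execute("CREATE INDEX parcels_landuse ON parcels (landuse)")

def select_categories(connection, landuse, min_area):
    """Return the category ids whose attributes satisfy the query."""
    rows = connection.execute(
        "SELECT cat FROM parcels WHERE landuse = ? AND area >= ? ORDER BY cat",
        (landuse, min_area),
    )
    return [cat for (cat,) in rows]

print(select_categories(conn, "forest", 10.0))
```

The categories returned by the query are exactly the index values stored with
the geometry, so they can be used directly to select the matching features.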
A solution based on storing topological information in our classic
files and attributes in a database (DBF or Postgres) seems to me a good
choice. But it's not enough.
When it comes to giving good support to overlay and spatial queries, you also
have to think about a fast way to perform them: spatial indexes are
the solution, and there are some ready-made libraries that can build
R-trees... the spatial index would be stored in a separate file. So: one file
for the topology, one file for the attributes and one (optional) file for the
spatial index. Since performance is optional, we could add spatial index
support later (say in GRASS 6) and do sequential scans in the meantime.
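The difference between a sequential scan and an indexed spatial query can be
sketched as follows. Note that this is not an R-tree: to keep the example
short I use a uniform grid as a much simpler stand-in index, and the class,
the function names and the sample bounding boxes are all hypothetical. Both
approaches must return the same features; the index only reduces how many
bounding boxes have to be tested.

```python
from collections import defaultdict

def intersects(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def scan_query(boxes, window):
    """Sequential scan: test every feature's bounding box against the window."""
    return sorted(cat for cat, box in boxes.items() if intersects(box, window))

class GridIndex:
    """Uniform grid of cells, each listing the features whose box covers it."""

    def __init__(self, boxes, cell_size):
        self.boxes = boxes
        self.cell_size = cell_size
        self.cells = defaultdict(set)
        for cat, box in boxes.items():
            for cell in self._cells_for(box):
                self.cells[cell].add(cat)

    def _cells_for(self, box):
        x0, y0 = int(box[0] // self.cell_size), int(box[1] // self.cell_size)
        x1, y1 = int(box[2] // self.cell_size), int(box[3] // self.cell_size)
        return [(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)]

    def query(self, window):
        """Collect candidates from the touched cells, then test them exactly."""
        candidates = set()
        for cell in self._cells_for(window):
            candidates |= self.cells.get(cell, set())
        return sorted(c for c in candidates if intersects(self.boxes[c], window))

# Bounding boxes keyed by category (sample data for the sketch).
boxes = {1: (0, 0, 4, 3), 2: (4, 0, 9, 3), 3: (20, 20, 25, 24), 4: (6, 5, 8, 9)}
window = (5, 1, 7, 6)
index = GridIndex(boxes, cell_size=5)
print(scan_query(boxes, window), index.query(window))
```

Because the two query functions have the same result, code written against
the sequential scan now could be switched to an index later without changing
its callers.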
Now, I would also like to make some criticisms of site data:
* access is slow, mainly because sites are kept in ASCII format and because
the data structure can vary from record to record (-> the site format is
currently too flexible);
* the site API is not the best part of the GIS library, in my humble opinion,
but that is mainly due to the poor file structure.
Why treat line, polygon and point data differently? Wouldn't it be
possible, and more efficient, to store coordinates and an index in a binary
file and put all the attributes into a DBF file? Or into a table inside a
DBMS? Using binary files would give us a huge performance improvement, and
smaller files. I saw this at GRASS Day 2001 in Trento, Italy: someone had
an implementation of a site API and format that stores all data in a binary
file that also happens to be a quadtree (a fast way to store and index point
data -> they performed spatial queries really fast, it was
impressive). I think that he's willing to donate that API to GRASS; he
only seemed concerned about stability and code quality.
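I don't know the details of the implementation shown in Trento, so the
following is only a generic in-memory sketch of how a point quadtree answers a
window query; the class, its parameters (capacity, min_size) and the sample
sites are hypothetical. Each node covers a square, splits into four quadrants
when it holds too many points, and a query skips every quadrant that the
window does not touch.

```python
class QuadTree:
    """Point quadtree node covering the square [x, x+size) x [y, y+size)."""

    def __init__(self, x, y, size, capacity=2, min_size=1e-6):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity  # points a leaf may hold before splitting
        self.min_size = min_size  # leaves this small never split again
        self.points = []          # (px, py, cat) tuples held by a leaf
        self.children = None      # four sub-quadrants once the leaf splits

    def insert(self, px, py, cat):
        """Store a point; return False if it lies outside this node."""
        if not (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.size <= self.min_size:
                self.points.append((px, py, cat))
                return True
            self._split()
        return any(child.insert(px, py, cat) for child in self.children)

    def _split(self):
        """Turn this leaf into an inner node and push its points down."""
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx, self.y + dy, half, self.capacity, self.min_size)
            for dx in (0, half) for dy in (0, half)
        ]
        for px, py, cat in self.points:
            any(child.insert(px, py, cat) for child in self.children)
        self.points = []

    def query(self, xmin, ymin, xmax, ymax):
        """Return the categories of all points inside the query window."""
        if (xmax < self.x or xmin >= self.x + self.size
                or ymax < self.y or ymin >= self.y + self.size):
            return []  # window misses this quadrant: prune the whole subtree
        found = [c for px, py, c in self.points
                 if xmin <= px <= xmax and ymin <= py <= ymax]
        for child in self.children or []:
            found.extend(child.query(xmin, ymin, xmax, ymax))
        return found

tree = QuadTree(0, 0, 16)
sites = [(1, 1, 101), (2, 3, 102), (9, 9, 103), (12, 2, 104), (3, 2, 105)]
for px, py, cat in sites:
    tree.insert(px, py, cat)
print(sorted(tree.query(0, 0, 4, 4)))
```

A file-based version would store the nodes as binary records rather than
Python objects, but the pruning logic in query() would stay the same.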
With DBMS tables or DBF files every record would get the same attributes,
and we would have attribute names too -> this would also lead to a cleaner
site API.
You should also consider that this way line, polygon and site management
would share some code, leading to a smaller GIS library (which also means
less code to maintain, a nice feature in the long run). This would also lead
to easier attribute management when polygon and site data are used at the
same time (I'm thinking of Voronoi diagrams, but also of overlays between
polygon and site data).
What are your opinions?
Regards
Andrea Aime