[GRASS5] Multiple attribute support in GRASS 5.1: some considerations (long)

Radim Blazek Radim.Blazek at dhv.cz
Sun May 13 11:46:27 EDT 2001


aaime wrote:
> Over the last few days I've spent some time thinking about new vector
> capabilities in GRASS 5.1, in particular multiple attribute support and
> DBMS integration. I would like to share my thoughts with you. Please
> forgive my English, I'm not used to writing such a long text and I don't
> have enough time to consult a dictionary and a grammar :-)

Have you read grass51/doc/vector/vector.html? Probably not, so I would
recommend that you look at it, because some related proposals are there
and some experimental code is already written in grass51 (g51). For
example, d.vect with where= (SQL where) is already working.

In general I agree with your ideas; I have only a few comments.
 
> In my opinion we should first think about what kind of functionality we
> want to include in GRASS 5.1 before thinking about what kind of data
> structure to adopt.
> Here is a possible list of interesting features that one can hope to find
> in a vector GIS:
> A) the ability to store multiple attributes and have them shown by
> clicking on a map, and the ability to choose which attribute to use when
> performing computations on the map;
> B) the ability to support overlay operations on vector data (which also
> means joining attribute tables) (overlay: intersection, erase, identity
> and so on, if you're familiar with ARC/INFO);
> C) the ability to query maps both by spatial criteria and by attribute
> values, just like in an SQL query;
> D) the ability to relate attribute tables to other non-spatial information
> (a cadastral map with ids referring to a table describing the owners of
> each parcel, and so on);
> E) the ability to let concurrent users modify the same data.
> There may be other requests, of course, and I haven't considered 3D vector
> data such as TINs, but I think these are enough to explain my proposals.
> 
> I think that A) and B) are essential requirements if we want to claim that
> GRASS is a vector GIS.
> C) is so common among vector GIS that I would look with suspicion at a GIS
> that doesn't perform such an operation. D) and E) are usually offered by
> high-end systems in conjunction with a DBMS that sports spatial data
> extensions, and may be offered by GRASS if OSVectDB turns into a real
> system (I think that it is currently at the specification level).
> Now, let's see what kind of data structure we can use in order to support
> the A), B), and C) functions.
> To support only A) and B), plain files are a good solution as long as the
> amount of data involved is not large. There are many possibilities, but I
> think that DBF files are a good solution. Why?
> * they are binary files, so access is faster than with ASCII files;

No, the records in dbf files are stored as ASCII text (only the header is
binary)!

> * they are quite standard, almost any spreadsheet can read them and most
>   DBMS have some way to import them (well, at least for what concerns
>   commercial DBMS);

That was one of the reasons why I wrote a simple dbf driver.

> * because they don't require any growth in our software base, we already
>   have a library to access them: shapelib.
> Although shapelib has limited capabilities when it comes to managing dbf
> files, I think that it does what is needed.

Yes, my driver is based on shapelib. In the future we must either extend
shapelib a bit or use some other external library.

> So we could store only one index
> together with the geometric data and have all attributes stored in the DBF
> file. That's a simple solution, but it also seems effective when only the
> A) and B) requirements are considered.
> If you also consider the C) requirement, DBF files are not the best choice,
> since they don't support access through the SQL language. I think that here
> a DBMS is necessary, since we get the power of SQL queries for free.
> Berkeley DB is not a solution because it doesn't support SQL.

The dbf driver (g51) is based on a simple SQL parser, so it works like
an SQL database for a limited subset of SQL statements.
For simple and small projects the dbf driver should be enough, and for larger
projects some other driver may be used (at this time only odbc is available,
but a postgres driver should not be a problem).
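
To make the idea concrete, here is a minimal sketch in Python (not the g51
driver code, which is written in C on top of shapelib) of a WHERE clause
evaluated by a sequential scan over attribute records. The supported syntax
(a single "column op value" condition), the function names and the sample
table are all assumptions made for illustration only.

```python
import operator
import re

# Comparison operators accepted by this toy WHERE clause (an assumption,
# not the actual subset supported by the g51 dbf driver).
OPS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

# Two-character operators come first so that "<=" is not read as "<".
WHERE_RE = re.compile(r"^\s*(\w+)\s*(<>|<=|>=|=|<|>)\s*(.+?)\s*$")


def parse_where(clause):
    """Parse 'column op value' into (column, comparison function, value)."""
    match = WHERE_RE.match(clause)
    if match is None:
        raise ValueError("unsupported WHERE clause: %r" % clause)
    column, op, raw = match.groups()
    if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        value = raw[1:-1]   # quoted string literal
    else:
        value = float(raw)  # otherwise treat it as a number
    return column, OPS[op], value


def select_cats(rows, clause):
    """Sequentially scan rows and return the 'cat' of each matching row."""
    column, compare, value = parse_where(clause)
    return [row["cat"] for row in rows if compare(row[column], value)]


# Hypothetical attribute table: one record per category number.
roads = [
    {"cat": 1, "label": "highway", "width": 12.0},
    {"cat": 2, "label": "track", "width": 3.5},
    {"cat": 3, "label": "highway", "width": 10.0},
]

print(select_cats(roads, "label = 'highway'"))  # [1, 3]
print(select_cats(roads, "width < 11"))         # [2, 3]
```

Every query walks the whole table, which is fine for small projects and is
exactly the cost that a DBMS with indexes avoids on larger ones.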

> PostgreSQL is, and through its
> referential integrity capabilities it would allow us to support the E)
> requirement as well. If we want to stick with DBF files we have to choose
> whether to build minimal SQL support into GRASS by hand or to not let the
> user perform queries unless a real DBMS is used. SQL support based on DBF
> files would be slow anyway, because one has to do a sequential scan of the
> attribute files, whereas a DBMS can use indexes and a built-in query
> optimizer.
> 
> A solution that is based on storing topological information in our classic
> files and attributes in a database (DBF or Postgres) seems to me a good
> choice. But it's not enough.
> When it comes to giving good support to overlay and spatial queries you
> also have to think of a fast way to perform them: spatial indexes are
> the solution, and there are some ready-made libraries that can build
> R-trees... the spatial index would be stored in a separate file. So, one
> file for the topology, one file for attributes and one (optional) file for
> the spatial index. Since performance is optional, we could add spatial
> index support later (say in GRASS 6) and do sequential scans in the
> meantime.

David is working on spatial indexes, which should be on level 3 access.
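
As a rough illustration of the trade-off between a sequential scan and a
spatial index, here is a small Python sketch. It uses a uniform grid over
bounding boxes, which is much simpler than the R-trees mentioned above and
is not the index being developed for g51; the class name, function names
and sample boxes are hypothetical.

```python
from collections import defaultdict


def bbox_intersects(a, b):
    """True if boxes (xmin, ymin, xmax, ymax) a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def scan_query(boxes, window):
    """Sequential scan: test every feature box against the query window."""
    return sorted(fid for fid, box in boxes.items()
                  if bbox_intersects(box, window))


class GridIndex:
    """Uniform grid over feature bounding boxes (a stand-in for an R-tree)."""

    def __init__(self, boxes, cell_size):
        self.boxes = boxes
        self.cell_size = cell_size
        self.cells = defaultdict(set)
        for fid, box in boxes.items():
            for cell in self._cells_for(box):
                self.cells[cell].add(fid)

    def _cells_for(self, box):
        """Yield the (column, row) of every grid cell the box overlaps."""
        size = self.cell_size
        for col in range(int(box[0] // size), int(box[2] // size) + 1):
            for row in range(int(box[1] // size), int(box[3] // size) + 1):
                yield col, row

    def query(self, window):
        """Collect candidates from overlapping cells, then test them exactly."""
        candidates = set()
        for cell in self._cells_for(window):
            candidates |= self.cells.get(cell, set())
        return sorted(fid for fid in candidates
                      if bbox_intersects(self.boxes[fid], window))


# Hypothetical feature bounding boxes keyed by feature id.
features = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60), 3: (8, 8, 20, 20)}
window = (9, 9, 12, 12)

index = GridIndex(features, cell_size=25)
print(scan_query(features, window))  # [1, 3]
print(index.query(window))           # [1, 3]
```

Both functions return the same features; the grid only reduces how many
boxes have to be tested, which is why the index can be added later without
changing the results of existing queries.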

> Now, I would also like to offer some criticism of site data:
> * access is slow, mainly because they are kept in ASCII format and because
>   the data structure can vary from record to record (-> the site format is
>   now too flexible);
> * the site API is not the best part of the GIS library, in my humble
>   opinion, but that is mainly due to the poor file structure.
> Why treat line, polygon and point data in different ways? Wouldn't it be
> possible, and more efficient, to store coordinates and an index in a binary
> file and put all the attributes into a DBF file? Or in a table inside a DBMS?
> 
> 
> Using binary files would give us a huge performance improvement, and
> smaller files. I saw it at the GRASS Day 2001 in Trento, Italy: someone had
> an implementation of a site API and format that stores all data in a binary
> file that also happens to be a quadtree (a fast way to store and index point
> data -> they performed spatial queries really fast, it was
> impressive). I think that he's willing to donate that API to GRASS, he
> seemed only concerned about stability and code quality.
> 
> Using DBMS tables or DBF files, every record would get the same attributes,
> and we would have attribute names too -> this would also lead to a cleaner
> site API.
> You should also consider that this way line, polygon and site management
> would share some code, leading to a smaller GIS library (which also means
> less to maintain, a nice feature in the long run). This would also lead to
> easier attribute management when it comes to using polygon and site
> data at the same time (I'm thinking about Voronoi diagrams, but also
> about overlays between polygon and site data).
> 

I agree that sites data should be stored in vector files, which is possible
even in grass5.0. Otherwise we will have to maintain two similar libraries
for vector and sites. The new vector library and modules will support points.
I was not courageous enough to suggest such a thing as replacing site_lists
with vector files. I remember some mails here in which this question was
discussed in depth and the ascii format was found to be a good solution, but
I think that for g51 we could reconsider the site_lists format. What are
the arguments, if any, for keeping points separate from lines and areas?
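
To illustrate the proposal of keeping coordinates plus one index in a binary
file and all attributes in a table, here is a minimal Python sketch. The
record layout (a 32-bit category followed by two 64-bit coordinates), the
function names and the sample data are assumptions for illustration; this is
not the g51 vector format.

```python
import struct

# Hypothetical record layout: int32 category followed by two float64
# coordinates, little-endian with no padding (20 bytes per point).
POINT = struct.Struct("<idd")


def pack_points(points):
    """Serialize (cat, x, y) tuples into one fixed-width binary blob."""
    return b"".join(POINT.pack(cat, x, y) for cat, x, y in points)


def unpack_points(blob):
    """Read the fixed-width records back as (cat, x, y) tuples."""
    return list(POINT.iter_unpack(blob))


def join_attributes(points, table):
    """Attach to each point the attribute row that shares its category."""
    return [(x, y, table[cat]) for cat, x, y in points]


# Hypothetical site data and attribute table keyed by category.
sites = [(1, 663000.5, 5102000.25), (2, 664250.0, 5103500.75)]
attributes = {
    1: {"name": "well", "depth": 12.5},
    2: {"name": "spring", "depth": 0.0},
}

blob = pack_points(sites)
print(len(blob))          # 40
restored = unpack_points(blob)
print(restored == sites)  # True
print(join_attributes(restored, attributes)[0])
```

Because every record has the same size and the same fields, the reader needs
no per-record parsing, and the attribute table gives each column a name.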


Radim


> What are your opinions?
> Regards
> Andrea Aime
> 
