[GRASS5] Proposed process for GRASS reorganization (please read)

Thu Mar 22 10:04:38 EST 2001

As we all know, GRASS components work independently of each other, not knowing what the others are doing (or knowing very little). I have been proposing that we reorganize the GRASS library into a model which will make all our lives much easier in the long run. It is also the first step in creating a more general GIS solution where one hand knows what the other is doing. For example, after executing a command which alters the database, one will be able to undo that operation. Also, after a map is edited, the results can be seen immediately.

There are many advantages to the reorganization:

* Localization for various languages/OS encodings using Unicode and standard date formats. Also saving a user's preferences in the standard (platform-specific) location.

* High-level scripting. Not macros, actual scripting (AppleScript, VBScript, perl, etc.)

* Multiple undo/redo

* Plug-in architecture

* Process-independent notification system and high-level IPC.

* Decentralized persistence

* Hierarchical persistence (uses XML)

* Arbitrary file organization (thinking outside the database structure)

* A more flexible, abstract manner to access and edit data.

* Ability for independent 3rd parties to extend the framework without being compiled in. Meaning plug-ins are compiled but usable by everybody and are dynamically loaded. Thus plug-ins are not only used to add commands and other interface elements, but can extend the framework. This is a major reason why releasing the GRASS framework under a license like LGPL would be a very good idea.

This also makes the framework highly modular without having to build it in separate modules.

I also want to point out that with our current GRASS library any changes must be strictly controlled, especially changes to the API. This restriction can be dropped with a GRASS framework utilizing plug-ins. The 3rd party additions can always be merged later if desired.

* Organized memory management

* Ownership (two commands can't retain the same object)

* Re-entrant objects, thread safety

* Most importantly: straightforward method for extending, reusing, and adding features.

I'm going to establish a few key concepts then propose a process to get from our current situation to the nirvana of MVC.

MVC stands for model-view-controller. It's a programming methodology (many call it a paradigm) whereby your model objects manage data, view objects display data, and controller objects link the two. Model and view classes are highly reusable, while controller classes control the behavior of the whole application.

(A class is a classification, like 'dog'. An object is an instance of a class, like a dog called Spot.)

For example a vector map could be a model object. The map class could contain a mutable array of vector series and a mutable array of closed loop vectors. The map class could have methods to:

* edit contents. For example adding a stream to a stream map (adding a vector series)

* return data. Like returning the number of closed-loop vector sets which could mean number of lakes.

* return a persistent representation of itself. Either a binary or property list.

* report whether it has been 'touched' (edited since it was last saved).

* draw itself. This method can be added to the class by another binary which is included with a graphics package.
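To make the model idea a bit more concrete, here's a rough sketch in Python (used only because it's short; the real thing would be Foundation or CoreFoundation). Every name in it (VectorSeries, VectorMap, add_series, closed_loop_count, property_list) is made up for illustration and is not part of any existing GRASS API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coordinate = Tuple[float, float]


@dataclass
class VectorSeries:
    """An ordered run of coordinates; closed=True means it forms a loop."""
    points: List[Coordinate]
    closed: bool = False


@dataclass
class VectorMap:
    """Hypothetical model object: it manages data and knows nothing about views."""
    name: str
    open_series: List[VectorSeries] = field(default_factory=list)
    closed_loops: List[VectorSeries] = field(default_factory=list)
    touched: bool = False  # becomes True once the map has been edited

    def add_series(self, series: VectorSeries) -> None:
        """Edit contents, e.g. add a stream to a stream map."""
        target = self.closed_loops if series.closed else self.open_series
        target.append(series)
        self.touched = True

    def closed_loop_count(self) -> int:
        """Return data, e.g. the number of lakes."""
        return len(self.closed_loops)

    def property_list(self) -> dict:
        """Return a persistent representation built from plain types only."""
        def encode(series_list):
            return [[list(p) for p in s.points] for s in series_list]
        return {
            "name": self.name,
            "open": encode(self.open_series),
            "closed": encode(self.closed_loops),
        }


lakes = VectorMap("lakes")
lakes.add_series(VectorSeries([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], closed=True))
lakes.add_series(VectorSeries([(2.0, 2.0), (3.0, 5.0)]))
print(lakes.closed_loop_count(), lakes.touched)
```

Note that nothing in the model prints, draws, or talks to a window system; that's the whole point.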

Three example view objects could be a log view, a table view, and an image view. The view class is platform specific; it would be different depending on whether you were using the command line, X11, Win32, OS X, etc. A log view can be as simple as an object which calls printf(), although another platform may make a graphical log view. Views can usually be printed as well.

An example controller would be a command. The command "v.report" for example could gather information from the specified vector map model object, create a mutable array of statistics, create a window with a table view, pass the mutable array to the view to display, and pass the undo manager an action to delete the window. The command will also register different methods to be called like a command macro for the shell, a menu item, a toolbar item, etc.
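Here's roughly what that controller flow could look like, again as a Python sketch. UndoManager, LogView, and the v_report function are stand-ins I invented for the example, and the map is reduced to a plain dictionary; a real implementation would use the framework's own classes.

```python
class UndoManager:
    """Hypothetical undo stack: each entry is a callable that reverses one action."""

    def __init__(self):
        self._actions = []

    def register(self, action):
        self._actions.append(action)

    def undo(self):
        if self._actions:
            self._actions.pop()()


class LogView:
    """Simplest possible view: formats rows as text lines and prints them."""

    def __init__(self):
        self.lines = []

    def display(self, rows):
        self.lines = [f"{key}: {value}" for key, value in rows]
        for line in self.lines:
            print(line)

    def clear(self):
        self.lines = []


def v_report(vector_map, view, undo_manager):
    """Controller: gather data from the model, pass it to a view, register undo."""
    stats = [
        ("open vector sets", len(vector_map["open"])),
        ("closed vector sets", len(vector_map["closed"])),
    ]
    view.display(stats)
    # Undoing a report just removes what was displayed.
    undo_manager.register(view.clear)
    return stats


streams = {"open": [[(0.0, 0.0), (1.0, 2.0)]], "closed": []}
view, undo = LogView(), UndoManager()
stats = v_report(streams, view, undo)
undo.undo()
```

Swapping LogView for a table view in a window changes nothing in the model or in the statistics gathering, only the view object handed to the command.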

Other than the command line controller and view, views and controllers are obviously platform (and user preference) dependent. They obviously depend on the host platform's API. For example Win32 may use ActiveX while OS X may use AppKit. However the API for using basic gestures like drawing a map will be universal. Furthermore we have many basic data types and sets already defined along with our own models. Thus a new command may have a name and category and return a mutable array. Using this information I can insert the command in the appropriate graphical menu, scripting dictionary, and toolbar of my host controllers. I can also create a macro for use in the command line. My controller can also set up the appropriate undo/redo actions if the command specifies them. For example a command adding a closed loop vector set to a vector map will either have an undo action or request that the undo stack be cleared. In this example the undo action will merely remove the last closed loop vector set from the vector map.
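A sketch of how a host controller might use a command's name and category, and handle its undo action, is below (Python, for brevity). Command, HostController, and the command name "v.add.lake" are all hypothetical; the convention that a command returns None to request a cleared undo stack is my assumption for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

UndoAction = Callable[[], None]


@dataclass
class Command:
    name: str
    category: str
    # run() performs the edit and returns an undo action, or None to
    # request that the undo stack be cleared (assumed convention).
    run: Callable[[dict], Optional[UndoAction]]


class HostController:
    def __init__(self):
        self.commands: Dict[str, Command] = {}
        self.menus: Dict[str, List[str]] = {}
        self.undo_stack: List[UndoAction] = []

    def register(self, command: Command) -> None:
        """File the command under its category, as a menu would."""
        self.commands[command.name] = command
        self.menus.setdefault(command.category, []).append(command.name)

    def execute(self, name: str, model: dict) -> None:
        undo_action = self.commands[name].run(model)
        if undo_action is None:
            self.undo_stack.clear()
        else:
            self.undo_stack.append(undo_action)

    def undo(self) -> None:
        if self.undo_stack:
            self.undo_stack.pop()()


def add_lake(vector_map: dict) -> UndoAction:
    """Add a closed loop vector set; undo removes the last one added."""
    vector_map["closed"].append([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])

    def undo() -> None:
        vector_map["closed"].pop()

    return undo


host = HostController()
host.register(Command("v.add.lake", "vector", add_lake))
lakes = {"open": [], "closed": []}
host.execute("v.add.lake", lakes)
count_after_add = len(lakes["closed"])
host.undo()
```

The command itself never touches a menu or a toolbar; the host decides where the command shows up based on the metadata.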

The specific implementation of universal controllers and localized controllers goes way beyond this email and is totally unrelated to the implementation of the model, which is an entirely independent development. I'm only establishing the reason for writing the model in the first place. Now that I've made my case for the MVC programming methodology I'll propose several ways of getting there.

The implementation:

Currently the GRASS library is a simple abstraction between the programs (the commands) and the data. It provides inline functions for performing various calculations and creating sites in the database. One option is to create a model which uses the current GRASS library API. However this model would break every time the GRASS library was used directly. It also needs to be run in a particular environment so it knows the current site and database. The environment is defined by environment variables and by a file in the database's root directory. I see lots of problems with this which we may as well fix.

I think we should remove the .grass5rc file dependency. It's pretty much useless since it contains stuff which will be stored in the user preferences file, which is application-specific.

Persistence:

First, if commands are controllers run by a host application, the host app will have its own current programmatic database which for all intents and purposes can be saved in an arbitrary manner. There is no reason to have one database which everybody uses if the user in question is only interested in a local 'location'. This local version ought to be encapsulated so it may be opened like a project.

This doesn't mean that we have to change the format or abandon the collaborative database. In fact the same 'location' format can be used, encapsulated as a bundle. A bundle is a directory with an XML property list. The bundle typically has a name extension like spearfish.grass and the property list includes information like the bundle type, creator, icon, comment, as well as any properties used by the host application specific to that project. A bundle is typically seen as one file and implies the contents should not be changed manually, only applications meant to deal with that type of bundle should.
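As an illustration of how little a bundle needs, here's a Python sketch that creates a directory with an XML property list in its root and reads it back. The file name "Info.plist" and the property keys are assumptions made for the example; the actual names would be whatever we agree on.

```python
import plistlib
import tempfile
from pathlib import Path


def create_bundle(parent: Path, name: str, properties: dict) -> Path:
    """Create <name>.grass under parent with an XML property list in its root."""
    bundle = parent / f"{name}.grass"
    bundle.mkdir(parents=True, exist_ok=True)
    with open(bundle / "Info.plist", "wb") as fh:  # file name is an assumption
        plistlib.dump(properties, fh, fmt=plistlib.FMT_XML)
    return bundle


def read_bundle_properties(bundle: Path) -> dict:
    """Load the property list back from the bundle root."""
    with open(bundle / "Info.plist", "rb") as fh:
        return plistlib.load(fh)


workdir = Path(tempfile.mkdtemp())
bundle = create_bundle(
    workdir,
    "spearfish",
    {"BundleType": "location", "Creator": "GRASS", "Comment": "Sample data set"},
)
properties = read_bundle_properties(bundle)
print(bundle.name, properties["BundleType"])
```

Everything else inside the directory can stay in the existing 'location' layout.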

Thus we have two methods of persistence: a common database and a local bundle. In fact the *only* difference is the name of the directory and the XML property list file in the root directory.

Once the model is available it's trivial to support multiple databases or multiple sites in different databases. Just create an abstract class and have a mutable array of databases.

The organization and content of files in the bundle other than the property list is arbitrary, however it will likely be changed because the programmatic organization is not arbitrary. Instead each model must implement a method to represent itself as data suitable for persistence. There will be a set of base types which are stored according to type and hierarchy, but that's handled by the database class (or the site class).

Types other than data could be stored in XML property lists, while data types would be stored as binary files with their names (and byte order) recorded in property lists. Thus a vector set could return its persistent self as an array, but it would be better to return a binary data object since a property list would likely be huge.
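Here's a sketch of that split, in Python: the coordinates go into a binary file, and a small property list records the file name and byte order so the file can be read back on any machine. The function names, file naming, and property keys are all invented for the example.

```python
import plistlib
import struct
import sys
import tempfile
from pathlib import Path


def save_vector_set(directory: Path, name: str, points) -> None:
    """Write coordinates as raw doubles, plus a property list describing them."""
    flat = [value for point in points for value in point]
    byte_order = sys.byteorder  # "little" or "big"
    prefix = "<" if byte_order == "little" else ">"
    data_file = f"{name}.bin"
    (directory / data_file).write_bytes(struct.pack(f"{prefix}{len(flat)}d", *flat))
    index = {"file": data_file, "byte_order": byte_order, "count": len(points)}
    with open(directory / f"{name}.plist", "wb") as fh:
        plistlib.dump(index, fh)


def load_vector_set(directory: Path, name: str):
    """Read the property list first, then decode the binary file it points to."""
    with open(directory / f"{name}.plist", "rb") as fh:
        index = plistlib.load(fh)
    prefix = "<" if index["byte_order"] == "little" else ">"
    raw = (directory / index["file"]).read_bytes()
    flat = struct.unpack(f"{prefix}{index['count'] * 2}d", raw)
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


folder = Path(tempfile.mkdtemp())
shoreline = [(0.0, 0.0), (4.5, 1.25), (3.0, 7.0)]
save_vector_set(folder, "lake1", shoreline)
restored = load_vector_set(folder, "lake1")
```

The property list stays tiny no matter how many coordinates the binary file holds.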

How one splits up the data into files really depends on the size of such files. Once a model object is 'touched' it can be saved or reverted. If a small piece of data in a huge data file is being touched often it would probably be better to use a separate file. The organization can either be done at the head or decentralized, I don't really care either way. One advantage of a decentralized method is each object can choose separation based on runtime conditions. 
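A minimal sketch of the 'touched' idea, in Python, with an in-memory copy standing in for the saved file. The class and method names are hypothetical.

```python
import copy


class TouchTracking:
    """Hypothetical wrapper that remembers the last saved state of its contents."""

    def __init__(self, contents: dict):
        self.contents = contents
        self._saved = copy.deepcopy(contents)  # stand-in for the file on disk
        self.touched = False

    def edit(self, key, value) -> None:
        self.contents[key] = value
        self.touched = True

    def save(self) -> None:
        """Make the current contents the new saved state."""
        self._saved = copy.deepcopy(self.contents)
        self.touched = False

    def revert(self) -> None:
        """Throw away edits made since the last save."""
        self.contents = copy.deepcopy(self._saved)
        self.touched = False


layer = TouchTracking({"label": "streams"})
layer.edit("label", "rivers")
was_touched = layer.touched
layer.revert()
```

In a decentralized scheme each object would carry its own flag like this and decide for itself when and where to write.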

Data hierarchy:

A GRASS location contains sites and other info. A site contains maps, a region, and other info. A map can be either raster or vector and may contain other data. Thus we have a hierarchy.

The names I use here are far too simple to use as the actual class names. For example 'map' would probably be 'GRASSMap'. I'll only list a few until you get the idea.

A 'database' is a highly abstract object which is no more than a working directory path or URL.

A 'location' is an abstract object which contains sites. The sites may have different regions, thus the category is largely arbitrary. GRASS defines one of these sites as immutable. It doesn't serve as a backup; it's more like a master copy. It may be useful for calculating sites with different projections, since reprojecting always lowers the coordinate accuracy, so it's preferable to have one if you want to do such conversions. Thus a location requires at least one immutable site. A location object can create a location bundle (spearfish.grass) at an arbitrary URL.

A 'site' is a collection of maps with the same region. A site can store itself as a bundle at an arbitrary URL, but is only told to store itself by its host (location) object. If it's immutable it's probably appropriate to set the directory's immutable bit.

A 'region' is an area bounded by a quadrangle in a specific projection which is used for all maps in the site. Personally I don't think this is a good category. I would rather have every map have its own bounds so I can composite various maps of the same projection on a single canvas. Why must various maps of one site share one bounds? Also why have the same cell resolution for all raster maps in one site? I would remove such restrictions, but I can still implement what I want even if these curious restrictions are left in place.

To remove them I would have one projection per site, but the other region information would be stored in each map. If you want an arbitrary abstraction of maps sharing the same region and cell resolution you can do so, but I don't know why.

A 'map' is a class which defines common methods and variables for both map types. Personally I would put the non-projection region info here.

A 'raster map' is a subclass of map which is an array of cells. Personally I would put the cell resolution info here.

A 'vector map' is a subclass of map which has a mutable array of open-ended vector sets and a mutable array of closed-loop vector sets. For example, streams and lakes.

A 'vector set' is a mutable array of vectors and an 'is open' flag. Closed (non-open) vector sets probably have to be clockwise.

A 'vector' is a simple struct of two coordinates and a coordinate is a struct of two floats.

Anyway you get the idea.
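For those who prefer to see it written down, the hierarchy above could be sketched like this in Python. The class names follow the 'GRASSMap' style suggested earlier but are otherwise my invention, as are the field names; the rule that a location needs at least one immutable site is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Coordinate:
    x: float
    y: float


@dataclass(frozen=True)
class Vector:
    start: Coordinate
    end: Coordinate


@dataclass
class GRASSVectorSet:
    vectors: List[Vector] = field(default_factory=list)
    is_open: bool = True


@dataclass
class GRASSRegion:
    projection: str
    north: float
    south: float
    east: float
    west: float


@dataclass
class GRASSMap:
    name: str


@dataclass
class GRASSRasterMap(GRASSMap):
    cells: List[float] = field(default_factory=list)


@dataclass
class GRASSVectorMap(GRASSMap):
    open_sets: List[GRASSVectorSet] = field(default_factory=list)
    closed_sets: List[GRASSVectorSet] = field(default_factory=list)


@dataclass
class GRASSSite:
    name: str
    region: GRASSRegion
    maps: List[GRASSMap] = field(default_factory=list)
    immutable: bool = False


@dataclass
class GRASSLocation:
    name: str
    sites: List[GRASSSite] = field(default_factory=list)

    def __post_init__(self):
        # A location requires at least one immutable site (the master copy).
        if not any(site.immutable for site in self.sites):
            raise ValueError("a location requires at least one immutable site")


region = GRASSRegion("UTM", north=100.0, south=0.0, east=100.0, west=0.0)
master = GRASSSite("PERMANENT", region, [GRASSVectorMap("streams")], immutable=True)
location = GRASSLocation("spearfish", [master])
```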

The Process:

Which API to use:

Currently the GRASS library is a bunch of inline C functions which don't keep track of one another (the function hierarchy is a facade). MVC programming methodology implies some form of OO structure. This means a new programmatic structure has to be adopted. There are several to choose from. I am certain the best choice is either CoreFoundation or OpenStep Foundation (which I will just call Foundation). I'll list the advantages and disadvantages of both libraries:

Foundation is the OpenStep Foundation API. This API has two active implementations, Apple's Cocoa and FSF's GNUStep. You can also use OpenStep for Solaris and OpenStep for HP-UX (if you can find it; OpenStep competed with Java so Sun killed it).

The advantage of Foundation is it's a complete solution. It defines standard types and categories of types (arrays, dictionaries, sets, etc.). It supports:

* XML property lists
* bundles
* plug-ins
* notifications (like system-wide events)
* organized memory management
* undo manager
* automatic localization via loading property lists and strings in standard format.
* threading
* sockets (for publishing objects over a network or locally)

and lots of other goodies. Foundation is also written in an excellent language called Objective-C, which is very easy to learn if you know C. Objective-C is so flexible you can add methods to existing classes. Foundation includes everything needed to complete the model.
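To show what adding a method to an existing class buys us, here's an analogy in Python (not Objective-C, so treat it only as an illustration of the idea). A separate graphics package attaches a draw method to a model class it didn't define. VectorMap and draw are hypothetical names.

```python
class VectorMap:
    """Stand-in for a model class shipped in the core framework."""

    def __init__(self, name, closed_sets):
        self.name = name
        self.closed_sets = closed_sets


# --- everything below would live in a separate graphics package ---

def draw(self):
    """Return drawing instructions; a real package would render them."""
    return [f"polygon {points}" for points in self.closed_sets]


VectorMap.draw = draw  # attach the method to the existing class at runtime

lakes = VectorMap("lakes", [[(0, 0), (1, 0), (1, 1)]])
commands = lakes.draw()
```

The core framework never has to know about graphics, yet every map can draw itself once the graphics package is loaded.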

The major advantage of Foundation is that it's relatively easy to write very sophisticated user interfaces and display graphics using Application Kit (which is also part of Apple Cocoa, FSF GNUStep, OpenStep Solaris, and OPENSTEP). AppKit uses Foundation classes as the basis for its standard views and controllers like NSTableView. Another major feature of AppKit is resolution independent graphics. When you draw a vector map you do so in floating-point coordinates, NOT pixel coordinates. You can take this view and display it (which will implicitly rasterize it to screen), or you can print it (which creates a small PostScript or PDF file), or you can export it as a resolution independent PDF or a rasterized TIFF or whatever. You can also take that view and scroll, zoom, etc. with ease. Finally, any changes to any model would be seen instantly.

The disadvantage is it's written in a language which isn't straight C, which everybody here uses. I don't consider this a major problem because it took me ONE DAY to learn Objective-C, but I can't speak for the rest of you.

CoreFoundation is basically a subset of Foundation with C/C++ interfaces. It's Apple open source and part of Darwin, but has been ported to several platforms. Its advantage is that it's written in straight C. Its disadvantage is that using it is, in my opinion, more difficult than simply learning Objective-C. One advantage is that its plug-in facility is modeled after Microsoft COM, if any of you are familiar with COM, but then again I consider that a disadvantage for those of us who don't give a crap about COM.

It also has the disadvantage of not currently being usable by GNUStep's AppKit, although it's usable by OS X's Cocoa. I don't know what a $2000 investment in hardware is like for you folks but I'm probably the only one using OS X.

What this means is, for now, if I make a really cool application using AppKit and the model is in CoreFoundation, it won't work in GNUStep. However adding this ability to GNUStep shouldn't be that hard.

You can compare CoreFoundation to Foundation for yourself. I'm largely agnostic about which one more GRASStep developers prefer; I'll follow the crowd (if one ever forms).

CoreFoundation (read Overview once through):
http://developer.apple.com/techpubs/macosx/CoreFoundation/

Foundation (Intro is OK)
http://developer.apple.com/techpubs/macosx/Cocoa/Reference/Foundation/ObjC_classic/FoundationTOC.html

Objective-C tutorial I used to learn it in ONE DAY:
http://developer.apple.com/techpubs/macosx/Cocoa/ObjectiveC/index.html

Coding:
Once the model hierarchy is written (using CoreFoundation or Foundation) the rest is filling in the blanks. A class is just a bunch of methods (functions) which use private variables. This means taking code from the src/libes directory and fitting it into our framework. Most of the changes are going to be using Foundation types and using object pointers instead of array pointers and struct pointers. I'm simplifying, but the main thing to consider is that the process will be methodical and straightforward. There will be a right way to do everything.

What I suggest we do is, before adding code to the framework, ask the original authors of the code to release their code again under the LGPL (or other license). The authors of the code have the absolute right to do this and it's very important to do. I can't imagine the authors would be against this; their code is just as protected, but the LGPL makes it possible for sophisticated plug-ins for GRASS to be written by people who want to release them without a GPL license. Nobody is forced to use such software, and nobody can steal our work. I can't emphasize enough how important this step is.

I hope everybody here is interested in improving the GRASS API. With this reorganization I can make a GRASS application which resembles an expensive illustration application. We can also provide interfaces for 3rd parties to extend GRASS to make it a sophisticated statistics system, or whatever. Once this is done GRASS would be set for the next decade and beyond. I hope you all give it due consideration because if GRASS doesn't change it'll never get any better.

Thank you for your consideration.
