[GRASS-user] Re: [Qgis-user] Basic GIS course ideas

Benjamin Ducke benjamin.ducke at ufg.uni-kiel.de
Mon Oct 9 03:34:00 EDT 2006

I agree, a dedicated site for teaching open source GIS would
be the best idea, since there are so many components involved
here: GRASS, QGIS, R, PostgreSQL, ParaView, ...

I have attached a draft text about GRASS and QGIS for archaeologists.
It also contains a general introduction to GIS concepts and file

Obviously, it needs to be generalized, so that it applies to all
GIS users equally, and there are still a lot of blanks to fill in.

But it's considerably more fleshed out than what is on the Wiki so
far. Maybe it's a start ...


Sören Gebbert wrote:
> Hi,
> Hamish schrieb:
>> Otto Dassau wrote:
>>>>> But I do not have the time or resource to set up an appropriate
>>>>> website and administer this.
>>>> The GRASS wiki site is a perfect place for this development.
>>>>  (can it serve PDF, presentation attachments?)
>>> no, currently it doesn't support uploads, documents (pdf, images, ...)
>>> have to be stored somewhere else.
>> create a quasi-liberal access area in gforce GRASS svn? or better,
>> create a new project on gforge for "open_gis_edu" or similar.
>>   http://wald.intevation.org/projects/grass/
> Thats a great idea.
> Can we use this place to put video tutorials online?
> I just created a very simple gui feature demo with grass63
> to encourage all users and dev's to create video tutorials.
> http://www-pool.math.tu-berlin.de/~soeren/grass/modules/screenshots/grass63feature_tour.html 
> It would be great to have many thematic video tutorials
> to explain the basics and advanced features of grass.
> Best regards
> Soeren
>> Hamish
>> _______________________________________________
>> grassuser mailing list
>> grassuser at grass.itc.it
>> http://grass.itc.it/mailman/listinfo/grassuser

Benjamin Ducke, M.A.
(Archaeoinformation Science)
Institut für Ur- und Frühgeschichte
(Inst. of Prehistoric and Historic Archaeology)
Christian-Albrechts-Universität zu Kiel
Johanna-Mestorf-Straße 2-6
D 24098 Kiel

Tel.: ++49 (0)431 880-3378 / -3379
Fax : ++49 (0)431 880-7300

-------------- next part --------------


Open Source Geoinformation Technology for Archaeologists


This course is an introduction to the open source 
geoinformation systems QGIS (Quantum GIS) and GRASS 
(Geographic Resources Analysis Support System). It was 
written for archaeologists with a minimum background in 
GIS technology (and an idea what they want to use it 

Table of Contents

    1 GRASS and QGIS for archaeologists
        1.1 Installation
            1.1.1 Single user installation
            1.1.2 Multi user installation
            1.1.3 Installing tutorial data
        1.2 File system layout
            1.2.1 The GRASS database
            1.2.2 The GRASS home directory
    2 Basic GIS concepts
        2.1 Layers
        2.2 Data types (raster and vector)
            2.2.1 Raster data
            2.2.2 Vector data
        2.3 Coordinate systems and projections
            2.3.1 Coordinate systems, projections and datums
            2.3.2 National grid systems
            2.3.3 The UTM system
            2.3.4 Local surveying systems
        2.4 GIS data formats
            2.4.1 ESRI Shapefiles
            2.4.2 GeoTIFF
            2.4.3 Tabular ASCII
            2.4.4 DXF 
        2.5 Topology
        2.6 Thematic maps
    3 Using QGIS and the GRASS plugin
        3.1 Importing data into QGIS
        3.2 Starting GRASS from within QGIS
    4 Basic GRASS GIS usage
        4.1 Using the GRASS shell
        4.2 How to use a GRASS command
        4.3 The GRASS region
        4.4 The mapset search path
        4.5 Database file commands
        4.6 GRASS environment variables
        4.7 More things to know about GRASS
            4.7.1 Maps and where they are stored
            4.7.2 Raster data handling
            4.7.3 Vector data handling
        4.8 Importing data into GRASS
        4.9 Georeferencing data
        4.10 Point data interpolation
        4.11 Exporting data from GRASS
        4.12 Basic data manipulation
            4.12.1 Masking
            4.12.2 Buffering
            4.12.3 (Re-)classification
            4.12.4 Spatial selection, overlay and extraction
            4.12.5 Data type conversion
            4.12.6 Map Algebra
            4.12.7 Volume calculations
            4.12.8 Profiling
    5 Common applications of GRASS GIS
        5.1 Map statistics
        5.2 Digital terrain models
        5.3 Cost surfaces
        5.4 Line-of-sight analysis
        5.5 Advanced interpolation
        5.6 Geometric modelling
    6 Advanced data processing with GRASS GIS
        6.1 More on interpolation
        6.2 Network analysis
        6.3 Database storage
            6.3.1 Attributes in an external DBMS
            6.3.2 Geometries in an external DBMS
        6.4 Environmental modelling
            6.4.1 Erosion models
    7 GRASS extensions for archaeology
        7.1 Territorial modelling with the Xtent model
        7.2 Stratigraphic modelling with voxels
        7.3 Cumulative viewshed analysis
        7.4 Predictive models for site locations
            7.4.1 Calculating model gain
    8 Script-based data analysis with GRASS GIS
    9 3D visualisation with ParaView
    10 Further directions
        10.1 Data storage in a relational DBMS
        10.2 WebGIS and GRASS web services
        10.3 Simulation models
        10.4 Spatial statistics with GRASS and R
    A Appendix
        A.1 GRASS commands index
        A.2 Internet links

1 GRASS and QGIS for archaeologists

Geographic Information Systems (GIS) are the only known 
software capable of adequately representing and 
processing spatial (geographic) information. GIS 
combine spatial mapping and analysis technology with a 
database for storing any additional information about 
objects of interest. Simply put, if your archaeological 
work involves mapping and drawing conclusions from the 
spatial structure of things, you need to know how to 
use a GIS. There are many (and very expensive) GIS 
systems available today. In this course, I will 
introduce you to some software that is available to 
anyone at no cost and without restrictions.

I assume that you have some basic knowledge of 
geoinformation systems, how they work and what you can 
do with them. For a general introduction to GIS 
technology and its uses in archaeology, I recommend 
these two books:

Conolly, J. and Lake, M., 2006. Geographical Information 
Systems in Archaeology. Cambridge: Cambridge University Press.

Wheatley, D. and Gillings, M., 2003. Spatial technology 
and archaeology: the archaeological applications of 
GIS. London: Taylor & Francis.

GRASS (Geographic Resources Analysis Support System, 
see [http://www.grass.itc.it]) GIS is a very complete workstation type GIS. It 
has been designed with complex, automated model 
building and data analysis tasks in mind. It handles 
quite differently compared to a typical desktop GIS 
(e.g. ArcView, Manifold, MapInfo) where the user spends 
most of the time interactively searching for the right 
icons and menu entries in the graphical user interface 
while the software stands idly by.

QGIS (Quantum GIS, see [http://qgis.org]), on the other hand, is just 
such a classical desktop GIS that allows you to quickly 
visualize data, query it, produce pretty maps and 
digitize new geographic objects. QGIS also has a plugin 
that allows it to communicate with GRASS directly. This 
way, you get both a simple-to-use desktop GIS and a 
powerful workstation GIS to cover all your spatial data 
processing needs!

Both GRASS and QGIS are open source projects. This 
means you are free to copy, share and change the 
software in almost any way you like (but check the 
included license files for details). Check out this 
great online repository for more open source GIS 
software and data: [www.freegis.org].

GRASS was originally designed to be used on Unix or 
Unix-like operating systems (which also includes the 
popular Linux open source system). It does not run 
"natively" on a Windows system without some quirks and 
irritations. The Windows version of GRASS that we will 
use in this course needs some extra software to be 
installed on your Windows box, feels a bit strange to 
the non-Unix user and lacks some of the functionality 
that the full-blown Unix version has. However, I 
believe that almost all the missing features are either 
not very relevant or can be replaced by QGIS or other 
open source Windows software. In the future, the two 
versions will approach each other and converge to 
exactly the same functionality.

(UNIX is a registered trademark in the United States and 
other countries, licensed exclusively through X/Open 
Company Ltd. Windows seems to be a registered trademark 
of Microsoft Corporation.)

There is also a very up-to-date Mac OS X version of 
GRASS GIS, maintained by Lorenzo Moretti and William 
Kyngesburye: [http://www.grass.itc.it/grass61/binary/macosx/]. It is easy to set up GRASS 
on your Mac but you also need to install some 
additional software from your Mac OS X installation 
media. See the instructions on. I will not go into the 
details of installing and using this version of GRASS here.

(Mac OS is a registered trademark of Apple Computers, Inc.)

GRASS can, of course, also be downloaded and compiled 
in source code form. In addition, there are binaries 
for a number of Linux versions. As with the Mac version, 
I will not discuss installation and usage of GRASS on 
Linux systems. You can find all the different versions here: [http://www.grass.itc.it/download/index.php].

Further tutorials, sample data and other resources can 
be found on [http://www.grass.itc.it/gdp/index.php] and [http://www.grass.itc.it/download/data.php].

GRASS and QGIS have an active community of users and 
developers. Mailing lists for GRASS can be found on [http://www.grass.itc.it/community/support.php] and 
for QGIS on [http://qgis.org/index.php?option=com_content&task=view&id=115&Itemid=96]. Do not hesitate to post questions or 
comments to one of those lists. If you have time to 
help in the development of the software, feel free to 
join. You will be surprised in how many ways you can 
help to further develop the software without even 
being a programmer. You can spell-check or correct 
faulty and badly written manual pages, design pretty 
icons, write tutorials, test program functionality and 
report problems, supply nice sample data, help to raise 
money for funding some programmers etc.

1.1 Installation


[NOTE: The current version of the software described 
here suffers from an annoyance that prevents the QGIS 
GRASS plugin from working properly if you installed 
GRASS and QGIS in a path on your filesystem that has 
spaces in it. E.g. C:\Program Files\grassqgis is a 
folder that will not work, while C:\programs\grassqgis 
will work!]
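
The check described in the note can be sketched in a few lines of Python. This is only an illustration and not part of the GRASS/QGIS package; the function name has_space_in_path is made up for this example.

```python
from pathlib import PureWindowsPath


def has_space_in_path(path: str) -> bool:
    """Return True if any folder name in a Windows path contains a space.

    Hypothetical helper for illustration only: it mirrors the rule from
    the note above (no spaces in the folder name or in any folder
    leading up to it).
    """
    return any(" " in part for part in PureWindowsPath(path).parts)


for candidate in (r"C:\Program Files\grassqgis", r"C:\programs\grassqgis"):
    verdict = "will not work" if has_space_in_path(candidate) else "should work"
    print(candidate, "->", verdict)
```

PureWindowsPath is used so that the sketch also runs on a non-Windows machine; it only inspects the text of the path and never touches the file system.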

For this course, I have put together a version of GRASS 
that includes QGIS and the QGIS GRASS plugin. The 
software is quite current but still in a somewhat 
experimental stage (especially QGIS). Do not let this 
put you off. Everything is quite powerful and usable 
already, despite certain rough edges and annoyances. 
Work on GRASS and QGIS is progressing very fast. I 
intend to make updates to the software available as 
time allows. Check this webpage regularly: [http://www.uni-kiel.de/ufg/ufg_BerDucke.htm].

The software package we will be using is based on Radim 
Blazek's work and instructions (see [http://wiki.qgis.org/qgiswiki/BuildingWindowsBinaryOnLinux#head-1af01d3d23f7cdba8b8e6817dd9198166f5ca481] for details if you 
want to know how to build your own system from 
scratch). Installing this version of GRASS and QGIS is 
very simple, although I do not have a graphical 
installer ready for you, yet. You only need a Windows 
2000 or XP system where you can login with 
administrator rights. The following instructions (as 
well as the rest of this text) assume that you install 
everything into the exact same folders on your 
hard disk as I am proposing here.

For the installation, you have two choices: if you want 
to install the software on a Windows system that has 
several different user accounts and different people 
should be able to use the software on that machine, do 
the multi user installation. Currently this requires 
you to have some knowledge about how to set file system 
permissions on a Windows system.

If you just want to install the software for your own 
use right now and the easy way, do the single user 
installation as described in the next section.

1.1.1 Single user installation

1. Log in as Administrator on your Windows system. If 
  your regular Windows user account has administrator 
  rights (a dangerous thing but common on many Windows 
  default installations), you can skip this step.

2. Download the file grassqgis-libs.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis-libs.zip].

3. Unzip the file to some directory on your hard disk. 
  If you have a Windows version that does not have 
  support for zip archives built-in and need a program 
  to unzip the files, I recommend you download and 
  install IZArc from [www.izarc.org].

4. After unzipping, you will find a folder system-libs 
  in whatever location you decided to unzip to. Open 
  the system-libs folder and copy the files you find in 
  there (several .dll files) to a location where 
  Windows looks for system files. This is usually 

5. If you had to log in as Administrator: Log out now 
  and log in again with your usual user name and password.

6. Download the file grassqgis.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis.zip].

7. Unzip the file to some directory on your hard disk 
  where you have write access and that does not contain 
  any spaces in its name or the name of any folders 
  leading up to it!

8. After unzipping, you will find a folder grassqgis in 
  whatever location you decided to unzip to. This 
  contains a program qgis.exe which you can use to 
  start QGIS and GRASS. If you want, you can create a 
  link to qgis.exe on your Windows desktop to be able 
  to locate and launch it quicker.

1.1.2 Multi user installation

1. Log in as Administrator on your Windows system. If 
  your regular Windows user account has administrator 
  rights (a dangerous thing but common on many Windows 
  default installations), you can skip this step.

2. Download the file grassqgis-libs.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis-libs.zip].

3. Unzip the file to some directory on your hard disk. 
  If you have a Windows version that does not have 
  support for zip archives built-in and need a program 
  to unzip the files, I recommend you download and 
  install IZArc from [www.izarc.org].

4. After unzipping, you will find a folder system-libs 
  in whatever location you decided to unzip to. Open 
  the system-libs folder and copy the files you find in 
  there (several .dll files) to a location where 
  Windows looks for system files. This is usually 

5. Download the file grassqgis.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis.zip].

6. Unzip the file to some directory on your hard disk. 
  If you have a Windows version that does not have 
  support for zip archives built-in and need a program 
  to unzip the files, I recommend you download IZArc 
  from [www.izarc.org].

7. After unzipping, you will find a folder grassqgis in 
  whatever location you decided to unzip to. Take the 
  grassqgis folder and move it to a folder that does 
  not contain any spaces in its name or in the name of 
  any folder leading up to it. You can also just move 
  grassqgis to C: or wherever you like.

8. You now have to adjust access permissions on the 
  grassqgis folder so that normal users will be able to 
  open the folder and launch qgis.exe. You also need to 
  allow normal users to create new folders under 
  C:\Program Files\grassqgis\msys\home.

9. If you want, you can create a link to qgis.exe in 
  the Windows Start menu to allow users to locate and 
  launch it quicker.

That's all you need to do to install QGIS and GRASS on 
your system! If you are still logged in as 
Administrator, log out now and log in again using your 
regular work account.

1.1.3 Installing tutorial data

Next, we will download some data sets that we will use 
in this course. Please note that you are only allowed 
to use this data as tutorial data for this course and 
not to publish it in any way!

1. Download the file import.zip from [http://www.uni-kiel.de/ufg/dateienDucke/import.zip]. This archive 
  contains data in various GIS and other formats that 
  we will use in the sections on importing GIS data 
  into QGIS and GRASS.

2. Download the file grassdata.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassdata.zip]. This archive 
  contains several complete GRASS locations with 
  example data we will use in the sections on data 
  processing with GRASS GIS.

3. Unzip both files and copy the folders you got to the 
  folder where you keep your own Documents. Usually, 
  this will be the "My Documents" folder on your desktop 
  (but you can copy everything to wherever you usually 
  put your documents; there is no issue with spaces in 
  the folder name here). For the rest of this course, I 
  will assume that the folders import and grassdata 
  reside in your "My Documents" folder.

  Additional notes

There is at least one viable alternative for running 
GRASS on Windows which involves using the Cygwin 
extensions. However, this needs a lot of additional 
software to be installed on your Windows box to 
essentially create a complete Unix environment for 
GRASS to run in. If you feel that you have the time and 
space for this, check out the instructions on this 
page: [http://geni.ath.cx/grass.html]. We will not be using that version in our 
course. It handles almost exactly like a Linux 
installation of GRASS so you can consult any GRASS 
tutorial on the web for details on how to use it.

QGIS is localized software. This means that all 
program menus, messages etc. will appear in the 
language of your Windows installation. This tutorial is 
written for the English version of QGIS. If this gives 
you trouble, locate the language file for your system 
in C:\Program Files\grassqgis\share\qgis\i18n and 
delete it or move it to another place on your file 
system. E.g., if you are on an Italian system, get rid of 
qgis_it.qm, restart QGIS and everything will be in English!

2 Basic GIS concepts

For your convenience, however, I will review some basic 
GIS concepts and terminology at this point.

An excellent on-line tutorial for archaeological GIS 
usage is available on [http://ads.ahds.ac.uk/project/goodguides/gis/index.html]. Browse their index for more 
reading material: [http://ads.ahds.ac.uk/project/goodguides/g2gp.html]. There are many excellent 
geographical GIS texts and instructions on the web. I 
personally like this one: [http://erg.usgs.gov/isb/pubs/gis_poster/]. Or browse this index for 
more: [http://www.coloradocollege.edu/ats/labs/GIS/learning_teaching.htm].

2.1 Layers

A QGIS-Project (like any GIS project) consists of a 
number of layers. These layers will be shown in the box 
at the left side of the QGIS window, which is just 
blank right now. They are stacked on top of each other 
in order of display. Each layer has the following properties:

* it has a data type

* it refers to a GIS data file in a specific GIS data 
  format on your harddisk

* it has a symbology (display style) that determines 
  its color scheme, transparency etc

* it contains data, whose values can be queried
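
As a rough illustration of these properties, a layer can be thought of as a small record and a project as an ordered stack of such records. The following Python sketch is an assumption made for this text and does not reflect how QGIS stores its projects internally; the class name Layer, its fields and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Illustrative stand-in for a GIS layer (not a QGIS class)."""
    name: str
    data_type: str      # "raster" or "vector"
    source: str         # path of the GIS data file on disk (made up here)
    data_format: str    # e.g. "GeoTIFF" or "ESRI Shapefile"
    symbology: dict = field(default_factory=dict)  # colours, transparency etc.


def top_to_bottom(project):
    """List layer names as they appear in the legend box, topmost first."""
    return [layer.name for layer in reversed(project)]


# The project list is kept in drawing order: first entry is drawn first
# and therefore ends up at the bottom of the stack.
project = [
    Layer("elevation", "raster", "import/dem.tif", "GeoTIFF",
          {"color_scheme": "terrain"}),
    Layer("trenches", "vector", "import/trenches.shp", "ESRI Shapefile",
          {"fill": "red", "transparency": 0.5}),
]

print(top_to_bottom(project))
```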

2.2 Data types (raster and vector)

GIS systems can integrate data from many different 
sources: scanned maps, field measurements, laser 
scanned surfaces, airphotos, database records etc. As 
long as something has coordinates (or you can 
reconstruct them), it can go into the GIS! Different 
forms (data types) are suitable for different types of 
information. Basically, in a GIS we have raster data 
for imagery and other rasterized information and vector 
data in the form of points, lines and polygons.

2.2.1 Raster data

Imagine putting an orthogonal grid of regularly spaced, 
rectangular cells over the real world. In the center of 
each cell, you take a measurement of the variable that 
you are interested in, e.g. the height of the terrain. 
The value of this measurement gets stored in the cell 
and the whole cell is painted in some color that 
represents this value (e.g. "blue" for everything on or 
below 0 meters to represent water and "white" for 
everything above 2000 meters for mountain tops, with 
brown, green and yellow for the ranges in between). You 
have created a raster model of elevation.

Raster data also means image data (pictures). A digital 
camera captures a rasterized model of the real, visible 
world by storing a colour value in each raster cell 
(also called a pixel in digital imaging software). In 
the case of image data, the cell colour directly 
reflects its measured value, i.e. the visible light's 
colour, hue and saturation at this point. Image 
data conveys a lot of information for the human eye, 
but is meaningless for a computer, unless it is 
analyzed for structures and classified. Classification 
means to do things such as determining the type of 
vegetation on a satellite image and coding the cell 
with a class value that represents this vegetation 
(such as "1" for eucalyptus trees, "15" for grassland 
etc.). A class (sometimes also called a category or bin) 
is an integer number, that is a whole number with no 
decimal part (-1, 0, 2, 20013 etc.). A class is a value 
taken from a limited range of possible values (e.g. 100 
different tree species to be identified on the 
satellite image). Classes can be counted (in other 
words, they are discrete values) and simple statistics 
can be made from the counted frequencies. That's all. 
Classes can have labels to give them a more intuitive 
meaning (e.g. 1="dense", 2="medium", 3="light").

On the other hand, raw measurement data, such as 
elevation, can (in theory) be measured and stored with 
an arbitrary degree of precision (e.g. 10.1232 meters) 
and have an infinite number of different values 
(so-called continuous values), depending on where and 
how exactly you measure. Such values are stored as 
floating point numbers. Classified (integer) raster 
maps can be produced from floating point maps by 
defining ranges that map them to discrete classes (e.g. 
1.0 m to 100.0 m=1="low"; 100.01 m to 1000.0 m=2="medium"; 
1000.01 m to 2000.0 m=3="high").
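
The mapping from floating point values to classes can be sketched in plain Python. This is not how GRASS does it internally, only an illustration of the idea; the function name classify_elevation and the tiny sample grid are made up. Unlike the ranges quoted above, the sketch uses contiguous intervals, so that a value such as 100.005 m does not fall into a gap between two classes.

```python
LABELS = {1: "low", 2: "medium", 3: "high"}


def classify_elevation(height_m):
    """Map an elevation in metres to an integer class.

    Returns None (standing in for a NULL / no data cell) if the value is
    missing or outside all class ranges.
    """
    if height_m is None:
        return None
    if 1.0 <= height_m <= 100.0:
        return 1
    if 100.0 < height_m <= 1000.0:
        return 2
    if 1000.0 < height_m <= 2000.0:
        return 3
    return None


# A made-up 2 x 2 floating point raster with one no data cell.
elevation = [[12.5, 250.0],
             [1500.3, None]]

classified = [[classify_elevation(value) for value in row] for row in elevation]
print(classified)
print([[LABELS.get(c, "no data") for c in row] for row in classified])
```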

Raster data typically completely covers a rectangular 
area of a certain extent/coverage (although there might 
be patches with no information, represented by 
so-called NULL or no data cells, e.g. at points where 
it is impossible to take a measurement). Think of 
airphotos or satellite imagery. The level of detail in 
raster data is directly connected to its ground 
resolution, i.e. the size of the cells in the grid. 
Obviously, the smaller the cells, the more detailed the 
raster model of the real world phenomenon will be. 
Raster resolution is limited by space constraints. If 
you double the resolution (i.e. halve the size of the 
cells), you end up with four times the space 
requirements, as the data volume grows in two 
dimensions and 2^{2}=2\times2=4. Any processing and 
analysis of the data will also take about four times 
as long.
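
The arithmetic is easy to check with a short Python sketch. The function name raster_cell_count and the 1000 m x 1000 m example area are assumptions chosen only for illustration.

```python
import math


def raster_cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover an area with square cells."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


# Halving the cell size doubles the resolution in both directions.
for cell_size in (10.0, 5.0, 2.5):
    print(cell_size, "m cells:", raster_cell_count(1000, 1000, cell_size))
```

Each halving of the cell size multiplies the cell count by four, which is why very fine resolutions quickly become impractical for large areas.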

In a raster data model, the information we are 
interested in, is usually stored directly in the cells. 
Thus, for digital terrain modelling, we would have one 
raster layer for heights, one for slope etc. This is 
entirely different with vector data, where attribute 
tables are used to save all sorts of information 
relating to the objects in just one vector layer (see 
next section).

Frequent examples of raster data layers in a GIS are:

* continuous raster data (also called fields in the GIS 

  - digital elevation models with one height 
    measurement for e.g. each square meter, and maps 
    derived from them such as slope and aspect maps

  - density and concentration measurements interpolated 
    from point samples (e.g. phosphate concentration).

  - geophysical sensor data

  - airphotos and satellite images with real or false colors

* discrete raster data:

  - grayscale imagery (with a limited number of gray 
    shades, usually 256).

  - black and white scans.

  - scanned topographic paper maps with a limited 
    number of colors representing objects, such as 
    fields, woods, roads, railways etc.

  - scanned classified geologic and land use maps

2.2.2 Vector data

Vector maps show discrete geometric objects defined by 
arbitrarily precise spatial coordinates. These objects 
are sometimes also called features, but I will use the 
term feature exclusively (with a few exceptions where 
noted) to describe objects as part of the stratigraphy 
of an archaeological site.

The shape of all objects in a vector map is simply 
described by a set of coordinates. However, there is a 
difference in how these coordinates (also called nodes 
or vertices) are connected. If there is no connection 
between any two coordinates, we have a point. If at 
least two pairs of coordinates are connected (by a 
straight line called an arc or edge -- we will not 
consider curves) we have a line. If we have a line with 
more than two vertices (a polyline) where the last pair 
of coordinates is connected to the first, then we have a 
closed area, also called a polygon. Points only have 
positions, lines have lengths and polygons also have an 
area. Obviously, you can decompose a polygon into a 
polyline by breaking one connection and a line into 
points by breaking all connections. For this reason, 
when you start digitizing a new map in a GIS you need 
to tell the system whether you are going to produce 
points, lines or polygons. Different types are usually 
kept in separate layers in a GIS to make processing easier.

Obviously, the different types have different 
topological properties that give you different options 
for analysis. Point data can, e.g. be queried to check 
if points lie within specific polygons in another map.
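
To make these properties concrete, here is a minimal Python sketch that computes the length of a polyline, the area of a polygon (with the so-called shoelace formula) and tests whether a point lies inside a polygon (with a simple ray casting test). These are textbook methods chosen for illustration and not necessarily what GRASS or QGIS use internally; all names and coordinates are made up.

```python
import math


def line_length(vertices):
    """Length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula).

    The first vertex must not be repeated at the end; the closing edge
    from the last vertex back to the first is implied.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def point_in_polygon(point, vertices):
    """Ray casting test: count edge crossings of a ray going right."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the height of the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


trench = [(0, 0), (4, 0), (4, 3), (0, 3)]  # a 4 m x 3 m rectangle
print(line_length([(0, 0), (4, 0), (4, 3)]))
print(polygon_area(trench))
print(point_in_polygon((1, 1), trench), point_in_polygon((5, 1), trench))
```

Points that fall exactly on a polygon border are a special case that this simple test does not treat consistently.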

The level of detail in vector data, and thus the kind 
of questions you can answer with it, is hard to guess 
from its visual appearance alone, as you can zoom in 
and out of a vector map without visible loss of 
quality! You need to have some background information, 
such as the scale of the paper map from which the data 
was digitized.

Objects on vector maps typically have attribute data 
attached to them. Attribute data comprises all sorts of 
information about an object. E.g. for the polygon 
representing an archaeological feature, this could be 
the name of the excavator, the date of discovery, an 
inventory number etc. All attribute values for the 
objects in a single GIS layer/data file are stored in 
one attribute table with one record (row of data) for 
each object. Attribute tables can be stored in files 
(e.g. shapefiles store them in dBase format by default) 
or in a separate, full-blown database. In any case, 
there needs to be some way to link each geographic 
object to the specific record in the database table 
that contains its attribute data and this is called the 
primary key in database lingo. This primary key must be 
a simple integer type value which has to be unique for 
every object. In some cases this is easy to define. 
E.g. for an archaeological feature we could use its 
inventory number as primary key. 
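
The link between geometries and attribute records can be pictured as two tables that share the same key. The Python sketch below is only a simplified picture of that idea; the inventory numbers, names and dates are invented sample values.

```python
# Geometries and attribute records, both indexed by the same integer
# primary key (here: an invented inventory number).
geometries = {
    101: [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)],
    102: [(3.0, 0.0), (4.0, 0.0), (4.0, 2.0)],
}

attribute_table = {
    101: {"excavator": "A. Example", "discovered": "2006-07-14"},
    102: {"excavator": "B. Sample", "discovered": "2006-07-15"},
}


def keys_are_unique(keys):
    """A primary key is only usable if no value occurs twice."""
    keys = list(keys)
    return len(keys) == len(set(keys))


def record_for(key):
    """Look up the attribute record that belongs to one geometry."""
    return attribute_table[key]


print(keys_are_unique([101, 102]))
for key, outline in geometries.items():
    print(key, len(outline), "vertices,", record_for(key)["excavator"])
```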

Examples for the different types of vector data:

* point data

  - GPS coordinates

  - artefact positions

  - total station measurements

  - soil samples

* line data

  - contour lines

  - roads

  - railways

* polygon data (areas)

  - digitized feature outlines

  - trench outlines

  - all sorts of areas


A good GIS also handles vector topology. Topology 
describes spatial relations between geometric objects. 
A typical example is a sketch of two features with a 
common border. In a non-topological system (such as CAD 
or a very bad GIS), this border would be saved twice 
and the features overlap. In a real GIS, the 
information that it is a shared border will be recorded 
explicitly and the line will exist only once. Correct 
topology is very important for data analysis. There is 
much more to topology in a GIS and you will have little 
trouble finding a wealth of information on the web. 
See, e.g., this whitepaper: [http://www.esri.com/library/whitepapers/pdfs/gis_topology.pdf] or this paper, which is 
easier to understand: [http://www.coloradocollege.edu/dept/SW/GIS_Lab/tips&tricks/Understanding_Topology.doc].

[Need to talk about topology problems, like overlap, 
overshoot etc. and the potential of topology for 
spatial checks and queries]

  Interpolation to raster

Very often, you will want to create complete raster 
models from a cloud of vector points, e.g. a digital 
elevation model from a set of points measured with a 
total station. For this you need to reconstruct the 
values at each grid cell that does not contain a 
measured value from the closest measured values in its 
neighbourhood. This process is called interpolation and 
is a very frequent operation in a GIS. We will look at 
it in more detail later.
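
To give a first impression of what interpolation does, here is a minimal Python sketch of one simple method, inverse distance weighting (IDW). The choice of IDW, the function names and the sample points are assumptions made for illustration; the sketch uses all measured points instead of only the closest ones and is not the method of any particular GRASS module.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at location (x, y).

    points is a list of (x, y, value) tuples. Closer points get larger
    weights; a point exactly at (x, y) returns its own value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def interpolate_grid(points, x0, y0, cell_size, rows, cols, power=2.0):
    """Estimate a value at the center of every cell of a regular grid.

    (x0, y0) is the lower left corner; row 0 is the southernmost row.
    """
    return [[idw(points,
                 x0 + (col + 0.5) * cell_size,
                 y0 + (row + 0.5) * cell_size,
                 power)
             for col in range(cols)]
            for row in range(rows)]


# Made-up total station points: (x, y, height in metres).
measured = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 2.0, 14.0)]
for row in interpolate_grid(measured, 0.0, 0.0, 1.0, rows=2, cols=2):
    print([round(height, 2) for height in row])
```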

  Additional notes

Practice will teach you which data type is best for 
representing specific information. Most often, the data 
provider will make the choice for you. There are cases 
where several choices are possible and the best one 
depends on what you want to do with the data. E.g. you 
can scan a topographic map and import it into a GIS 
project just like that or you can digitize the outlines 
of all visible fields, roads, etc. on that map to use 
the vectorized data.

Basically, if you want to calculate a value for every 
location of your map and you are comfortable with 
dividing your geographic regions into a regular grid of 
cells for this (e.g. the density of artefacts for every 
square meter of a trench), you are producing raster 
data and need all input in rasterized form.

If you want to calculate a value for every object 
visible on your map (especially geometric values, e.g. 
the area of each feature on a site map) or are 
interested in attribute data linked to objects (such as 
a database record which describes the contents of 
features on a site map), you want to use vector data.

You can convert between raster and vector formats in 
both directions, and also between different types of 
vector geometries (point, line, polygon). It is a 
common thing to e.g. scan in a paper map and then trace 
the contour lines on-screen to create a map of vector 
lines, each one with an attribute that stores the 
height information for that line. These lines will 
consist of nodes (the points where you clicked the 
mouse on the screen) and arcs that connect them. If you 
want to turn the height information in the contour 
lines into a smooth continuous raster model for e.g. 
pretty visualizations or visibility calculations, you 
can extract the nodes into a layer of points with 
height attributes and run an interpolation to create 
the raster model (we will see how to do this basic 
operation later).
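
The step of extracting nodes from contour lines can be sketched as follows. The data layout (a list of dictionaries with a height attribute and a list of nodes) and the function name are assumptions for this example and do not correspond to any GRASS data structure.

```python
# Two made-up digitized contour lines, each with a height attribute
# and the nodes that were clicked on screen.
contours = [
    {"height": 100.0, "nodes": [(0.0, 0.0), (5.0, 1.0), (10.0, 0.5)]},
    {"height": 110.0, "nodes": [(0.0, 4.0), (6.0, 5.0)]},
]


def contour_nodes_to_points(lines):
    """Turn every node of every contour line into an (x, y, height) point."""
    return [(x, y, line["height"])
            for line in lines
            for (x, y) in line["nodes"]]


points = contour_nodes_to_points(contours)
print(len(points), "points, first:", points[0])
```

The resulting list of points with height values has the same form as a set of surveyed points and could be passed on to an interpolation routine.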

Unfortunately, some GIS talk about grids (or gridded 
data) when they really mean raster data, although a grid 
might just as well mean a vector map of rectangles 
(e.g. such as is usually overlaid on printed 
topographic maps). Gridded data is really point vector 
data where each point is positioned on an imaginary 
regular grid.

Both raster and vector data representations can be 
extended to three dimensions. For vector data, this is 
straightforward, as all that is necessary is to record 
3D-coordinates (X, Y and Z). Raster cells can also be 
extended to 3D. Three-dimensional cells are called voxels 
(volume pixels). Think of Lego bricks and you get the idea.

Current GIS technology is predominantly 2D with the 
possibility to visualize 3D data using separate 
viewers. The software we are using for this course does 
not currently have any 3D visualization capacity, but a 
full installation of GRASS (on Linux, Mac OS etc.) has 
a program called nviz for this purpose. As a Windows 
replacement, we will use ParaView, which also works 
great (see section 9).

For 3D raster (voxel) data, storage space is an even harder 
constraint on resolution than in the 2D raster case, as 
the data volume grows in three dimensions. Doubling the 
resolution gives you 2^3 = 2 x 2 x 2 = 8 times as much data!
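
The arithmetic is easy to check. The following sketch counts the cells needed to cover a square (2D) or cubic (3D) region; the extent of 10 meters and the cell sizes are arbitrary values chosen for illustration.

```python
def cell_count(extent, cell_size, dims):
    """Number of cells covering a region of `extent` meters along each axis."""
    cells_per_axis = round(extent / cell_size)
    return cells_per_axis ** dims


# Halving the cell size doubles the resolution
cells_2d_coarse = cell_count(10.0, 1.0, dims=2)  # 100 cells
cells_2d_fine = cell_count(10.0, 0.5, dims=2)    # 400 cells, 4 times as many
cells_3d_coarse = cell_count(10.0, 1.0, dims=3)  # 1000 voxels
cells_3d_fine = cell_count(10.0, 0.5, dims=3)    # 8000 voxels, 8 times as many
```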

2.3 Coordinate systems and projections

Every object in a GIS has a place on the real Earth. If 
you want to be able to find it, you need to make use 
of one of the many geodetic reference systems in use 
across the globe. A geodetic reference system 
consists of a coordinate system, which defines how 
coordinate values are to be interpreted, and projection 
information, which defines all sorts of peculiarities that 
are necessary to map geographic data nicely in a 
specific part of the world. In addition, there are many 
details that can drive even the most experienced GIS 
expert crazy and leave room for a plethora of error 
sources. The following is a very concise text that 
contains just the bare essentials you need to know to 
correctly handle your GIS layers. The internet offers 
a lot of detailed information. A nice page can be found 
here: [http://gis.washington.edu/esrm250/cfr250/lessons/projection/index.html].

Never try and overlay geographic data from sources with 
different geodetic reference systems without a basic 
understanding of the following concepts. You will get 
wrong results that are not always easily visible! Old 
maps (even just a decade old) may use other reference 
systems than modern data!

2.3.1 Coordinate systems, projections and datums

Let's first get the basic terminology straight. There 
is a lot of confusion about this and even GIS software 
messes up a lot of it. Therefore, I will try to be as 
brief as possible.

  Geographic coordinate systems

Geographic coordinates are really the only precise way 
of locating anything on the Earth's curved surface. 
Geographic coordinates are angular readings (longitude 
and latitude) that allow for the precise definition of 
any point on the Earth's surface.

Geographic coordinates can either be given in degrees, 
minutes and seconds or as decimal degrees. It is easy 
to convert between the two and instructions can be 
found, e.g. on this page: [http://www.warnercnr.colostate.edu/class_info/nr502/lg1/notes/dms_and_dd.html]. Just make sure you know 
which type your data uses.
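
The conversion itself is simple arithmetic: one degree has 60 minutes and one minute has 60 seconds. A minimal sketch (the function names and the hemisphere letter convention are my own choices for this example):

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere="N"):
    """Convert degrees, minutes, seconds to decimal degrees.

    Southern latitudes and western longitudes become negative.
    """
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


def decimal_to_dms(value):
    """Split decimal degrees into (degrees, minutes, seconds); sign is dropped."""
    value = abs(value)
    degrees = int(value)
    minutes_full = (value - degrees) * 60.0
    minutes = int(minutes_full)
    seconds = (minutes_full - minutes) * 60.0
    return degrees, minutes, seconds


lat = dms_to_decimal(54, 19, 30, "N")  # approximately 54.325
lon = dms_to_decimal(10, 7, 30, "W")   # approximately -10.125
```

Note that decimal_to_dms() may return seconds with a tiny rounding error, so round the result before printing it.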

Unfortunately, geographic coordinates are not very 
handy for anything to do with 2D mapping and data 
analysis. Even calculating the distance between two 
points along the Earth's surface is a pretty complex 
operation. Thus, we really want coordinates to be meter 
readings in a planar (cartesian) coordinate system.
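
To give an idea of what "pretty complex" means: even under the simplifying assumption that the Earth is a perfect sphere (it is not), the distance between two points given in decimal degrees requires spherical trigonometry. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; both the spherical model and the radius value are assumptions made for this example, and a real GIS uses a more accurate ellipsoidal model.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean radius of a spherical Earth (approximation)


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Distance in meters along the surface of a sphere (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# One degree of latitude along a meridian is a bit more than 111 km
one_degree = great_circle_distance(0.0, 0.0, 1.0, 0.0)
```

In a planar system with meter coordinates, the same task is a plain application of Pythagoras' theorem.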

  Cartesian coordinate systems

Cartesian coordinate systems are exactly what you know 
from your math class in school: a planar coordinate 
system with two orthogonal axes: X (or easting) for the 
horizontal position and Y (or northing) for the 
vertical position (this can also be extended into 3 
dimensions by adding a Z axis). The point where the 
axes meet is called the origin (with coordinates 0/0 or 
0/0/0 in the 3D case). X coordinates grow towards East 
and Y coordinates grow towards North (Z coordinates 
grow towards the zenith). In geography (and therefore 
GIS) this has always been so and will always stay that 
way. X coordinates are always written down first, then 
Y, then Z. Never ever record coordinates in any other 
way, especially when setting up a local survey (see 
below). Unfortunately, some archaeologists (and survey 
instruments) are not aware of this and measure things 
in a Y/X system. Be sure to swap such coordinates for 
use in a GIS!


  Projections

Earth is not flat. Therefore, when we measure anything 
on the Earth's surface and want to map (project) it 
onto a flat (planar) paper map or computer screen, we 
will distort the original shapes. There is no way 
around it. Obviously, this effect will be more 
noticeable when things are measured for a big area, 
such as a whole country. Thus, a compromise will have to 
be made to keep distortions low at least in those parts 
that are of immediate interest. The mathematical set of 
rules to achieve this is called projection information 
and you need to know it if you want to work with data 
in a national grid system (see below). There are 
different types of projections which are optimal for 
different mapping and analysis purposes, but the most 
frequently used one is the Transverse Mercator projection.


  Datums

I will just say this much: a datum is information you 
need if you want to use geographic data that has a 
different projection than your working region (see 
the definition of reprojection below). There are datums 
with different numbers of parameters. Try to get the 
one with the highest number of parameters and use that; 
it will be the most accurate.


[From 3 to 7?]


  Georeferencing

Often, you need to import data into your GIS project 
that does not have any coordinates. The process of 
assigning coordinates to such "floating" objects is 
called georeferencing. There are different ways to do 
this, depending on the sort of input data. Examples:

* A picture of an archaeological site was taken from an 
  airplane. In the picture, you can identify a number 
  of fixed points for which you know the real world 
  coordinates. By mapping pixel coordinates in the 
  digital photo to the real world coordinates, you 
  georeference the image. Also, if a sufficient number 
  of points is given, distortions in the image due to 
  skewed angles, lens distortions etc. can be 
  compensated for and the image will be rectified.

* You have surveyed a number of points using a total 
  station on an excavation working in a local system. 
  Now, you have found some fixed points in the national 
  grid and surveyed those, as well. By mapping the 
  local coordinates to the fixed point national grid 
  coordinates, you can transform all your previous 
  total station measurements to national grid 
  coordinates (in the case of vector data, this process 
  is also called rubbersheeting).

* In the process of an excavation, artefacts were 
  collected in square meter units (quadrants) within a 
  trench (don't do this in reality...). To get an 
  approximate mapping of artefact densities, you decide 
  to count the artefacts from each square meter and 
  represent this value as a point measurement by 
  assigning it the coordinates of the respective quadrant.
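
The total station example can be sketched with the simplest transformation that does the job: a similarity transformation (shift, rotation and uniform scaling) fitted to exactly two fixed points. This is a deliberately minimal stand-in, not what a GIS georeferencing tool does; real tools use more control points and a least-squares fit, and can also correct distortions. All coordinates below are invented for illustration. The trick is to treat each (x, y) pair as a complex number, so that rotation and scaling become one multiplication.

```python
def fit_similarity(local_pts, grid_pts):
    """Fit w = a*z + b from exactly two pairs of control points.

    z are local coordinates, w are national grid coordinates,
    both written as complex numbers x + y*i.
    """
    z1, z2 = (complex(*p) for p in local_pts)
    w1, w2 = (complex(*p) for p in grid_pts)
    a = (w2 - w1) / (z2 - z1)  # rotation and scale
    b = w1 - a * z1            # shift
    return a, b


def apply_similarity(a, b, point):
    """Transform one local (x, y) point to grid coordinates."""
    w = a * complex(*point) + b
    return w.real, w.imag


# Hypothetical fixed points: local coordinates and their grid coordinates
control_local = [(0.0, 0.0), (10.0, 0.0)]
control_grid = [(1000.0, 2000.0), (1000.0, 2010.0)]

a, b = fit_similarity(control_local, control_grid)
find_grid = apply_similarity(a, b, (0.0, 10.0))  # a surveyed find spot
```

With only two control points there is no redundancy, so a measurement error in either of them goes undetected. Survey at least three fixed points whenever you can.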


  Reprojection

Sometimes, you will need to merge data with a different 
projection (e.g. UTM data from a GPS) into your 
working location. This process is called reprojection 
or projection transformation (if you convert between 
different types of projection). Most GIS have a 
database with projection information for most of the 
world's countries, so you will have no trouble 
doing this. In some cases, you will need to get 
additional information (such as a datum with more 
parameters) for optimal accuracy.

2.3.2 National grid systems

National mapping agencies need to map their country's 
topography onto planar maps in order to be able to 
easily calculate distances etc. As was mentioned 
before, projecting points on the Earth's surface to a 
planar system causes distortions.

Exactly because of this, every country uses an 
individual projection that minimizes distortion within 
its own borders. In addition, countries that stretch a 
long way from West to East will not be able to minimize 
distortion effectively over the whole length. 
Therefore, countries are divided into vertical stripes 
with different projection parameters for each of them. 
The origin of the cartesian coordinate system also 
varies between national grid systems, and there are 
individual datums for precise projection 
transformations and sometimes other intricacies. Be 
sure to know them all before you work with national 
grid data, especially if you have to merge it with data 
from other systems, such as UTM.

An excellent source of information about the world's 
national grid systems is [www.asprs.org/resources/grids].

2.3.3 The UTM system

The UTM (Universal Transverse Mercator) system was 
designed as a globe-spanning reference system and is 
widely used in satellite navigation. Most GPS receivers 
can record data in this system. UTM works just 
like a national grid system (i.e. coordinates are 
planar and in meters), but it crosses all national 
boundaries. In order to keep distortions at a passable 
level, 60 vertical stripes (zones), each 6 degrees of 
longitude wide, have been defined. 
World-wide data (such as the SRTM data: [http://srtm.usgs.gov/]) is often in 
UTM format. That's OK because every good GIS can 
convert UTM to a national grid system or to geographic 
coordinates (and vice versa).
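
UTM zones are conventionally numbered 1 to 60, starting at 180 degrees West and proceeding eastwards in steps of 6 degrees of longitude. Under that assumption, finding the zone for a given longitude is a one-liner. The sketch ignores the few irregular zones around Norway and Svalbard, and the function names are my own.

```python
from math import floor


def utm_zone(longitude):
    """Standard UTM zone number (1..60) for a longitude in decimal degrees."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within -180..180 degrees")
    # Longitude 180 would give zone 61, so clamp it to the last zone
    return min(int(floor((longitude + 180.0) / 6.0)) + 1, 60)


def central_meridian(zone):
    """Longitude of the central meridian of a UTM zone, in degrees."""
    return zone * 6 - 183


zone = utm_zone(10.1)  # a longitude in northern Germany: zone 32
```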

2.3.4 Local surveying systems

Lots of people (especially archaeologists) have to work 
in places where information about the national grid 
system and fixed point positions can be hard or 
impossible to get. Even European countries sometimes 
treat this information as a military secret. The 
Republic of Poland, e.g., has only started to release 
information about its national grid system to the 
public in the year 2000! In such cases, surveying will 
have to fall back to a local system: an origin is set 
somewhere convenient and points are measured in 
relation to that origin.

This is why you will get data in such systems very 
often. Archaeological work usually happens on a scale 
that does not make projections necessary, as the 
warping effect of the Earth's curvature will be 
negligible. For this sort of data, many GIS know 
non-world (MapInfo), x-y (GRASS) or otherwise 
unprojected coordinate systems. Some GIS, however, don't, 
and working with such data will be a pain (e.g. ArcGIS, 
where you must use the local projection as a sort of 
fake projection...). Luckily, GRASS can handle this sort of 
data just fine and we are OK!

Needless to say, the only chance of locating points 
surveyed in a local system is to find national fixed 
points later and use them to properly georeference everything.

2.4 GIS data formats

There is a large number of different GIS data formats. 
In part, this is to do with the fact that different 
formats have different capabilities and flexibility for 
storing geographic information. For the biggest part 
however, it is the result of commercial GIS makers 
trying to bind customers to their product. Today, the 
situation has improved insofar as an international 
standards organization (the OGC, URL) exists to ensure 
the development and continued usage of standard GIS 
file formats. 

While QGIS directly uses these standard formats, GRASS 
has its own formats for both raster and vector data. 
All data inside the GRASS database is stored in these 
native GRASS formats. This means that you must know how 
to import and export data to and from GRASS and we will 
look into this in section [??].

The basic formats that almost any GIS can 
handle (i.e. import and export) are:

2.4.1 ESRI Shapefiles

This has been a standard format for GIS vector data for 
many years. Almost any GIS can handle it. Shapefiles 
can store 2D or 3D vector objects. One shapefile can 
only hold one type of geometry (points, lines or polygons). 
On your hard disk, a shapefile actually consists of more 
than one file. For each map, there are at least three 
files with the suffixes .shp (the actual coordinates of 
the geographic objects), .dbf (attribute data in a 
simple dBase format file) and .shx (file of indices to 
the shapes). Where applicable, additional files will be 
generated by the GIS, such as .prj (to store projection 
information), .idx (a spatial index for faster display 
of objects) etc. Shapefiles do not store full topology 
information, which is why the inventor of the format 
(ESRI Corp.) calls them "simple feature" files.
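
A frequent source of trouble is that someone copies or emails only the .shp file and forgets the rest. The following sketch checks a list of file names for the three mandatory parts of one shapefile. The file names are hypothetical, and the function missing_parts is not part of any GIS; it just illustrates the idea.

```python
from pathlib import PurePath

REQUIRED = (".shp", ".shx", ".dbf")  # the three mandatory parts


def missing_parts(shp_name, directory_listing):
    """Return the mandatory suffixes that are missing for one shapefile."""
    stem = PurePath(shp_name).stem
    present = {
        PurePath(name).suffix.lower()
        for name in directory_listing
        if PurePath(name).stem == stem
    }
    return [ext for ext in REQUIRED if ext not in present]


# Hypothetical folder contents: the index file of "features" is missing
listing = ["features.shp", "features.dbf", "features.prj", "trenches.shx"]
problems = missing_parts("features.shp", listing)
```

In practice, you would pass the result of os.listdir() for the folder in question as the directory listing.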

2.4.2 GeoTIFF

Raster data looks just like pictures on screen and in 
many cases that's all people want: just a picture of 
the map to visualize in their GIS. For this reason, the 
trusty old TIFF picture file format was extended to 
also include some geographic positioning information 
(georeferencing) in an additional file with the 
extension .tfw, the so-called world file, a simple 
ASCII file (see below). (Actually, there are other 
ways of storing GIS-related information in a GeoTIFF 
file, but we will not make things too complicated here.) 
If you load such a TIFF file into a GIS, it will 
automatically recognize the referencing information and 
display the raster image at the correct geographic location.
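
A world file is short enough to understand completely. By the usual convention, it holds six numbers, one per line: the pixel size in X direction, two rotation terms (both 0 for a north-up image), the pixel size in Y direction (negative, because image rows count downwards), and the X and Y map coordinates of the center of the upper left pixel. The sketch below reads such a file and converts a pixel position to map coordinates; the numbers are invented for this example.

```python
def parse_world_file(text):
    """Return the six world file parameters in file order: A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file holds exactly six numbers")
    return tuple(values)


def pixel_to_map(params, col, row):
    """Map coordinates of the center of the pixel at (col, row)."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file: 2 m pixels, no rotation,
# upper left pixel centered at easting 1000, northing 5000
tfw_text = "2.0\n0.0\n0.0\n-2.0\n1000.0\n5000.0\n"
params = parse_world_file(tfw_text)
position = pixel_to_map(params, col=10, row=20)  # (1020.0, 4960.0)
```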

Unfortunately, GeoTIFF, being essentially a picture 
file format, is not a very good choice for saving 
anything other than completely processed (image) data 
that does not need to be analyzed any further. It is 
like taking a 'snapshot' of your GIS raster map as 
displayed on screen. There are, however, no other 
standard formats for GIS raster data and in some cases 
your only option may be to export every cell value as a 
point in tabular ASCII format (see below), import and 
then rasterize the data again.

2.4.3 Tabular ASCII

The acronym ASCII stands for American Standard Code for 
Information Interchange. ASCII files are very simple 
text files that can be read and written using the most 
primitive text editors or command line tools. Such 
files are often the output of measuring instrument 
software (e.g. GPS, total stations), databases, 
spreadsheets or manual typing. Tabular ASCII files are 
files that store simple point coordinates and (most 
often) additional information, such as measurements 
and/or labels using one record per line, like this:

45.0;12.4;70.3;"Point one"

In this case, we have a point coordinate consisting 
of an easting and a northing, plus a height reading in 
meters and a point label, i.e. four fields of data. The data 
fields are separated by a so-called delimiter, in this 
case a semicolon but other delimiters such as tab stops 
or simple spaces are also common. The text label is 
enclosed with quotation marks to signify that this is 
text, not a numeric value. Such point data can be 
imported by any GIS. If not directly, you can import 
the ASCII file into a spreadsheet such as Microsoft 
Excel and export from there as a dBase file which can 
be read in by the GIS. In addition, most GIS systems 
offer their own special ASCII import and export formats 
for more complex data such as polygons. Such ASCII 
files will also be readable with a simple text editor, 
but only a GIS that can understand this special format 
will be able to import it successfully. Data in this 
format is sometimes also referred to as comma separated 
value (CSV) files, although a comma is just one choice (and 
not even a good one, as some languages also use it as 
a decimal point) for delimiting data fields!

Be aware that the importing software cannot make any 
sense of tabular ASCII data by itself! You have to tell 
it explicitly what each field means, which separator to 
use etc. Every software that can import or export 
tabular ASCII files will have some way of letting you 
specify these options. Be sure that you keep the 
necessary information somewhere or you might get into 
trouble some time. A good idea is to write a comment 
about what each field means, where the data comes from 
etc. into the first line of the ASCII file. The usual 
practice is to prefix the comment line with a hashmark 
(#). Before you import the data, you can delete the 
comment line (after you made a copy of the original 
file) or the software may be smart enough to skip it.
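
To show what "telling the software" amounts to, here is a minimal reader for the record shown above, using Python's standard csv module. The delimiter, the quote character and the meaning of each field have to be stated explicitly, and lines starting with a hashmark are skipped. The comment line in the sample data is my own addition, following the advice given above.

```python
import csv

# The example record from the text, preceded by a descriptive comment line
SAMPLE = (
    "# easting;northing;height (m);label\n"
    '45.0;12.4;70.3;"Point one"\n'
)


def read_points(text, delimiter=";"):
    """Parse tabular ASCII point data into (easting, northing, height, label)."""
    # Drop empty lines and comment lines before parsing
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    points = []
    for row in csv.reader(lines, delimiter=delimiter, quotechar='"'):
        easting, northing, height = (float(value) for value in row[:3])
        label = row[3] if len(row) > 3 else ""
        points.append((easting, northing, height, label))
    return points


points = read_points(SAMPLE)
```

The same function reads comma or tab delimited data if you pass delimiter="," or delimiter="\t".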

2.4.4 DXF 

DXF (Drawing eXchange Format) is the format that CAD 
software, such as AutoCAD, uses to import and export CAD 
drawings. Unlike shapefiles, DXF files can store 
different types of geometries in one file, i.e. the file 
itself is multi-layered. CAD and GIS are two very 
different worlds, but DXF is still the best bridge between them. 
DXF files (like CAD software in general) have no 
concept of geographic topology, as is true for many CAD 
users. Thus, you will often find that things which 
should be polygons (like the outlines of features) are 
polylines with lots of overlap and other topological 
problems that have to be cleaned in the GIS.

Most often, DXF files are actually ASCII files 
(although it is possible to also save them in 
human-unreadable binary format). You can open such a 
file with a simple text editor and you will see that 
you can indeed "see" the data. However, you will not be 
able to make very much sense of it unless you know all 
about the format of a DXF file, that is how to 
interpret the ASCII text. You can actually learn that 
(at least in theory), because the creators of the DXF 
format have documented their format (see [http://usa.autodesk.com/adsk/servlet/item?siteID=123112&id=5129239]).

DXF support is currently not very good in either QGIS 
or GRASS. Both systems can only import very simple DXF 
files reliably.

  Additional notes

The opposite of human-readable ASCII files are called 
binary files. They make sense to the software that 
created them, but certainly not to you (and not to 
other software that does not understand the particular 
binary format). Try opening a Word document (.doc) in a 
simple text editor and you will see what I mean.

It is not a good idea to create or edit an ASCII file 
with a full-blown word processor such as Microsoft Word, 
as such software will save all sorts of formatting 
information into the file and make it unreadable by 
other software. On Windows, you can use the program 
notepad to create and edit ASCII text files, but if you 
work a lot with ASCII data, consider downloading a 
good, free editor for your system, such as PSPad [http://www.pspad.com/], to make 
your life a lot easier!

ASCII files are completely the same on Mac OS, Unix 
systems and Windows, except for one thing: the 
(invisible) special character that signals each line's 
end differs between systems! Good text editors can 
handle this, bad ones will give strange effects such as 
displaying the whole file on one line! 
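
The special characters in question are carriage return (CR) and line feed (LF): Windows ends a line with CR followed by LF, Unix systems use LF only, and classic Mac OS used CR only. If you ever need to repair a file by hand, the following sketch detects the convention and converts everything to Unix style. The function names are made up for this example.

```python
def detect_newline(data):
    """Name the line ending convention used in a chunk of bytes."""
    if b"\r\n" in data:
        return "CRLF (Windows)"
    if b"\r" in data:
        return "CR (classic Mac OS)"
    if b"\n" in data:
        return "LF (Unix)"
    return "none"


def normalize_newlines(data):
    """Convert all line endings to LF. CRLF must be replaced first."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"45.0;12.4\r\n46.5;13.1\r\n"
unix_text = normalize_newlines(windows_text)
```

Work on the raw bytes (open the file in binary mode), otherwise the programming language may silently translate the line endings for you.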

ASCII is great for long-time archiving of data as you 
will always be able to read the data as long as you 
know how to interpret it, even if the maker of the 
software you used for creating it has gone bankrupt a 
long time ago.

ASCII files are usually very big as the 
human-readability demands all information to be written 
in the file explicitly. In some cases, e.g. if you want 
to email an ASCII file to someone, you can compress it 
to a fraction of its size by using a file compression 
program (e.g. [www.izarc.org]). However, this means that the recipient 
also needs software to uncompress the file again. In 
addition, compressing files is not a good idea for 
long-term archival, as a single defect bit, e.g. caused 
by aging of the storage media, can make the whole 
compressed file unusable.

2.5 Topology


[This text taken from 

need to simplify and adapt to archaeology!!!]]

The topologic model is often confusing to initial users 
of GIS. Topology is a mathematical approach that allows 
us to structure data based on the principles of feature 
adjacency and feature connectivity. It is in fact the 
mathematical method used to define spatial 
relationships. Without a topologic data structure in a 
vector based GIS most data manipulation and analysis 
functions would not be practical or feasible.

The most common topological data structure is the 
arc/node data model. This model contains two basic 
entities, the arc and the node. The arc is a series of 
points, joined by straight line segments, that start 
and end at a node. The node is an intersection point 
where two or more arcs meet. Nodes also occur at the 
end of a dangling arc, e.g. an arc that does not 
connect to another arc such as a dead end street. 
Isolated nodes, not connected to arcs represent point 
features. A polygon feature is comprised of a closed 
chain of arcs.

In GIS software the topological definition is commonly 
stored in a proprietary format. However, most software 
offerings record the topological definition in three 
tables. These tables are analogous to relational 
tables. The three tables represent the different types 
of features, e.g. point, line, area. A fourth table 
containing the coordinates is also utilized. The node 
table stores information about the node and the arcs 
that are connected to it. The arc table contains 
topological information about the arcs. This includes 
the start and end node, and the polygon to the left and 
right that the arc is an element of. The polygon table 
defines the arcs that make up each polygon. While arc, 
node, and polygon terminology is used by most GIS 
vendors, some also introduce terms such as edges and 
faces to define arcs and polygons. This is merely the 
use of different words to define topological 
definitions. Do not be confused by this.
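
To make the tables less abstract, here is a toy data set of two adjacent polygons, A and B, that share one arc. The arc table is written down by hand; the node table and the polygon table are then derived from it. This is only a sketch of the arc/node idea with made-up identifiers, not the storage layout of GRASS or any other GIS.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    start: str            # node where the arc begins
    end: str              # node where the arc ends
    left: Optional[str]   # polygon to the left (None = outside)
    right: Optional[str]  # polygon to the right (None = outside)


# Arc table: a1 is the boundary shared by polygons A and B
ARCS = {
    "a1": Arc("n1", "n2", "A", "B"),
    "a2": Arc("n2", "n1", "A", None),
    "a3": Arc("n1", "n2", "B", None),
}

# Coordinate table: the only place where actual coordinates are stored
COORDS = {
    "a1": [(1, 0), (1, 1)],
    "a2": [(1, 1), (0, 1), (0, 0), (1, 0)],
    "a3": [(1, 0), (2, 0), (2, 1), (1, 1)],
}


def build_node_table(arcs):
    """For each node, list the arcs connected to it."""
    table = defaultdict(list)
    for arc_id, arc in arcs.items():
        for node in (arc.start, arc.end):
            if arc_id not in table[node]:
                table[node].append(arc_id)
    return dict(table)


def build_polygon_table(arcs):
    """For each polygon, list the arcs that make up its boundary."""
    table = defaultdict(list)
    for arc_id, arc in arcs.items():
        for polygon in (arc.left, arc.right):
            if polygon is not None:
                table[polygon].append(arc_id)
    return dict(table)


node_table = build_node_table(ARCS)
polygon_table = build_polygon_table(ARCS)
```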

Since most input data does not exist in a topological 
data structure, topology must be built with the GIS 
software. Depending on the data set, this can be a CPU 
intensive and time consuming procedure. This building 
process involves the creation of the topological tables 
and the definition of the arc, node, and polygon 
entities. To properly define the topology there are 
specific requirements with respect to graphic elements, 
e.g. no duplicate lines, no gaps in arcs that define 
polygon features, etc. These requirements are reviewed 
in the Data Editing section of the book.

The topological model is utilized because it 
effectively models the relationship of spatial 
entities. Accordingly, it is well suited for operations 
such as contiguity and connectivity analyses. 
Contiguity involves the evaluation of feature 
adjacency, e.g. features that touch one another, and 
proximity, e.g. features that are near one another. The 
primary advantage of the topological model is that 
spatial analysis can be done without using the 
coordinate data. Many operations can be done largely, 
if not entirely, by using the topological definition 
alone. This is a significant advantage over the CAD or 
spaghetti vector data structure that requires the 
derivation of spatial relationships from the coordinate 
data before analysis can be undertaken.
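
A small sketch shows what "without using the coordinate data" means. Given nothing but an arc table with left and right polygons, the neighbours of any polygon can be listed directly. The table describes three hypothetical polygons in a row (A, B, C); no coordinates appear anywhere in the code.

```python
# Arc table rows: (arc id, start node, end node, left polygon, right polygon)
# None stands for the outside area
ARC_TABLE = [
    ("a1", "n1", "n2", "A", "B"),   # shared by A and B
    ("a2", "n2", "n1", "A", None),
    ("a3", "n1", "n3", "B", None),
    ("a4", "n3", "n4", "B", "C"),   # shared by B and C
    ("a5", "n4", "n2", "B", None),
    ("a6", "n3", "n4", "C", None),
]


def neighbours(arc_table, polygon):
    """Polygons that share at least one arc with the given polygon."""
    result = set()
    for _arc_id, _start, _end, left, right in arc_table:
        if left == polygon and right is not None:
            result.add(right)
        elif right == polygon and left is not None:
            result.add(left)
    return result


adjacent_to_b = neighbours(ARC_TABLE, "B")
```

With spaghetti data, the same question could only be answered by comparing the coordinates of all boundary lines with each other.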

The major disadvantage of the topological data model is 
its static nature. It can be a time consuming process 
to properly define the topology depending on the size 
and complexity of the data set. For example, 2,000 
forest stand polygons will require considerably longer 
to build the topology than 2,000 municipal lot 
boundaries. This is due to the inherent complexity of 
the features, e.g. lots tend to be rectangular while 
forest stands are often long and sinuous. This can be a 
consideration when evaluating the topological building 
capabilities of GIS software. The static nature of the 
topological model also implies that every time some 
editing has occurred, e.g. forest stand boundaries are 
changed to reflect harvesting or burns, the topology 
must be rebuilt. The integrity of the topological 
structure and the DBMS tables containing the attribute 
data can be a concern here. This is often referred to 
as referential integrity. While topology is the 
mechanism to ensure integrity with spatial data, 
referential integrity is the concept of ensuring 
integrity for both linked topological data and 
attribute data.

2.6 Thematic maps

One of the basic functions of a GIS is to produce 
pretty, colored maps. A map that cleverly uses things 
like colors, transparency, different symbol sizes and 
shapes etc. to help you understand some spatial 
phenomenon is called a thematic map. E.g. a common 
theme for relief models includes using a typical range 
of colors to represent different heights and shading 
mountain sides to achieve a sort of pseudo 3D effect. 
Frequently, you will see choropleth maps that use 
areas with different colors to e.g. symbolize different 
artefact densities in different trenches/quadrants.

3 Using QGIS and the GRASS plugin

4 Basic GRASS GIS usage

4.2 How to use a GRASS command

5 Common applications of GRASS GIS

6 Advanced data processing with GRASS GIS

6.1 More on interpolation

6.2 Network analysis

6.3 Database storage

6.3.1 Attributes in an external DBMS

6.3.2 Geometries in an external DBMS

6.4 Environmental modelling

6.4.1 Erosion models

7 GRASS extensions for archaeology

8 Script-based data analysis with GRASS GIS

9 3D visualisation with ParaView

10 Further directions

A Appendix

A.1 GRASS commands index

