[GRASS-user] Re: [Qgis-user] Basic GIS course ideas
Benjamin Ducke
benjamin.ducke at ufg.uni-kiel.de
Mon Oct 9 03:34:00 EDT 2006
I agree, a dedicated site for teaching open source GIS would
be the best idea, since there are so many components involved
here: GRASS, QGIS, R, PostgreSQL, ParaView, ...
I have attached a draft text about GRASS and QGIS for archaeologists.
It also contains a general introduction to GIS concepts and file
formats.
Obviously, it needs to be generalized, so that it applies to all
GIS users equally, and there are still a lot of blanks to fill in.
But it's considerably more fleshed out than what is on the Wiki so
far. Maybe it's a start ...
Benjamin
Sören Gebbert wrote:
> Hi,
>
> Hamish schrieb:
>
>> Otto Dassau wrote:
>>
>>>>> But I do not have the time or resource to set up an appropriate
>>>>> website and administer this.
>>>>
>>>> The GRASS wiki site is a perfect place for this development.
>>>> (can it serve PDF, presentation attachments?)
>>>
>>> no, currently it doesn't support uploads, documents (pdf, images, ...)
>>> have to be stored somewhere else.
>>
>>
>>
>> create a quasi-liberal access area in gforce GRASS svn? or better,
>> create a new project on gforge for "open_gis_edu" or similar.
>> http://wald.intevation.org/projects/grass/
>
>
> That's a great idea.
> Can we use this place to put video tutorials online?
> I just created a very simple gui feature demo with grass63
> to encourage all users and dev's to create video tutorials.
>
> http://www-pool.math.tu-berlin.de/~soeren/grass/modules/screenshots/grass63feature_tour.html
>
>
> It would be great to have many thematic video tutorials
> to explain the basics and advanced features of grass.
>
> Best regards
> Soeren
>
>>
>>
>>
>> Hamish
>>
>> _______________________________________________
>> grassuser mailing list
>> grassuser at grass.itc.it
>> http://grass.itc.it/mailman/listinfo/grassuser
>>
>>
>
--
Benjamin Ducke, M.A.
Archäoinformatik
(Archaeoinformation Science)
Institut für Ur- und Frühgeschichte
(Inst. of Prehistoric and Historic Archaeology)
Christian-Albrechts-Universität zu Kiel
Johanna-Mestorf-Straße 2-6
D 24098 Kiel
Germany
Tel.: ++49 (0)431 880-3378 / -3379
Fax : ++49 (0)431 880-7300
www.uni-kiel.de/ufg
-------------- next part --------------
GRASS and QGIS
Open Source Geoinformation Technology for Archaeologists
Abstract
This course is an introduction to the open source
geoinformation systems QGIS (Quantum GIS) and GRASS
(Geographic Resources Analysis Support System). It was
written for archaeologists with a minimal background in
GIS technology (and an idea of what they want to use it
for).
Table of Contents
1 GRASS and QGIS for archaeologists
1.1 Installation
1.1.1 Single user installation
1.1.2 Multi user installation
1.1.3 Installing tutorial data
1.2 File system layout
1.2.1 The GRASS database
1.2.2 The GRASS home directory
2 Basic GIS concepts
2.1 Layers
2.2 Data types (raster and vector)
2.2.1 Raster data
2.2.2 Vector data
2.3 Coordinate systems and projections
2.3.1 Coordinate systems, projections and datums
2.3.2 National grid systems
2.3.3 The UTM system
2.3.4 Local surveying systems
2.4 GIS data formats
2.4.1 ESRI Shapefiles
2.4.2 GeoTIFF
2.4.3 Tabular ASCII
2.4.4 DXF
2.5 Topology
2.6 Thematic maps
3 Using QGIS and the GRASS plugin
3.1 Importing data into QGIS
3.2 Starting GRASS from within QGIS
4 Basic GRASS GIS usage
4.1 Using the GRASS shell
4.2 How to use a GRASS command
4.3 The GRASS region
4.4 The mapset search path
4.5 Database file commands
4.6 GRASS environment variables
4.7 More things to know about GRASS
4.7.1 Maps and where they are stored
4.7.2 Raster data handling
4.7.3 Vector data handling
4.8 Importing data into GRASS
4.9 Georeferencing data
4.10 Point data interpolation
4.11 Exporting data from GRASS
4.12 Basic data manipulation
4.12.1 Masking
4.12.2 Buffering
4.12.3 (Re-)classification
4.12.4 Spatial selection, overlay and extraction
4.12.5 Data type conversion
4.12.6 Map Algebra
4.12.7 Volume calculations
4.12.8 Profiling
5 Common applications of GRASS GIS
5.1 Map statistics
5.2 Digital terrain models
5.3 Cost surfaces
5.4 Line-of-sight analysis
5.5 Advanced interpolation
5.6 Geometric modelling
6 Advanced data processing with GRASS GIS
6.1 More on interpolation
6.2 Network analysis
6.3 Database storage
6.3.1 Attributes in an external DBMS
6.3.2 Geometries in an external DBMS
6.4 Environmental modelling
6.4.1 Erosion models
7 GRASS extensions for archaeology
7.1 Territorial modelling with the Xtent model
7.2 Stratigraphic modelling with voxels
7.3 Cumulative viewshed analysis
7.4 Predictive models for site locations
7.4.1 Calculating model gain
8 Script-based data analysis with GRASS GIS
9 3D visualisation with ParaView
10 Further directions
10.1 Data storage in a relational DBMS
10.2 WebGIS and GRASS web services
10.3 Simulation models
10.4 Spatial statistics with GRASS and R
A Appendix
A.1 GRASS commands index
A.2 Internet links
A.3
1 GRASS and QGIS for archaeologists
Geographic Information Systems (GIS) are software
systems built specifically for representing and
processing spatial (geographic) information. GIS
combine spatial mapping and analysis technology with a
database for storing any additional information about
objects of interest. Simply put, if your archaeological
work involves mapping and drawing conclusions from the
spatial structure of things, you need to know how to
use a GIS. There are many GIS systems available today,
and many of them are very expensive. In this course, I
will introduce you to some software that is available
to anyone at no cost and without restrictions.
I assume that you have some basic knowledge of
geoinformation systems, how they work and what you can
do with them. For a general introduction to GIS
technology and its uses in archaeology, I recommend
these two books:
Conolly, J. and Lake, M., 2006. Geographical Information
Systems in Archaeology. Cambridge: Cambridge University Press.
Wheatley, D. and Gillings, M., 2003. Spatial Technology
and Archaeology: The Archaeological Applications of
GIS. London: Taylor & Francis.
GRASS GIS (Geographic Resources Analysis Support System,
see [http://www.grass.itc.it]) is a very complete workstation-type GIS. It
has been designed with complex, automated model
building and data analysis tasks in mind. It handles
quite differently from a typical desktop GIS
(e.g. ArcView, Manifold, MapInfo), where the user spends
most of the time interactively searching for the right
icons and menu entries in the graphical user interface
while the software stands idly by.
QGIS (Quantum GIS, see [http://qgis.org]), on the other hand, is just
such a classical desktop GIS that allows you to quickly
visualize data, query it, produce pretty maps and
digitize new geographic objects. QGIS also has a plugin
that allows it to communicate with GRASS directly. This
way, you get both a simple-to-use desktop GIS and a
powerful workstation GIS to cover all your spatial data
processing needs!
Both GRASS and QGIS are open source projects. This
means you are free to copy, share and change the
software in almost any way you like (but check the
included license files for details). Check out this
great online repository for more open source GIS
software and data: [www.freegis.org].
GRASS was originally designed to be used on Unix
or Unix-like operating systems (which also include
the popular open source Linux system). It does not run
"natively" on a Windows system without some quirks and
irritations. The Windows version of GRASS that we will
use in this course needs some extra software to be
installed on your Windows box, feels a bit strange to
the non-Unix user and lacks some of the functionality
that the full-blown Unix version has. However, I believe
that almost all the missing features are either not very
relevant or can be replaced by QGIS or other open
source Windows software. In the future, the two
versions should converge towards exactly the same
functionality.
(UNIX is a registered trademark in the United States and
other countries, licensed exclusively through X/Open
Company Ltd. Windows is a registered trademark of
Microsoft Corporation.)
There is also a very up-to-date Mac OS X
version of GRASS GIS, maintained by Lorenzo Moretti
and William Kyngesburye: [http://www.grass.itc.it/grass61/binary/macosx/]. It is easy to set up GRASS
on your Mac, but you also need to install some
additional software from your Mac OS X installation
media. See the instructions on. I will not go into the
details of installing and using this version of GRASS here.
(Mac OS is a registered trademark of Apple Computer, Inc.)
GRASS can, of course, also be downloaded and compiled
in source code form. In addition, there are binaries
for a number of Linux versions. Like the Mac version, I
will not discuss installation and usage of GRASS on
Linux systems. You can find all the different versions here: [http://www.grass.itc.it/download/index.php].
Further tutorials, sample data and other resources can
be found on [http://www.grass.itc.it/gdp/index.php] and [http://www.grass.itc.it/download/data.php].
GRASS and QGIS have active user and developer
communities. Mailing lists for GRASS can be found on [http://www.grass.itc.it/community/support.php] and
for QGIS on [http://qgis.org/index.php?option=com_content&task=view&id=115&Itemid=96]. Do not hesitate to post questions or
comments to one of those lists. If you have time to
help in the development of the software, feel free to
join. You will be surprised at how many ways there are
to help develop the software that do not even
require you to be a programmer. You can spell-check or
correct faulty and badly written manual pages, design
pretty icons, write tutorials, test program
functionality and report problems, supply nice sample data,
help to raise money for funding some programmers etc.
1.1 Installation
[
NOTE: The current version of the software described
here suffers from an annoyance that prevents the QGIS
GRASS plugin from working properly if you installed
GRASS and QGIS in a path on your filesystem that has
spaces in it. E.g. C:\Program Files\grassqgis is a
folder that will not work, while C:\programs\grassqgis
will work!]
For this course, I have put together a version of GRASS
that includes QGIS and the QGIS GRASS plugin. The
software is quite current but still in a somewhat
experimental stage (especially QGIS). Do not let this
put you off. Everything is quite powerful and usable
already, despite certain rough edges and annoyances.
Work on GRASS and QGIS is progressing very fast. I
intend to make updates to the software available as
time allows. Check this webpage regularly: [http://www.uni-kiel.de/ufg/ufg_BerDucke.htm].
The software package we will be using is based on Radim
Blazek's work and instructions (see [http://wiki.qgis.org/qgiswiki/BuildingWindowsBinaryOnLinux#head-1af01d3d23f7cdba8b8e6817dd9198166f5ca481] for details if you
want to know how to build your own system from
scratch). Installing this version of GRASS and QGIS is
very simple, although I do not have a graphical
installer ready for you yet. You only need a Windows
2000 or XP system where you can log in with
administrator rights. The following instructions (as
well as the rest of this text) assume that you install
everything into the exact same folders on your
hard disk as I am proposing here.
For the installation, you have two choices: if you want
to install the software on a Windows system that has
several user accounts, so that different people
can use the software on that machine, do
the multi user installation. Currently this requires
you to have some knowledge about how to set file system
permissions on a Windows system.
If you just want to install the software for your own
use right away, the easy way is the single user
installation described in the next section.
1.1.1 Single user installation
1. Log in as Administrator on your Windows system. If
your regular Windows user account has administrator
rights (a dangerous thing but common on many Windows
default installations), you can skip this step.
2. Download the file grassqgis-libs.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis-libs.zip].
3. Unzip the file to some directory on your hard disk.
If your version of Windows does not have built-in
support for zip archives and you need software
to unzip the files, I recommend you download and
install IZArc from [www.izarc.org].
4. After unzipping, you will find a folder system-libs
in whatever location you decided to unzip to. Open
the system-libs folder and copy the files you find in
there (several .dll files) to a location where
Windows looks for system files. This is usually
C:\WINDOWS\system32.
5. If you had to log in as Administrator: Log out now
and log in again with your usual user name and password.
6. Download the file grassqgis.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis.zip].
7. Unzip the file to some directory on your hard disk
where you have write access and that does not contain
any spaces in its name or the name of any folders
leading up to it!
8. After unzipping, you will find a folder grassqgis in
whatever location you decided to unzip to. This
contains a program qgis.exe which you can use to
start QGIS and GRASS. If you want, you can create a
shortcut to qgis.exe on your Windows desktop to be able
to locate and launch it more quickly.
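Because the "no spaces" rule in step 7 is easy to get wrong, here is a small Python sketch that checks a candidate folder before you unzip into it. Python is not part of the grassqgis package, and the function name is my own invention; the check simply looks for a space in any folder of the path, which is the condition described in the note at the start of this section.

```python
from pathlib import PureWindowsPath

def is_space_free(path: str) -> bool:
    """Return True if no folder in the Windows path contains a space.

    PureWindowsPath splits 'C:\\programs\\grassqgis' into
    ('C:\\', 'programs', 'grassqgis'), so every folder leading up to
    the target is checked, not just the last one.
    """
    return all(" " not in part for part in PureWindowsPath(path).parts)

for candidate in (r"C:\programs\grassqgis", r"C:\Program Files\grassqgis"):
    verdict = "OK" if is_space_free(candidate) else "contains spaces, choose another folder"
    print(candidate, "->", verdict)
```

PureWindowsPath only parses the string and never touches the disk, so the sketch runs on any operating system.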
1.1.2 Multi user installation
1. Log in as Administrator on your Windows system. If
your regular Windows user account has administrator
rights (a dangerous thing but common on many Windows
default installations), you can skip this step.
2. Download the file grassqgis-libs.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis-libs.zip].
3. Unzip the file to some directory on your hard disk.
If your version of Windows does not have built-in
support for zip archives and you need software
to unzip the files, I recommend you download and
install IZArc from [www.izarc.org].
4. After unzipping, you will find a folder system-libs
in whatever location you decided to unzip to. Open
the system-libs folder and copy the files you find in
there (several .dll files) to a location where
Windows looks for system files. This is usually
C:\WINDOWS\system32.
5. Download the file grassqgis.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassqgis.zip].
6. Unzip the file to some directory on your hard disk.
If your version of Windows does not have built-in
support for zip archives and you need software
to unzip the files, I recommend you download IZArc
from [www.izarc.org].
7. After unzipping, you will find a folder grassqgis in
whatever location you decided to unzip to. Move the
grassqgis folder to a folder that does
not contain any spaces in its name or in the name of
any folder leading up to it. You can also just move
grassqgis to C: or wherever you like (as long as the
path contains no spaces).
8. You now have to adjust access permissions on the
grassqgis folder so that normal users will be able to
open the folder and launch qgis.exe. You also need to
allow normal users to create new folders under
msys\home inside the grassqgis folder (e.g.
C:\grassqgis\msys\home if you moved grassqgis to C:).
9. If you want, you can create a shortcut to qgis.exe in
the Windows Start menu to allow users to locate and
launch it more quickly.
That's all you need to do to install QGIS and GRASS on
your system! If you are still logged in as
Administrator, log out now and log in again using your
regular work account.
1.1.3 Installing tutorial data
Next, we will download some data sets that we will use
in this course. Please note that you are only allowed
to use this data as tutorial data for this course and
not to publish it in any way!
1. Download the file import.zip from [http://www.uni-kiel.de/ufg/dateienDucke/import.zip]. This archive
contains data in various GIS and other formats that
we will use in the sections on importing GIS data
into QGIS and GRASS.
2. Download the file grassdata.zip from [http://www.uni-kiel.de/ufg/dateienDucke/grassdata.zip]. This archive
contains several complete GRASS locations with
example data we will use in the sections on data
processing with GRASS GIS.
3. Unzip both files and copy the folders you got to the
folder where you keep your own Documents. Usually,
this will be the "My Documents" folder on your desktop
(but you can copy everything to wherever you usually
put your documents; there is no issue with spaces in
the folder name here). For the rest of this course, I
will assume that the folders import and grassdata
reside in your "My Documents" folder.
Additional notes
There is at least one viable alternative for running
GRASS on Windows which involves using the Cygwin
extensions. However, this needs a lot of additional
software to be installed on your Windows box to
essentially create a complete Unix environment for
GRASS to run in. If you feel that you have the time and
space for this, check out the instructions on this
page: [http://geni.ath.cx/grass.html]. We will not be using that version in our
course. It handles almost exactly like a Linux
installation of GRASS so you can consult any GRASS
tutorial on the web for details on how to use it.
QGIS is localized software. This means that all
program menus, messages etc. will pop up in the
language of your Windows installation. This tutorial is
written for the English version of QGIS. If this gives
you trouble, locate the language file for your system
in the folder share\qgis\i18n inside your grassqgis
folder and delete it or move it to another place on
your file system. E.g., if you are on an Italian system,
get rid of qgis_it.qm, restart QGIS and everything will
be in English!
2 Basic GIS concepts
For your convenience, however, I will review some basic
GIS concepts and terminology at this point.
An excellent online tutorial for archaeological GIS
usage is available at [http://ads.ahds.ac.uk/project/goodguides/gis/index.html]. Browse their index for more
reading material: [http://ads.ahds.ac.uk/project/goodguides/g2gp.html]. There are many excellent
geographical GIS texts and instructions on the web. I
personally like this one: [http://erg.usgs.gov/isb/pubs/gis_poster/]. Or browse this index for
more: [http://www.coloradocollege.edu/ats/labs/GIS/learning_teaching.htm].
2.1 Layers
A QGIS project (like any GIS project) consists of a
number of layers. These layers will be shown in the box
at the left side of the QGIS window, which is just
blank right now. They are stacked on top of each other
in order of display. Each layer has the following properties:
* it has a data type
* it refers to a GIS data file in a specific GIS data
format on your harddisk
* it has a symbology (display style) that determines
its color scheme, transparency etc.
* it contains data, whose values can be queried
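To make the list of layer properties concrete, here is a minimal Python sketch of a layer and a project as plain data structures. The class and field names are hypothetical and have nothing to do with QGIS's own programming interface; the sketch only mirrors the first three properties listed above and the stacking order of the layer box (the queryable data values are left out for brevity).

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    """Hypothetical model of one GIS layer (not the QGIS API)."""
    name: str
    data_type: str    # "raster" or "vector"
    source_file: str  # the GIS data file on your hard disk
    file_format: str  # e.g. "GeoTIFF" or "ESRI Shapefile"
    symbology: Dict[str, str] = field(default_factory=dict)  # colours, transparency, ...

@dataclass
class Project:
    # Index 0 is the top of the layer box; later entries lie underneath.
    layers: List[Layer] = field(default_factory=list)

    def add(self, layer: Layer) -> None:
        """In this sketch, a newly added layer is put on top of the stack."""
        self.layers.insert(0, layer)

    def drawing_order(self) -> List[str]:
        """Paint from the bottom up, so upper layers cover lower ones."""
        return [layer.name for layer in reversed(self.layers)]

project = Project()
project.add(Layer("elevation", "raster", "dem.tif", "GeoTIFF", {"colours": "terrain"}))
project.add(Layer("finds", "vector", "finds.shp", "ESRI Shapefile", {"colour": "red"}))
print(project.drawing_order())  # ['elevation', 'finds']
```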
2.2 Data types (raster and vector)
GIS systems can integrate data from many different
sources: scanned maps, field measurements, laser
scanned surfaces, airphotos, database records etc. As
long as something has coordinates (or you can
reconstruct them), it can go into the GIS! Different
forms (data types) are suitable for different types of
information. Basically, in a GIS we have raster data
for imagery and other rasterized information and vector data
in the form of points, lines and polygons.
2.2.1 Raster data
Imagine putting an orthogonal grid of regularly spaced,
rectangular cells over the real world. In the center of
each cell, you take a measurement of the variable that
you are interested in, e.g. the height of the terrain.
The value of this measurement gets stored in the cell
and the whole cell is painted in some color that
represents this value (e.g. "blue" for everything on or
below 0 meters to represent water and "white" for
everything above 2000 meters for mountain tops, with
brown, green and yellow for the ranges in between). You
have created a raster model of elevation.
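The elevation example can be tried out with a plain nested list standing in for the grid. This is a minimal Python sketch: the grid values are made up, and the break points between green, yellow and brown (and their order) are my own assumptions, since the text only says these colours cover the range in between.

```python
# A hypothetical 3 x 4 elevation raster in metres: one value per cell,
# row 0 being the northernmost row.
elevation = [
    [-5.0,   0.0,  120.0,  450.0],
    [10.0, 300.0,  900.0, 1500.0],
    [50.0, 800.0, 2100.0, 2600.0],
]

def colour_for(height: float) -> str:
    """Return a display colour for one cell's height value."""
    if height <= 0:
        return "blue"    # on or below 0 m: water
    if height > 2000:
        return "white"   # above 2000 m: mountain tops
    # The text only says "brown, green and yellow" for the range in
    # between; these break points and their order are assumptions.
    if height <= 500:
        return "green"
    if height <= 1000:
        return "yellow"
    return "brown"

# Paint every cell: the colour grid has the same shape as the raster.
colours = [[colour_for(h) for h in row] for row in elevation]
for row in colours:
    print(" ".join(f"{c:<6}" for c in row))
```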
Raster data also means image data (pictures). A digital
camera captures a rasterized model of the real, visible
world by storing a colour value in each raster cell
(also called a pixel in digital imaging software). In
the case of image data, the cell colour directly
reflects its measured value, i.e. the visible light's
colour, hue and saturation at this point. Image
data conveys a lot of information to the human eye,
but is meaningless for a computer unless it is
analyzed for structures and classified. Classification
means things such as determining the type of
vegetation on a satellite image and coding the cell
with a class value that represents this vegetation
(such as "1" for eucalyptus trees, "15" for grassland
etc.). A class (sometimes also called a category or bin)
is an integer number, that is, a whole number with no
decimal part (-1, 0, 2, 20013 etc.). A class is a value
taken from a limited range of possible values (e.g. 100
different tree species to be identified on the
satellite image). Classes can be counted (in other
words, they are discrete values) and simple statistics
can be made from the counted frequencies. That's all.
Classes can have labels to give them a more intuitive
meaning (e.g. 1="dense", 2="medium", 3="light").
On the other hand, raw measurement data, such as
elevation, can (in theory) be measured and stored with
an arbitrary degree of precision (e.g. 10.1232 meters)
and have an infinite number of different values
(so-called continuous values), depending on where you
measure and how exactly. Such values are stored as
floating point numbers. Classified (integer) raster
maps can be produced from floating point maps by
defining ranges that map them to discrete classes, e.g.:
1.0 m to 100.0 m = 1 = "low";
100.01 m to 1000.0 m = 2 = "medium";
1000.01 m to 2000.0 m = 3 = "high".
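Such a set of range rules can be sketched in a few lines of Python. The rules are taken from the example above, but the handling of values that fall between two of its ranges (e.g. 100.005 m) is my own assumption: the rules are tested in order and the first match wins, and anything outside all ranges becomes a NULL cell.

```python
from typing import List, Optional

# (lower bound, upper bound, class value, label) taken from the example
# ranges in the text. Rules are checked in order and the first match wins,
# so a value like 100.005 (which falls into the small gap between 100.0 and
# 100.01 in the text) ends up in class 2, and exactly 100.0 stays in class 1.
RULES = [
    (1.0, 100.0, 1, "low"),
    (100.0, 1000.0, 2, "medium"),
    (1000.0, 2000.0, 3, "high"),
]
LABELS = {cls: label for _lo, _hi, cls, label in RULES}

def reclassify(value: float) -> Optional[int]:
    """Map one floating point value to an integer class, or None (NULL)."""
    for lower, upper, cls, _label in RULES:
        if lower <= value <= upper:
            return cls
    return None  # outside all ranges: no class, i.e. a NULL cell

def reclassify_raster(raster: List[List[float]]) -> List[List[Optional[int]]]:
    """Apply the rules cell by cell; the result has the same shape."""
    return [[reclassify(v) for v in row] for row in raster]

heights = [[0.5, 42.7, 100.0], [100.005, 999.9, 1500.25]]
classes = reclassify_raster(heights)
print(classes)                          # [[None, 1, 1], [2, 2, 3]]
print([LABELS[c] for c in classes[1]])  # ['medium', 'medium', 'high']
```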
Raster data typically covers a rectangular area of a
certain extent completely (although there might
be patches with no information, represented by
so-called NULL or no data cells, e.g. at points where
it is impossible to take a measurement). Think of
airphotos or satellite imagery. The level of detail in
raster data is directly connected to its ground resolution,
i.e. the size of the cells on the ground (and thus, for a
given extent, the number of cells in the grid). Obviously, the
smaller the cells, the more detailed the raster model
of the real world phenomenon will be. Raster resolution
is limited by storage space. If you double the
resolution (i.e. halve the side length of the cells), you end
up with four times the space requirements, as the data
volume grows in two dimensions and $2^{2}=2\times2=4$. Any processing and
analysis of the data will also typically take about four times as long.
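The quadratic growth is easy to check with a little arithmetic. This Python sketch counts the cells needed to cover a hypothetical 100 m x 50 m site while the cell size is halved twice; the site size is made up for illustration.

```python
import math

def cell_count(width_m: float, height_m: float, resolution_m: float) -> int:
    """Cells needed to cover a width x height area with square cells
    of the given side length (partial cells at the edge count as full)."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols * rows

# A hypothetical 100 m x 50 m site, with the cell size halved each time.
previous = None
for res in (1.0, 0.5, 0.25):
    n = cell_count(100, 50, res)
    growth = "" if previous is None else f" ({n // previous}x the previous)"
    print(f"{res} m cells: {n} cells{growth}")
    previous = n
```

Going from 1 m to 0.5 m cells takes the grid from 5000 to 20000 cells, and 0.25 m cells need 80000: each halving multiplies the count by four.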
In a raster data model, the information we are
interested in is usually stored directly in the cells.
Thus, for digital terrain modelling, we would have one
raster layer for heights, one for slope etc. This is
entirely different with vector data, where attribute
tables are used to save all sorts of information
relating to the objects in just one vector layer (see
next section).
Frequent examples of raster data layers in a GIS are:
* continuous raster data (also called fields in the GIS
literature):
- digital elevation models with one height
measurement for, e.g., each square meter, and maps
derived from them, such as slope and aspect maps
- density and concentration measurements interpolated
from point samples (e.g. phosphate concentration).
- geophysical sensor data
- airphotos and satellite images with real or false colors
* discrete raster data:
- grayscale imagery (with a limited number of gray
shades, usually 256).
- black and white scans.
- scanned topographic paper maps with a limited
number of colors representing objects, such as
fields, woods, roads, railways etc.
- scanned classified geologic and land use maps
2.2.2 Vector data
Vector maps show discrete geometric objects defined by
arbitrarily precise spatial coordinates. These objects
are sometimes also called features, but I will use the
term feature exclusively (with a few exceptions where
noted) to describe objects as part of the stratigraphy
of an archaeological site.
The shape of all objects in a vector map is simply
described by a set of coordinates. However, there is a
difference in how these coordinates (also called nodes
or vertices) are connected. If there is no connection
between any two coordinates, we have a point. If at
least two coordinate pairs are connected (by a
straight line called an arc or edge; we will not
consider curves), we have a line. If we have a line with
more than two vertices (a polyline) where the last pair
of coordinates is connected to the first, then we have a
closed area, also called a polygon. Points only have
positions, lines have lengths and polygons also have an
area. Obviously, you can decompose a polygon into a
polyline by breaking one connection and a line into
points by breaking all connections. For this reason,
when you start digitizing a new map in a GIS you need
to tell the system whether you are going to produce
points, lines or polygons. Different types are usually
kept in separate layers in a GIS to make processing easier.
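The difference between the geometry types is only in how the vertices are connected, which this Python sketch illustrates: the same list of four coordinate pairs gives a different length depending on whether it is read as an open polyline or as a closed polygon, and only the polygon has an area. The trench outline is made up, and the area uses the standard shoelace formula, which assumes a simple (non-self-intersecting) polygon.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # a point is nothing but a coordinate pair

def line_length(vertices: List[Coord]) -> float:
    """Length of a (poly)line: sum of the straight arcs between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def polygon_perimeter(vertices: List[Coord]) -> float:
    """A polygon is a polyline whose last vertex is connected back to the first."""
    return line_length(vertices + [vertices[0]])

def polygon_area(vertices: List[Coord]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wraps around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# A hypothetical 4 m x 2 m trench outline (vertex list, not yet closed).
trench = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(line_length(trench))        # 10.0: open polyline, three arcs
print(polygon_perimeter(trench))  # 12.0: closed polygon, four arcs
print(polygon_area(trench))       # 8.0
```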
Obviously, the different types have different
topological properties that give you different options
for analysis. Point data can, e.g., be queried to check
whether points lie within specific polygons in another map.
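A classic way to answer the "which points lie in this polygon?" question is the even-odd (ray casting) test. The following Python sketch is one common textbook formulation, not necessarily the algorithm GRASS or QGIS use internally; the trench and find coordinates are made up.

```python
from typing import List, Tuple

Coord = Tuple[float, float]

def point_in_polygon(pt: Coord, polygon: List[Coord]) -> bool:
    """Even-odd rule: cast a ray from pt towards +x and count how many
    polygon edges it crosses; an odd count means pt is inside.
    Points lying exactly on the boundary may go either way."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of pt
                inside = not inside
    return inside

# Hypothetical data: one trench polygon and three find spots.
trench = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
finds = {"F1": (1.0, 1.0), "F2": (5.0, 1.0), "F3": (2.0, 3.0)}
in_trench = [name for name, xy in finds.items() if point_in_polygon(xy, trench)]
print(in_trench)  # ['F1']
```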
The level of detail in vector data, and thus the kind
of questions you can answer with it, is hard to guess
from its visual appearance alone, as you can zoom in
and out of a vector map without visible loss of
quality! You need to have some background information,
such as the scale of the paper map from which the data
was digitized.
Objects on vector maps typically have attribute data
attached to them. Attribute data comprises all sorts of
information about an object. E.g. for the polygon
representing an archaeological feature, this could be
the name of the excavator, the date of discovery, an
inventory number etc. All attribute values for the
objects in a single GIS layer/data file are stored in
one attribute table with one record (row of data) for
each object. Attribute tables can be stored in files
(e.g. shapefiles store them in dBase format by default)
or in a separate, full-blown database. In any case,
there needs to be some way to link each geographic
object to the specific record in the database table
that contains its attribute data. In database lingo,
this link is made through the primary key, a value
stored in each record that identifies it. This primary key must be
a simple integer type value which has to be unique for
every object. In some cases this is easy to define.
E.g. for an archaeological feature we could use its
inventory number as primary key.
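The link between geometries and attribute records via a unique integer key can be tried out with the sqlite3 module that ships with Python. In this sketch, the table layout, column names and all values are made up for illustration; GRASS and shapefiles each have their own way of storing the key.

```python
import sqlite3

# Hypothetical geometries of two features, keyed by their inventory number.
geometries = {
    101: [(0, 0), (4, 0), (4, 2), (0, 2)],
    102: [(5, 0), (7, 0), (7, 3), (5, 3)],
}

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute(
    "CREATE TABLE features ("
    " inv_no INTEGER PRIMARY KEY,"  # the unique integer key linking row and object
    " excavator TEXT,"
    " discovered TEXT)"
)
con.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [(101, "Excavator A", "2005-08-14"), (102, "Excavator B", "2005-08-16")],
)

def attributes_for(inv_no: int):
    """Fetch the single attribute record linked to an object by its key."""
    return con.execute(
        "SELECT excavator, discovered FROM features WHERE inv_no = ?", (inv_no,)
    ).fetchone()

for inv_no, outline in geometries.items():
    print(inv_no, len(outline), "vertices", attributes_for(inv_no))

# The database refuses a second record with an existing key.
try:
    con.execute("INSERT INTO features VALUES (101, 'Excavator C', '2005-09-01')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```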
Examples for the different types of vector data:
* point data
- GPS coordinates
- artefact positions
- total station measurements
- soil samples
* line data
- contour lines
- roads
- railways
* polygon data (areas)
- digitized feature outlines
- trench outlines
- all sorts of areas
Topology
A good GIS also handles vector topology. Topology
describes spatial relations between geometric objects.
A typical example is a sketch of two features with a
common border. In a non-topological system (such as CAD
or a very bad GIS), this border would be saved twice,
once for each feature, and the two copies may overlap or
leave gaps. In a real GIS, the
information that it is a shared border will be recorded
explicitly and the line will exist only once. Correct
topology is very important for data analysis. There is
much more to topology in a GIS, and you will have no
trouble finding a wealth of information online.
See, e.g., this whitepaper: [http://www.esri.com/library/whitepapers/pdfs/gis_topology.pdf] or this paper, which is
easier to understand: [http://www.coloradocollege.edu/dept/SW/GIS_Lab/tips&tricks/Understanding_Topology.doc]. [
Need to talk about topology problems, like overlap,
overshoot etc. and the potential of topology for
spatial checks and queries]
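The following Python sketch shows the basic idea behind storing a shared border only once: coordinates live in a single node table and polygons refer to nodes by ID, so the common border can be detected (and edited) as one edge. This is a toy illustration of the principle with a made-up data layout, not the way GRASS actually stores vector topology.

```python
from collections import defaultdict

# Node coordinates are stored exactly once (hypothetical data).
nodes = {1: (0, 0), 2: (4, 0), 3: (4, 2), 4: (0, 2), 5: (7, 0), 6: (7, 2)}

# Two feature polygons given as rings of node IDs; they share the border 2-3.
polygons = {"feature_A": [1, 2, 3, 4], "feature_B": [2, 5, 6, 3]}

def edge_table(polys):
    """Map each undirected edge (pair of node IDs) to the polygons using it."""
    edges = defaultdict(list)
    for name, ring in polys.items():
        for a, b in zip(ring, ring[1:] + ring[:1]):  # ring[:1] closes the ring
            edges[frozenset((a, b))].append(name)
    return edges

def outline(name):
    """Resolve a polygon's node IDs to coordinates."""
    return [nodes[n] for n in polygons[name]]

shared = {tuple(sorted(e)): owners for e, owners in edge_table(polygons).items()
          if len(owners) > 1}
print(shared)  # {(2, 3): ['feature_A', 'feature_B']}

# Editing the shared border once updates both features: no gap, no overlap.
nodes[3] = (4.5, 2.0)
print(outline("feature_A")[2], outline("feature_B")[3])  # (4.5, 2.0) (4.5, 2.0)
```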
Interpolation to raster
Very often, you will want to create complete raster
models from a cloud of vector points, e.g. a digital
elevation model from a set of points measured in with a
total station. For this you need to reconstruct the
values at each grid cell that does not contain a
measured value from the closest measured values in its
neighbourhood. This process is called interpolation and
is a very frequent operation in a GIS. We will look at
it in more detail later.
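To give a first idea of what interpolation does, here is a Python sketch of inverse distance weighting (IDW), where each cell centre gets a weighted average of its nearest measured points and closer points count more. IDW is only one of many interpolation methods, and I use it here simply because it is the easiest to write down; the choice of 4 neighbours, the power of 2 and the sample points are arbitrary assumptions for the sketch.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # x, y, measured value (e.g. height)

def idw(x: float, y: float, samples: List[Sample], k: int = 4, power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y) from the k nearest samples."""
    nearest = sorted((math.hypot(x - sx, y - sy), sz) for sx, sy, sz in samples)[:k]
    num = den = 0.0
    for dist, value in nearest:
        if dist == 0.0:
            return value               # cell centre sits on a measured point
        weight = 1.0 / dist ** power   # closer points weigh more
        num += weight * value
        den += weight
    return num / den

def interpolate_grid(samples, west, north, cols, rows, res, k=4, power=2.0):
    """Fill a rows x cols raster, estimating each cell at its centre
    (row 0 is the northernmost row)."""
    return [
        [idw(west + (c + 0.5) * res, north - (r + 0.5) * res, samples, k, power)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical total station readings: (x, y, height in m).
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0), (2.0, 2.0, 40.0)]
dem = interpolate_grid(points, west=0.0, north=2.0, cols=2, rows=2, res=1.0)
for row in dem:
    print([round(v, 2) for v in row])
```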
Additional notes
Practice will teach you which data type is best for
representing specific information. Most often, the data
provider will make the choice for you. There are cases
where several choices are possible and the best one
depends on what you want to do with the data. E.g. you
can scan a topographic map and import it into a GIS
project just like that or you can digitize the outlines
of all visible fields, roads, etc. on that map to use
the vectorized data.
Basically, if you want to calculate a value for every
location of your map and you are comfortable with
dividing your geographic regions into a regular grid of
cells for this (e.g. the density of artefacts for every
square meter of a trench), you are producing raster
data and need all input in rasterized form.
If you want to calculate a value for every object
visible on your map (especially geometric values, e.g.
the area of each feature on a site map) or are
interested in attribute data linked to objects (such as
a database record which describes the contents of
features on a site map), you want to use vector data.
You can convert between raster and vector formats in
both directions, and also between different types of
vector geometries (point, line, polygon). It is a
common thing to e.g. scan in a paper map and then trace
the contour lines on-screen to create a map of vector
lines, each one with an attribute that stores the
height information for that line. These lines will
consist of nodes (the points where you clicked the
mouse on the screen) and arcs that connect them. If you
want to turn the height information in the contour
lines into a smooth continuous raster model for e.g.
pretty visualizations or visibility calculations, you
can extract the nodes into a layer of points with
height attributes and run an interpolation to create
the raster model (we will see how to do this basic
operation later).
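The step of extracting the nodes of contour lines into height points is simple enough to sketch in Python. The contour data below is hypothetical, and the dictionary layout is my own choice rather than any GIS file format; the resulting list of (x, y, z) points is the kind of input a point interpolation works from.

```python
from typing import Dict, List, Tuple

# Hypothetical digitized contour lines: each line carries its height
# attribute and the vertices (nodes) where the mouse was clicked.
contours: List[Dict] = [
    {"height": 100.0, "vertices": [(0.0, 0.0), (5.0, 1.0), (10.0, 0.5)]},
    {"height": 110.0, "vertices": [(0.0, 4.0), (5.0, 5.0), (10.0, 4.5)]},
]

def contour_nodes_to_points(lines: List[Dict]) -> List[Tuple[float, float, float]]:
    """Break every contour line into its nodes and copy the line's
    height attribute to each resulting point as its z value."""
    points = []
    for line in lines:
        z = line["height"]
        for x, y in line["vertices"]:
            points.append((x, y, z))
    return points

points = contour_nodes_to_points(contours)
print(len(points))  # 6
print(points[:2])   # [(0.0, 0.0, 100.0), (5.0, 1.0, 100.0)]
```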
Unfortunately, some GIS talk about grids (or gridded data)
when they really mean raster data, although a grid
might just as well mean a vector map of rectangles
(such as the grid usually overlaid on printed
topographic maps). Gridded data is really point vector
data where each point is positioned on an imaginary
regular grid.
Both raster and vector data representations can be
extended to three dimensions. For vector data, this is
straightforward, as all that is necessary is to record
3D-coordinates (X,Y and Z). Raster cells can also be
extended to 3D. Three-dimensional cells are called voxels
(volume pixel). Think of Lego bricks and you get the idea.
Current GIS technology is predominantly 2D with the
possibility to visualize 3D data using separate
viewers. The software we are using for this course does
not currently have any 3D visualization capacity, but a
full installation of GRASS (on Linux, Mac OS etc.) has
a software called nviz for this purpose. As a Windows
replacement, we will use ParaView which also works
great (see section 9).
For 3D rasters (voxels), storage space is an even harder
constraint on resolution than in the 2D raster case, as
the data volume grows in three dimensions. Doubling the
resolution gives you $2^{3}=2\times2\times2=8$ times as much data!
2.3 Coordinate systems and projections
Every object in a GIS has a place on the real Earth. If
you want to be able to find it, you need to make use
of one of the many geodetic reference systems in use
across the globe. A geodetic reference system
consists of a coordinate system, which defines how
coordinate values are to be interpreted, and projection
information, which defines all sorts of peculiarities
that are necessary to map geographic data with as
little distortion as possible in a specific part of the
world. In addition, there are many details that can
drive even the most experienced GIS expert crazy and
leave room for a plethora of error sources. The
following is a very concise text that contains just the
bare essentials you need to know to correctly handle
your GIS layers. The internet offers a lot of detailed
information. A nice page can be found
here: [http://gis.washington.edu/esrm250/cfr250/lessons/projection/index.html].
Never try to overlay geographic data from sources with
different geodetic reference systems without a basic
understanding of the following concepts. You will get
wrong results that are not always easily visible! Old
maps (even just a decade old) may use different reference
systems than modern data!
2.3.1 Coordinate systems, projections and datums
Let's first get the basic terminology straight. There
is a lot of confusion about this, and even GIS software
gets much of it wrong. I will try to be as brief and
precise as possible.
Geographic coordinate systems
Geographic coordinates are really the only precise way
of locating anything on the Earth's curved surface.
Geographic coordinates are angular readings (longitude
and latitude) that allow for the precise definition of
any point on the Earth's surface.
Geographic coordinates can either be given in degrees,
minutes and seconds or as decimal degrees. It is easy
to convert between the two and instructions can be
found, e.g. on this page: [http://www.warnercnr.colostate.edu/class_info/nr502/lg1/notes/dms_and_dd.html]. Just make sure you know
which type your data uses.
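The conversion really is simple arithmetic: one degree has 60 minutes and one minute has 60 seconds. A minimal sketch, assuming that southern and western positions are written as negative decimal degrees (the usual GIS convention):

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere="N"):
    """Degrees/minutes/seconds to decimal degrees.
    Southern (S) and western (W) positions become negative values."""
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    return -dd if hemisphere.upper() in ("S", "W") else dd


def dd_to_dms(dd, is_latitude=True):
    """Decimal degrees to (degrees, minutes, seconds, hemisphere letter)."""
    if is_latitude:
        hemisphere = "S" if dd < 0 else "N"
    else:
        hemisphere = "W" if dd < 0 else "E"
    dd = abs(dd)
    degrees = int(dd)
    minutes = int((dd - degrees) * 60)
    seconds = (dd - degrees - minutes / 60) * 3600
    return degrees, minutes, seconds, hemisphere
```

For example, `dms_to_dd(54, 30, 0)` gives 54.5, and `dd_to_dms(-10.25, is_latitude=False)` gives 10 degrees, 15 minutes, 0 seconds West.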
Unfortunately, geographic coordinates are not very
handy for anything to do with 2D mapping and data
analysis. Even calculating the distance between two
points along the Earth's surface is a pretty complex
operation. Thus, we really want coordinates to be meter
readings in a planar (Cartesian) coordinate system.
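To illustrate why: even if we simplify the Earth to a perfect sphere, the distance between two longitude/latitude points needs trigonometry. The sketch below uses the haversine formula, assuming a spherical Earth with a mean radius of about 6,371 km; real geodetic software works on an ellipsoid and is more accurate.

```python
import math

EARTH_RADIUS_M = 6371000.0  # approximate mean Earth radius (spherical model)


def haversine_distance(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two points given in decimal
    degrees, treating the Earth as a perfect sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))
```

One degree of latitude along a meridian comes out at roughly 111 km with this model. Compare that with the one-line Pythagorean distance you get in a planar system.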
Cartesian coordinate systems
Cartesian coordinate systems are exactly what you know
from your math class in school: a planar coordinate
system with two orthogonal axes: X (or easting) for the
horizontal position and Y (or northing) for the
vertical position (this can also be extended into 3
dimensions by adding a Z axis). The point where the
axes meet is called the origin (with coordinates 0/0 or
0/0/0 in the 3D case). X coordinates grow towards East
and Y coordinates grow towards North (Z coordinates
grow towards the zenith). In geography (and therefore
GIS) this has always been so and will always stay that
way. X coordinates are always written down first, then
Y, then Z. Never ever record coordinates in any other
way, especially when setting up a local survey (see
below). Unfortunately, some archaeologists (and survey
instruments) are not aware of this and measure things
in a Y/X system. Be sure to swap such coordinates for
use in a GIS!
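Swapping is trivial once you know it is needed, and afterwards distances are plain Pythagoras. A small sketch with made-up readings (the function names are illustrative only):

```python
import math


def swap_yx(points):
    """Turn (Y, X) pairs recorded in the wrong order into GIS-style (X, Y) pairs."""
    return [(x, y) for (y, x) in points]


def planar_distance(p, q):
    """Straight-line (Pythagorean) distance between two (X, Y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# hypothetical total station readings stored as (Y, X) in meters
readings_yx = [(102.5, 12.0), (106.5, 15.0)]
points_xy = swap_yx(readings_yx)
print(points_xy)                    # [(12.0, 102.5), (15.0, 106.5)]
print(planar_distance(*points_xy))  # 5.0
```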
Projections
Earth is not flat. Therefore when we measure anything
on the Earth's surface and want to map (project) it
onto a flat (planar) paper map or computer screen, we
will distort the original shapes. There is no way
around it. Obviously this effect will be more
noticeable when things are measured for a big area,
such as a whole country. Thus a compromise will have to
be made to keep distortions low at least in those parts
that are of immediate interest. The mathematical set of
rules to achieve this is called projection information,
and you need to know it if you want to work with data
in a national grid system (see below). There are
different types of projections which are optimal for
different mapping and analysis purposes, but the most
frequently used one is the Transverse Mercator projection.
Datums
I will just say this much: a datum defines the reference
ellipsoid (the mathematical approximation of the Earth's
shape) and how it is positioned relative to the real
Earth. You need datum information if you want to use
geographic data that was recorded in a different
reference system than your working region (see
definition of reprojection below). Datum transformations
come with different numbers of parameters (commonly 3 or
7). Try to get the one with the highest number of
parameters and use that; it will be the most accurate.
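A 7-parameter (Helmert) datum transformation consists of three translations, three small rotations and a scale change, applied to geocentric X/Y/Z coordinates; a 3-parameter shift is the special case with only the translations. The sketch below first converts longitude/latitude on an ellipsoid (WGS84 by default) to geocentric coordinates and then applies the shift. It assumes the so-called position-vector sign convention for the rotations and the usual small-angle approximation. Some agencies publish parameters in the opposite (coordinate-frame) convention, in which case the rotation signs must be flipped. Real parameter values must come from your national mapping agency.

```python
import math


def geodetic_to_geocentric(lon, lat, h=0.0, a=6378137.0, f=1 / 298.257223563):
    """Longitude/latitude in decimal degrees plus ellipsoidal height in meters
    to geocentric X, Y, Z in meters. Defaults are the WGS84 ellipsoid."""
    lmb, phi = math.radians(lon), math.radians(lat)
    e2 = f * (2 - f)                                # first eccentricity squared
    n = a / math.sqrt(1 - e2 * math.sin(phi) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(phi) * math.cos(lmb)
    y = (n + h) * math.cos(phi) * math.sin(lmb)
    z = (n * (1 - e2) + h) * math.sin(phi)
    return x, y, z


def helmert7(xyz, tx, ty, tz, rx_arcsec=0.0, ry_arcsec=0.0, rz_arcsec=0.0,
             scale_ppm=0.0):
    """7-parameter Helmert shift (position-vector convention, small angles).
    Translations in meters, rotations in arc seconds, scale in ppm.
    With zero rotations and scale it is a plain 3-parameter shift."""
    x, y, z = xyz
    to_rad = math.pi / (180 * 3600)
    rx, ry, rz = rx_arcsec * to_rad, ry_arcsec * to_rad, rz_arcsec * to_rad
    m = 1 + scale_ppm * 1e-6
    return (tx + m * (x - rz * y + ry * z),
            ty + m * (rz * x + y - rx * z),
            tz + m * (-ry * x + rx * y + z))
```

Converting the shifted X/Y/Z back to longitude/latitude on the target ellipsoid needs an inverse (usually iterative) formula, which is omitted here; in practice, the GIS does all of this for you once it knows the datum parameters.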
Georeferencing
Often, you need to import data into your GIS project
that does not have any coordinates. The process of
assigning coordinates to such "floating" objects is
called georeferencing. There are different ways to do
this, depending on the sort of input data. Examples:
* A picture of an archaeological site was taken from an
airplane. In the picture, you can identify a number
of fixed points for which you know the real world
coordinates. By mapping pixel coordinates in the
digital photo to the real world coordinate, you
georeference the image. Also, if a sufficient number
of points is given, distortions in the image due to
skewed angles, lens distortions etc. can be
compensated for, and the image will be rectified.
* You have surveyed a number of points using a total
station on an excavation working in a local system.
Now, you have found some fixed points in the national
grid and surveyed those, as well. By mapping the
local coordinates to the fixed point national grid
coordinates, you can transform all your previous
total station measurements to national grid
coordinates (in the case of vector data, this process
is also called rubbersheeting).
* In the course of an excavation, artefacts were
collected in square meter units (quadrants) within a
trench (don't do this in reality...). To get an
approximate map of artefact densities, you decide
to count the artefacts from each square meter and
represent this value as a point measurement by
assigning it the coordinates of the respective quadrant.
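For the second example, one simple way to map local coordinates onto the national grid is a 4-parameter similarity transformation (shift, rotation and one uniform scale), fitted to the control points by least squares. This is my choice for illustration: it is a simpler stand-in for rubbersheeting, which additionally allows local, non-uniform distortions. All coordinates below are made up.

```python
def fit_similarity(local_pts, grid_pts):
    """Least-squares 2D similarity (Helmert) transform
    X = a*x - b*y + tx,  Y = b*x + a*y + ty
    from matching control points (at least two pairs)."""
    n = len(local_pts)
    if n < 2 or n != len(grid_pts):
        raise ValueError("need at least two matching control point pairs")
    xm = sum(p[0] for p in local_pts) / n
    ym = sum(p[1] for p in local_pts) / n
    Xm = sum(p[0] for p in grid_pts) / n
    Ym = sum(p[1] for p in grid_pts) / n
    num_a = num_b = den = 0.0
    for (x, y), (X, Y) in zip(local_pts, grid_pts):
        dx, dy, dX, dY = x - xm, y - ym, X - Xm, Y - Ym  # centroid-reduced
        num_a += dx * dX + dy * dY
        num_b += dx * dY - dy * dX
        den += dx * dx + dy * dy
    if den == 0:
        raise ValueError("control points must not all coincide")
    a, b = num_a / den, num_b / den
    tx = Xm - a * xm + b * ym
    ty = Ym - b * xm - a * ym
    return a, b, tx, ty


def apply_similarity(params, pt):
    """Transform one local (x, y) point with fitted parameters."""
    a, b, tx, ty = params
    x, y = pt
    return a * x - b * y + tx, b * x + a * y + ty


# hypothetical fixed points: local trench system -> national grid
local = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
grid = [(1000.0, 2000.0), (1000.0, 2010.0), (990.0, 2000.0)]
params = fit_similarity(local, grid)
print(apply_similarity(params, (5.0, 5.0)))
```

With more than two control points, the least-squares fit averages out small measurement errors in the fixed points.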
Reprojection
Sometimes, you will need to merge data with a different
projection (e.g. UTM data from a GPS) into your
working location. This process is called reprojection
or projection transformation (if you convert between
different types of projection). Most GIS have a
database with projection information for most of the
world's known countries, so you will have no trouble
doing this. In some cases, you will need to get
additional information (such as a datum with more
parameters) for optimal accuracy.
2.3.2 National grid systems
National mapping agencies need to map their country's
topography onto planar maps in order to be able to
easily calculate distances etc. As was mentioned
before, projecting points on the Earth's surface to a
planar system causes distortions.
Exactly because of this, most countries use an
individual projection that minimizes distortion within
their own borders. Countries that stretch a long way
from west to east, however, cannot keep distortion low
over their whole width with a single projection.
Therefore, such countries are divided into north-south
stripes (zones) with different projection parameters for
each of them. The origin of the Cartesian coordinate
system also varies between national grid systems; there
are individual datums for precise projection
transformations, and sometimes there are other intricacies. Be
sure to know them all before you work with national
grid data, especially if you have to merge it with data
from other systems, such as UTM.
An excellent source of information about the world's
national grid systems is [www.asprs.org/resources/grids].
2.3.3 The UTM system
The UTM (Universal Transverse Mercator) system is a
globe-spanning planar reference system that is widely
used in satellite navigation; many GPS receivers can
record data in it. UTM works just like a national grid
system (i.e. coordinates are planar and in meters), but
it crosses all national boundaries. In order to keep
distortions at a passable level, the globe is divided
into 60 north-south zones, each 6° of longitude wide.
World-wide data (such as the SRTM elevation data: [http://srtm.usgs.gov/])
often comes in UTM or in geographic coordinates. That's
OK because every good GIS can convert UTM to a national
grid system or to geographic coordinates (and vice versa).
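Finding the right UTM zone for your study area is a one-line calculation. The sketch below follows the standard 6-degree zone layout (zone 1 starts at 180° West) and deliberately ignores the special zones around Norway and Svalbard. The EPSG codes for WGS84/UTM are 326xx in the northern and 327xx in the southern hemisphere.

```python
import math


def utm_zone(lon):
    """Standard UTM zone number (1-60) for a longitude in decimal degrees.
    Ignores the irregular zones around Norway and Svalbard."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range")
    return min(math.floor((lon + 180.0) / 6.0) + 1, 60)


def central_meridian(zone):
    """Central meridian of a UTM zone in decimal degrees."""
    return zone * 6 - 183


def wgs84_utm_epsg(lon, lat):
    """EPSG code of the matching WGS84 / UTM zone."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)


print(utm_zone(10.1), central_meridian(utm_zone(10.1)), wgs84_utm_epsg(10.1, 54.3))
```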
2.3.4 Local surveying systems
Lots of people (especially archaeologists) have to work
in places where information about the national grid
system and fixed point positions can be hard or
impossible to get. Even European countries sometimes
treat this information as a military secret. The
Republic of Poland, for example, only started to release
information about its national grid system to the
public in the year 2000! In such cases, surveying will
have to fall back to a local system: an origin is set
somewhere convenient and points are measured in
relation to that origin.
This is why you will get data in such systems very
often. Archaeological work usually happens on a scale
that does not make projections necessary, as the
warping effect of the Earth's curvature will be
negligible. For this sort of data, many GIS offer
non-earth (MapInfo), x-y (GRASS) or otherwise
unprojected coordinate systems. Some GIS, however, don't,
and working with such data will be a pain (e.g. ArcGIS,
where you must use the local projection as a sort of
fake projection...); luckily, GRASS can handle this sort
of data just fine and we are OK!
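To get a feeling for how small the curvature effect is at excavation scale, the sketch below compares a distance measured along a spherical Earth with the same distance projected onto a flat plane touching the sphere at the survey origin. The sphere and the choice of projection (straight out from the Earth's center, i.e. R·tan(s/R)) are simplifying assumptions; only the order of magnitude matters here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical approximation


def flat_earth_error(arc_length_m, radius=EARTH_RADIUS_M):
    """Meters by which a distance along the sphere grows when it is projected
    onto a plane touching the sphere at the starting point: R*tan(s/R) - s."""
    return radius * math.tan(arc_length_m / radius) - arc_length_m


for s in (100.0, 1000.0, 100000.0):
    print(f"{s:>9.0f} m -> error {flat_earth_error(s):.6f} m")
```

Within a kilometer, the error is in the range of micrometers, far below total station accuracy; only over tens of kilometers does it reach meters.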
Needless to say, the only chance of locating points
surveyed in a local system is to find national fixed
points later and use them to properly georeference everything.
2.4 GIS data formats
There is a bewildering variety of GIS data formats. In
part, this is due to the fact that different
formats have different capabilities and flexibility for
storing geographic information. For the most part,
however, it is the result of commercial GIS makers
trying to bind customers to their products. Today, the
situation has improved insofar as an international
standards organization (the Open Geospatial Consortium,
OGC, URL) exists to ensure the development and continued
usage of standard GIS file formats.
While QGIS directly uses these standard formats, GRASS
has its own formats for both raster and vector data.
All data inside the GRASS database is stored in these
native GRASS formats. This means that you must know how
to import and export data to and from GRASS and we will
look into this in section [??].
The basic formats that almost any GIS can
handle (i.e. import and export) are:
2.4.1 ESRI Shapefiles
This has been a standard format for GIS vector data for
many years. Almost any GIS can handle it. Shapefiles
can store 2D or 3D vector objects. One shapefile can
only hold one type of geometry (points, lines or polygons).
On your hard disk, a shapefile actually consists of more
than one file. For each map, there are at least three
files with the suffixes .shp (the actual coordinates of
the geographic objects), .dbf (attribute data in a
simple dBase format file) and .shx (file of indices to
the shapes). Where applicable, additional files will be
generated by the GIS, such as .prj (to store projection
information), .idx (a spatial index for faster display
of objects) etc. Shapefiles do not store full topology
information, which is why the inventor of the format
(ESRI Corp.) calls them "simple feature" files.
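The one-geometry-type rule is written right into the file: the .shp file starts with a fixed 100-byte header that holds a single shape type code and the bounding box of all objects. The sketch below reads that header with Python's standard struct module, following the layout in ESRI's published shapefile specification (file code 9994 and file length big-endian; version, shape type and bounding box little-endian).

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon",
               8: "MultiPoint", 11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ",
               18: "MultiPointZ", 21: "PointM", 23: "PolyLineM",
               25: "PolygonM", 28: "MultiPointM", 31: "MultiPatch"}


def read_shp_header(data: bytes) -> dict:
    """Decode the 100-byte main file header of a .shp file."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError(f"not a shapefile (file code {file_code})")
    (length_words,) = struct.unpack(">i", data[24:28])  # in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {"file_length_bytes": length_words * 2,
            "version": version,
            "shape_type": SHAPE_TYPES.get(shape_type, f"unknown ({shape_type})"),
            "bbox": (xmin, ymin, xmax, ymax)}
```

Usage with a real file (the name is hypothetical): `with open("trenches.shp", "rb") as f: print(read_shp_header(f.read(100)))`.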
2.4.2 GeoTIFF
Raster data looks just like pictures on screen and in
many cases that's all people want: just a picture of
the map to visualize in their GIS. For this reason, the
trusty old TIFF picture file format was extended to
also include some geographic positioning information
(georeferencing). This information can be stored inside
the TIFF file itself, in special GeoTIFF tags, or in an
additional file with the extension .tfw, the so-called
world file, a simple ASCII file (see below). (There are
yet other ways of storing GIS-related information with
a TIFF file, but we will not make things too complicated
here.) If you load such a TIFF file into a GIS, it will
automatically recognize the referencing information and
display the raster image at the correct geographic location.
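A world file is just six numbers, one per line: the pixel width (A), two rotation terms (D and B), the pixel height (E, negative because image rows count downwards), and the map coordinates of the center of the upper-left pixel (C and F). The sketch below follows that common convention; the coordinates in the example are made up.

```python
def read_world_file(text):
    """Parse the six lines of a world file (.tfw), stored in the order A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(token) for token in text.split())
    return a, b, c, d, e, f


def pixel_to_world(params, col, row):
    """Map coordinates of the center of pixel (col, row); (0, 0) is top-left."""
    a, b, c, d, e, f = params
    return a * col + b * row + c, d * col + e * row + f


# hypothetical world file: 2 m pixels, no rotation
tfw = """2.0
0.0
0.0
-2.0
3500000.0
6000000.0
"""
params = read_world_file(tfw)
print(pixel_to_world(params, 0, 0))   # (3500000.0, 6000000.0)
print(pixel_to_world(params, 10, 5))  # (3500020.0, 5999990.0)
```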
Unfortunately, GeoTIFF, being essentially a picture
file format, is not a very good choice for saving
anything other than completely processed (image) data
that does not need to be analyzed any further. It is
like taking a 'snapshot' of your GIS raster map as
displayed on screen. There are, however, no other
standard formats for GIS raster data and in some cases
your only option may be to export every cell value as a
point in tabular ASCII format (see below), import and
then rasterize the data again.
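The "every cell as a point" workaround can be sketched as follows: each raster cell becomes one line with the X and Y coordinates of the cell center and the cell value. The raster values, origin and cell size here are hypothetical; the raster is assumed to be stored as a list of rows, top row first.

```python
def raster_to_xyz(cells, x_min, y_max, cell_size, delimiter=";"):
    """Write every cell of a raster (list of rows, top row first) as one
    'x;y;value' line, using the coordinates of the cell center."""
    lines = []
    for row, values in enumerate(cells):
        for col, value in enumerate(values):
            x = x_min + (col + 0.5) * cell_size
            y = y_max - (row + 0.5) * cell_size
            lines.append(f"{x}{delimiter}{y}{delimiter}{value}")
    return "\n".join(lines)


dem = [[70.3, 70.9],
       [69.8, 70.1]]  # hypothetical 2x2 elevation raster with 1 m cells
print(raster_to_xyz(dem, x_min=100.0, y_max=200.0, cell_size=1.0))
```

The resulting file can be imported as points and rasterized again at the same cell size and origin.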
2.4.3 Tabular ASCII
The acronym ASCII stands for American Standard Code for
Information Interchange. ASCII files are very simple
text files that can be read and written using the most
primitive text editors or command line tools. Such
files are often the output of measuring instrument
software (e.g. GPS, total stations), databases,
spreadsheets or manual typing. Tabular ASCII files are
files that store simple point coordinates and (most
often) additional information, such as measurements
and/or labels, using one record per line, like this:
45.0;12.4;70.3;"Point one"
In this case, we have a point record consisting of an
easting and a northing (the 2D coordinates), a height
reading in meters and a point label, i.e. four fields of data. The data
fields are separated by a so-called delimiter, in this
case a semicolon but other delimiters such as tab stops
or simple spaces are also common. The text label is
enclosed with quotation marks to signify that this is
text, not a numeric value. Such point data can be
imported by any GIS. If not directly, you can import
the ASCII file into a spreadsheet such as Microsoft
Excel and export from there as a dBase file which can
be read in by the GIS. In addition, most GIS systems
offer their own special ASCII import and export formats
for more complex data such as polygons. Such ASCII
files will also be readable with a simple text editor,
but only a GIS that can understand this special format
will be able to import it successfully. Files in this
format are sometimes also called comma-separated values
(CSV) files, although a comma is just one choice (and
not even a good one, as some languages also use it as
the decimal separator) for delimiting data fields!
Be aware that the importing software cannot make any
sense of tabular ASCII data by itself! You have to tell
it explicitly what each field means, which separator to
use, etc. Every program that can import or export
tabular ASCII files will have some way of letting you
specify these options. Be sure that you keep the
necessary information somewhere or you might get into
trouble some time. A good idea is to write a comment
about what each field means, where the data comes from
etc. into the first line of the ASCII file. The usual
practice is to prefix the comment line with a hashmark
(#). Before you import the data, you can delete the
comment line (after you made a copy of the original
file) or the software may be smart enough to skip it.
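Following that advice, the sketch below keeps the meaning of each field in an explicit description (the FIELDS list, a name made up for this example), skips lines starting with a hashmark, and uses Python's standard csv module to split on the semicolon and remove the quotation marks around the label.

```python
import csv
import io

# hypothetical field description: name and type of each column, in order
FIELDS = [("easting", float), ("northing", float), ("height", float), ("label", str)]


def read_points(text, delimiter=";"):
    """Read tabular ASCII points, skipping '#' comment lines and blank lines."""
    lines = (line for line in io.StringIO(text) if not line.lstrip().startswith("#"))
    records = []
    for row in csv.reader(lines, delimiter=delimiter, quotechar='"'):
        if not row:
            continue  # blank line
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}: {row}")
        records.append({name: convert(value) for (name, convert), value in zip(FIELDS, row)})
    return records


sample = (
    "# easting;northing;height [m];label, hypothetical total station export\n"
    '45.0;12.4;70.3;"Point one"\n'
)
print(read_points(sample))
```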
2.4.4 DXF
DXF (Drawing Exchange Format) is the format that CAD
software, such as AutoCAD, uses to import and export CAD
drawings. Unlike shapefiles, DXF files can store different
types of geometries in one file, and the file itself can
contain multiple layers. CAD and GIS are two very different
worlds, but DXF is still the best bridge between them.
DXF files (like CAD software in general) have no
concept of geographic topology, as is true for many CAD
users. Thus, you will often find that things which
should be polygons (like the outlines of features) are
polylines with lots of overlap and other topological
problems that have to be cleaned in the GIS.
Most often, DXF files are actually ASCII files
(although it is possible to also save them in
human-unreadable binary format). You can open such a
file with a simple text editor and you will see that
you can indeed "see" the data. However, you will not be
able to make very much sense of it unless you know all
about the format of a DXF file, that is how to
interpret the ASCII text. You can actually learn that
(at least in theory), because the creators of the DXF
format have documented their format (see [http://usa.autodesk.com/adsk/servlet/item?siteID=123112&id=5129239]).
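The basic idea of the ASCII DXF layout is simple: the file is a long sequence of line pairs, each consisting of an integer group code followed by a value (for example, 0 introduces an entity type, 8 is the layer name, and 10/20 and 11/21 are the X/Y of a line's start and end points). The sketch below only splits a text into those pairs; a real DXF file also has sections (header, tables, entities and so on) that a full reader must handle. The fragment is hypothetical and incomplete.

```python
def dxf_pairs(text):
    """Split ASCII DXF text into (group code, value) pairs.
    Each pair is two lines: an integer group code, then its value."""
    lines = text.splitlines()
    if len(lines) % 2:
        raise ValueError("DXF text must consist of code/value line pairs")
    return [(int(lines[i].strip()), lines[i + 1].strip())
            for i in range(0, len(lines), 2)]


# hypothetical fragment: one LINE entity on layer "TRENCH"
fragment = """  0
LINE
  8
TRENCH
 10
12.5
 20
40.0
 11
18.0
 21
40.0
"""
for code, value in dxf_pairs(fragment):
    print(code, value)
```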
DXF support is currently not very good in either QGIS
or GRASS. Both systems can only import very simple DXF
files reliably.
Additional notes
Files that are not human-readable ASCII are called
binary files. They make sense to the software that
created them, but certainly not to you (and not to
other software that does not understand the particular
binary format). Try opening a Word document (.doc) in a
simple text editor and you will see what I mean.
It is not a good idea to create or edit an ASCII file
with a full-blown word processor such as Microsoft Word
as such software will save all sorts of formatting
information into the file and make it unreadable by
other software. On Windows, you can use the program
Notepad to create and edit ASCII text files, but if you
work a lot with ASCII data, consider downloading a
good, free editor for your system, such as PSPad
[http://www.pspad.com/], to make your life a lot easier!
ASCII files are completely the same on Mac OS, Unix
systems and Windows, except for one thing: the
(invisible) special character that signals each line's
end differs between systems! Good text editors can
handle this; bad ones will produce strange effects, such
as displaying the whole file on one line!
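If an editor or import tool stumbles over line endings, you can normalize them yourself. Windows uses the two characters CR LF, Unix uses LF, and classic Mac OS used a single CR. A minimal sketch working on raw bytes (Python's own text mode already does this on reading):

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert Windows (CR LF) and classic Mac OS (CR) line endings to the
    Unix convention (LF). CR LF is replaced first so it does not become two LFs."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"45.0;12.4\r\n46.5;13.0\r47.1;13.8\n"
print(normalize_line_endings(mixed).split(b"\n"))
```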
ASCII is great for long-time archiving of data as you
will always be able to read the data as long as you
know how to interpret it, even if the maker of the
software you used for creating it has gone bankrupt a
long time ago.
ASCII files are usually very big as the
human-readability demands all information to be written
in the file explicitly. In some cases, e.g. if you want
to email an ASCII file to someone, you can compress it
to a fraction of its size by using a file compression
program (e.g. [www.izarc.org]). However, this means that the recipient
also needs software to uncompress the file again. In
addition, compressing files is not a good idea for
long-term archival, as a single defect bit, e.g. caused
by aging of the storage media, can make the whole
compressed file unusable.
2.5 Topology
[This text taken from
http://www.innovativegis.com/basis/primer/concepts.html:
need to simplify and adapt to archaeology!!!]
The topologic model is often confusing to initial users
of GIS. Topology is a mathematical approach that allows
us to structure data based on the principles of feature
adjacency and feature connectivity. It is in fact the
mathematical method used to define spatial
relationships. Without a topologic data structure in a
vector based GIS most data manipulation and analysis
functions would not be practical or feasible.
The most common topological data structure is the
arc/node data model. This model contains two basic
entities, the arc and the node. The arc is a series of
points, joined by straight line segments, that start
and end at a node. The node is an intersection point
where two or more arcs meet. Nodes also occur at the
end of a dangling arc, i.e. an arc that does not
connect to another arc, such as a dead-end street.
Isolated nodes, not connected to arcs, represent point
features. A polygon feature consists of a closed
chain of arcs.
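The node/arc relationships above are easy to query once they are stored explicitly. The sketch below uses a made-up set of arcs (only their start and end nodes; intermediate vertices are omitted) and finds dangling nodes by counting how many arc ends meet at each node: a node touched by exactly one arc end sits at the free end of a dangling arc.

```python
from collections import Counter

# hypothetical arc/node data: arc id -> (start node, end node)
arcs = {
    "a1": ("n1", "n2"),
    "a2": ("n2", "n3"),
    "a3": ("n3", "n1"),  # a1, a2 and a3 close a ring: a polygon boundary
    "a4": ("n3", "n4"),  # dead end: n4 touches only this arc
}


def node_degrees(arcs):
    """Count how many arc ends meet at each node."""
    degree = Counter()
    for start, end in arcs.values():
        degree[start] += 1
        degree[end] += 1
    return degree


def dangling_nodes(arcs):
    """Nodes touched by exactly one arc end."""
    return sorted(node for node, d in node_degrees(arcs).items() if d == 1)


print(dangling_nodes(arcs))  # ['n4']
```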
In GIS software the topological definition is commonly
stored in a proprietary format. However, most software
offerings record the topological definition in three
tables. These tables are analogous to relational
tables. The three tables represent the different types
of features, e.g. point, line, area. A fourth table
containing the coordinates is also utilized. The node
table stores information about the node and the arcs
that are connected to it. The arc table contains
topological information about the arcs. This includes
the start and end node, and the polygon to the left and
right that the arc is an element of. The polygon table
defines the arcs that make up each polygon. While arc,
node, and polygon terminology is used by most GIS
vendors, some also introduce terms such as edges and
faces to define arcs and polygons. These are merely
different words for the same topological concepts.
Do not be confused by this.
Since most input data does not exist in a topological
data structure, topology must be built with the GIS
software. Depending on the data set, this can be a
CPU-intensive and time-consuming procedure. This building
process involves the creation of the topological tables
and the definition of the arc, node, and polygon
entities. To properly define the topology there are
specific requirements with respect to graphic elements,
e.g. no duplicate lines, no gaps in arcs that define
polygon features, etc. These requirements are reviewed
in the Data Editing section of the book.
The topological model is utilized because it
effectively models the relationship of spatial
entities. Accordingly, it is well suited for operations
such as contiguity and connectivity analyses.
Contiguity involves the evaluation of feature
adjacency, e.g. features that touch one another, and
proximity, e.g. features that are near one another. The
primary advantage of the topological model is that
spatial analysis can be done without using the
coordinate data. Many operations can be done largely,
if not entirely, by using the topological definition
alone. This is a significant advantage over the CAD or
spaghetti vector data structure that requires the
derivation of spatial relationships from the coordinate
data before analysis can be undertaken.
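As an illustration of analysis "without using the coordinate data", the sketch below derives which polygons are neighbors purely from an arc table that records the polygon on the left and right of each arc. The table is hypothetical; None stands for the area outside all polygons.

```python
# hypothetical arc table: arc id -> (polygon on the left, polygon on the right)
arc_table = {
    "a1": ("A", None),
    "a2": ("A", "B"),
    "a3": ("B", None),
    "a4": ("B", "C"),
    "a5": ("C", None),
}


def neighbours(arc_table):
    """Polygon adjacency derived from the left/right polygon ids alone,
    without looking at any coordinates."""
    adjacent = {}
    for left, right in arc_table.values():
        if left is None or right is None or left == right:
            continue  # outer boundary, or an arc lying inside a single polygon
        adjacent.setdefault(left, set()).add(right)
        adjacent.setdefault(right, set()).add(left)
    return adjacent


print(neighbours(arc_table))
```

Polygons A and C share no arc, so they are not neighbors, even though both touch B.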
The major disadvantage of the topological data model is
its static nature. It can be a time consuming process
to properly define the topology depending on the size
and complexity of the data set. For example, 2,000
forest stand polygons will require considerably longer
to build the topology than 2,000 municipal lot
boundaries. This is due to the inherent complexity of
the features, e.g. lots tend to be rectangular while
forest stands are often long and sinuous. This can be a
consideration when evaluating the topological building
capabilities of GIS software. The static nature of the
topological model also implies that every time some
editing has occurred, e.g. forest stand boundaries are
changed to reflect harvesting or burns, the topology
must be rebuilt. The integrity of the topological
structure and the DBMS tables containing the attribute
data can be a concern here. This is often referred to
as referential integrity. While topology is the
mechanism to ensure integrity with spatial data,
referential integrity is the concept of ensuring
integrity for both linked topological data and
attribute data.
2.6 Thematic maps
One of the basic functions of a GIS is to produce
pretty, colored maps. A map that cleverly uses things
like colors, transparency, different symbol sizes and
shapes etc. to help you understand some spatial
phenomenon is called a thematic map. E.g. a common
theme for relief models includes using a typical range
of colors to represent different heights and shading
mountain sides to achieve a sort of pseudo 3D effect.
Frequently, you will see choropleth maps that use
areas with different colors to symbolize e.g. different
artefact densities in different trenches/quadrants.
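Behind every choropleth map is a classification step that groups values into a few classes, each with its own color. The sketch below uses equal-width intervals, one of several common schemes (quantiles and natural breaks are others). The artefact counts and the colors are made up for illustration.

```python
def equal_interval_classes(values, n_classes):
    """Assign each value a class index 0..n_classes-1 using equal-width
    intervals between the minimum and the maximum value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_classes
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


# hypothetical artefact counts per square-meter quadrant
counts = {"Q1": 3, "Q2": 12, "Q3": 27, "Q4": 8, "Q5": 30}
colours = ["#ffffcc", "#fd8d3c", "#bd0026"]  # light to dark, one per class
classes = equal_interval_classes(list(counts.values()), len(colours))
for quadrant, cls in zip(counts, classes):
    print(quadrant, counts[quadrant], colours[cls])
```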
3 Using QGIS and the GRASS plugin
4 Basic GRASS GIS usage
4.2 How to use a GRASS command
5 Common applications of GRASS GIS
6 Advanced data processing with GRASS GIS
6.1 More on interpolation
6.2 Network analysis
6.3 Database storage
6.3.1 Attributes in an external DBMS
6.3.2 Geometries in an external DBMS
6.4 Environmental modelling
6.4.1 Erosion models
7 GRASS extensions for archaeology
8 Script-based data analysis with GRASS GIS
9 3D visualisation with ParaView
10 Further directions
A Appendix
A.1 GRASS commands index
A.3 More information about the grass-user mailing list