[GRASS-dev] Test suite for GRASS - proposal, discussion welcome

Soeren Gebbert soerengebbert at googlemail.com
Wed Jun 8 10:12:03 EDT 2011


Hello Glynn,
I was thinking a lot about your approach and mine and finally decided
to try your approach first, in the hope that it will be sufficient for
any kind of test case. I still have concerns about the comparison of
floating point data regarding precision, e.g. coordinates, region
settings, FCELL and DCELL maps, vector attributes, ... More below:

2011/6/3 Glynn Clements <glynn at gclements.plus.com>:
>
> Soeren Gebbert wrote:
>
>> I was thinking about a similar approach, but the effort to parse the
>> modules' XML interface description to identify the command line
>> arguments to compare the created data was too much effort for me.
>
> I don't see a need to parse the command; just execute it and see what
> files it creates.

Ok, I see.

>> Besides that, the handling of test description, module dependencies
>> and the comparison of multiple/timeseries outputs (r.sim.water)
>> bothered me too. I still have no simple (interface) answers to these
>> issues (maybe these are not issues??).
>
> Dependencies aren't really an issue. You build all of GRASS first,
> then test. Any modules which are used for generating test maps or
> analysing data are assumed to be correct (they will have test cases of
> their own; the most that's required is that such modules are marked as
> "critical" so that any failure will be presumed to invalidate the
> results of all other tests).

I assume such critical modules are registered in the framework, not in
the test scripts? But this also means that the test scripts must be
interpreted and executed line by line by the framework to identify the
critical modules used for data generation?

Here is an example of a synthetic r.series test using r.mapcalc for
data generation; r.mapcalc is marked as critical in the framework:

{{{
# r.series synthetic average test with r.mapcalc generated data
# The r.series result is validated using the result.ref file in this
# test directory

# Generate the data
r.mapcalc expression="input1 = 1"
r.mapcalc expression="input2 = 2"

# Test the average method of r.series
r.series input=input1,input2 output=result method=average
}}}

Here is the assumed workflow:
The framework reads the test script and analyses it line by line. If
r.mapcalc is marked as critical and the framework finds the keyword
"r.mapcalc" in the script, appearing as the first word outside of a
comment, it checks whether the r.mapcalc test(s) have already run
correctly and stops the r.series test if they have not. If the
r.mapcalc tests are valid, it runs the r.mapcalc commands and checks
their return values. If the return values are correct, the rest of the
script is executed. After reaching the end of the script, the
framework looks for any data generated in the current mapset (raster,
raster3d, vector, color, regions, ...) and for corresponding
validation files in the test directory. In this case it will find the
raster maps input1, input2 and result in the current mapset and
result.ref in the test directory. It will use r.out.ascii on the
result map with a low precision (dp=3??) and compare the output with
result.ref, which was hopefully generated using the same precision.

This example should cover many raster and voxel test cases.
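
To make the intended behaviour more concrete, here is a minimal Python
sketch of such a runner. Everything in it is an assumption for
illustration (the CRITICAL_MODULES registry, the function names, the
use of subprocess instead of a dedicated GRASS Python API); it only
shows the line-by-line execution, the critical-module check and the
r.out.ascii based raster comparison described above:

{{{
# Illustrative sketch only - names and layout are assumptions, not an
# existing API of the proposed framework.
import subprocess

CRITICAL_MODULES = set(["r.mapcalc"])   # assumed registry of critical modules

def run_test_script(script_path):
    """Execute a test script line by line and check return values."""
    used_critical = set()
    for line in open(script_path):
        line = line.strip()
        if not line or line.startswith("#"):
            continue                     # comments only document the test
        module = line.split()[0]
        if module in CRITICAL_MODULES:
            used_critical.add(module)    # their own tests must have passed
        if subprocess.call(line, shell=True) != 0:
            return False, used_critical  # non-zero return value -> test failed
    return True, used_critical

def validate_raster(map_name, ref_file, digits=3):
    """Compare a raster map with its reference via r.out.ascii dp=3."""
    out = subprocess.check_output(
        ["r.out.ascii", "input=%s" % map_name,
         "output=-", "dp=%d" % digits])
    return out.decode().split() == open(ref_file).read().split()
}}}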

> I don't normally advocate such approaches, but testing is one of those
> areas which (like documentation) is much harder to get people to work
> on than e.g. programming, so minimising the effort involved is
> important.
>
> Minimising the learning curve is probably even more important. If you
> can get people to start writing tests, they're more likely to put in
> the effort to learn the less straightforward aspects as it becomes
> necessary.

Ok, I will try to summarize this approach:

The test framework will be integrated into the GRASS source code and
will use the make system to execute tests.
The make system should be used to:
* run single module or library tests
* run all module (raster|vector|general|db ...) tests
* run all library tests
* run all tests (libraries, then modules)
* in the case of an all-modules test, run the critical module tests
automatically first

Two test locations (LL and UTM?) should be generated and added to the
GRASS sources. The test locations provide all kinds of needed test
data: raster maps of different types (elevation maps, images, maps of
CELL, FCELL and DCELL type, ...), vector maps (point, line, area,
mixed, with and without attribute data), voxel data, regions, raster
maps with different color tables, reclassified maps and so on. The
test data is located only in the PERMANENT mapset, and the locations
should be small enough to fit in svn without performance issues.

Each module and library has its own test directory. The test
directories contain the test cases, reference text files and data for
import (for *.in.* modules). Validation of data is based on the
reference text files located in the test directories of each
module/library. Files implementing test cases must end with ".sh",
reference files must end with ".ref". The test cases are simple
shell-style text files, so they can easily be implemented and executed
on the command line by non-developers. Comments in the test case files
are used as documentation of the test in the test summary.
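
A hypothetical test directory for r.series could then look like this
(the file names and the test/ subdirectory are only illustrative):

{{{
raster/r.series/test/
    r.series_average_test.sh   # test case, run line by line by the framework
    result.ref                 # reference data, e.g. r.out.ascii dp=3 output
}}}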

The framework itself should be implemented in Python. It should
provide the following functionality:
* Parsing and interpretation of test case files
* Logging of all executed test cases
* Simple but excellent presentation of test results in different
formats (text, html, xml?)
* Setting up the test location environment and creating/removing
temporary mapsets for each test case run

* Comparison methods for all testable GRASS datatypes (raster, color,
raster3d, vector, db tables, region, ...) against text files
** test of equal data
** test of almost equal data (precision issues of floating point data
on different systems)
*** ! using the *.out.ascii modules with the precision option should work?
** Equal and almost equal key-value tests (g.region -g, r.univar, ...)
of text files <-- I am not sure how to realize this; see the sketch
after this list

* Execution of single test cases
** Reading and analyzing the test case
** Identification of critical modules
** Running single modules, logging stdout, stderr and the return value
** Analysis of return values -> indicator whether the module/test failed
*** ! this assumes that commands in the test cases make no use of pipes
** Recognition of all data generated by modules
*** Searching the GRASS database for new raster, vector, raster3d
maps, regions, ... in the temporary mapset
*** Searching for newly generated text or binary files in the test directory
** Recognition of validation data in the test directory
** Comparison of the found data with the available reference data
** Logging of the validation process
** Removal of the temporary mapset and of generated data in the test directory
* maybe much more ...
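
For the almost-equal key-value comparison mentioned above, one
possible realization could look like the following sketch (the
function names and the default tolerance are assumptions):

{{{
# Illustrative sketch of an almost-equal key-value comparison.
def read_key_value(path):
    """Parse "key=value" lines (e.g. output of g.region -g) into a dict."""
    data = {}
    for line in open(path):
        if "=" in line:
            key, value = line.split("=", 1)
            data[key.strip()] = value.strip()
    return data

def compare_key_value(result_file, ref_file, precision=1e-6):
    """Compare two key-value files; numbers are compared with a tolerance."""
    result, ref = read_key_value(result_file), read_key_value(ref_file)
    if set(result) != set(ref):
        return False                     # different keys -> test failed
    for key in ref:
        try:
            if abs(float(result[key]) - float(ref[key])) > precision:
                return False             # numeric values differ too much
        except ValueError:
            if result[key] != ref[key]:  # non-numeric values must match exactly
                return False
    return True
}}}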


Here are some test cases which must be covered:

A simple g.region test with validation. A region.ref text file is
present in the test directory. It is a file with key-value pairs used
to validate the output of g.region -g.

g.region_test.sh
{{{
# This is the introduction text for the g.region test

# this is the description of the first module test run
g.region -g > region
}}}

The framework will recognize the new text file "region" and the
reference file "region.ref" in key-value format in the test directory
and should use an almost-equal key-value test for validation. The same
approach should work for r.univar and similar modules with shell
output.
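
With a helper like the compare_key_value() sketch above, the
validation step of the framework would then boil down to something
like this (hypothetical call):

{{{
# hypothetical validation step performed by the framework
ok = compare_key_value("region", "region.ref", precision=1e-6)
}}}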

Now a simple v.random test.
Because the data is generated randomly, the coordinates cannot be
compared. We need to compare the meta information. A file named
result.ref in key-value format is present.

v.random_test.sh
{{{
# This is a simple test of v.random
# validation is based on meta information

v.random output=random_points n=100
v.info -t random_points > result
}}}

As with the g.region test, the framework should recognize the text
file and apply the key-value validation.

A simple v.buffer test. The vector point map "points" is located in
the PERMANENT mapset of the test location. A file named result.ref is
present in the test directory for validation. The file was generated
with v.out.ascii dp=3 format=standard.

v.buffer_test.sh
{{{
# Test the generation of a buffer around points

# Buffer of 10m radius
v.buffer input=points output=result distance=10
}}}

In this case the framework recognizes a new vector map and the
result.ref text file. It uses v.out.ascii with dp=3 to export the
result vector map in "standard" format and compares it with
result.ref. The "standard" format is the default method to compare
vector data and cannot be changed in the test case scripts.
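
That comparison step could be sketched as follows (again only an
illustration; the real framework would probably also have to skip the
header of the standard format, which contains metadata such as the
map name and date):

{{{
# Illustrative sketch of the vector comparison via v.out.ascii.
import subprocess

def validate_vector(map_name, ref_file, digits=3):
    """Export a vector map in standard format with dp=3 and compare it
    token-wise with the reference file."""
    out = subprocess.check_output(
        ["v.out.ascii", "input=%s" % map_name,
         "format=standard", "dp=%d" % digits])
    return out.decode().split() == open(ref_file).read().split()
}}}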

I think most of the test cases which we need can be covered with this
approach. But the test designer must know that the validation data
must be of a specific type and precision.

I hope I was not too redundant in my thoughts and explanations. :)

So what do you think, Glynn, Anne, Martin and all interested
developers? If this approach is ok, I will put it into the wiki.

Best regards
Soeren

>
> --
> Glynn Clements <glynn at gclements.plus.com>
>

