[GRASS-user] Python scripts for the Pareto Boundary, tested using spearfish60 data [How to merge multiple python scripts in one?]

Νίκος Νίκος
Sat Nov 7 11:51:23 EST 2009


Dear grass-users and developers,

I was hesitant to post this to the list since I am a lousy scripter and
by no means a programmer. I first sent it off-list to some grass-devs;
I apologise to them for that.

I need some advice with respect to python scripting (within grass) and
if the code is working, naturally, I would like to share the script
(...maybe my first useful script that I can share) on the grass-wiki.

Ultimately, I wish to see in the future a (native if possible) grass
program (written by a real programmer) that will implement the "Pareto
Boundary", useful for the accuracy assessment of low resolution thematic
maps. I have filed a trac ticket: http://trac.osgeo.org/grass/ticket/804



About:

I wrote some code (for me it was/is a lot of code) to derive omission
and commission errors from a series of low resolution dichotomic maps
(classifications derived from MODIS imagery), based on a high(er)
resolution reference map (Landsat 7), and then plot (using R with a
custom function) the "famous" Pareto Boundary [1] to compare the maps
and judge which classification map is best (and, in turn, which of the
classification methods used performs better).



Question:

 What is the best way to merge the several scripts into one, or to call
them all from within a single script? What strategy works best? I am
really having trouble getting this done.



What is problematic:

  (a) Due to my ignorance and inexperience I have had a hard time
deciding what can be hardcoded and what should not be. I have several
hardcoded work-arounds because I don't know (yet) how to handle those
cases properly.

  (b) The part where vector points are counted takes TOO long. I
borrowed the idea described in the grass-wiki that uses v.what.vect
and, if I am not mistaken, the bottleneck is v.distance (as already
discussed on the list several times in the past).

  (c) I am sure there are more problems... but I can't see them :-)



Working code:

I have 8 Python scripts + 2 R scripts, which I have tested with small
samples from the spearfish60 dataset [*]. I have no idea how I should
proceed:

  - merge everything into one single file (a bit hard, since I am still
learning Python);

  - keep the scripts separate and join them as modules called from one
driver script;

  - source variables, (grass) options and flags, etc. from a
"pareto_options_and_flags.py" module in which the variables and the
rest can be set, to be imported later by the module that does the job.


Attachment in grass-trac [2] contains:

### I took quite some time to comment (almost) everything, so the code
is self-explanatory. Hopefully it really is; I am not sure, since I am
no programmer. ###

  - the 8 scripts [in 2] (step by step: extract the omission/commission
errors and export them as CSV);

  - R code to plot the boundary and the Oe,Ce of the classification(s)
[also in 2] (though I doubt my code is efficient/smart).



Dr. Nikos Koutsias helped me understand the Pareto Boundary. Whether I
really did or not is, of course, my responsibility but I wouldn't have
done anything without his help.

Best regards, Nikos



-------------------------------------------------------------
[*] Test using spearfish60 data:
-------------------------------------------------------------

# testing with spearfish60 data
grass64 /geo/grassdb/spearfish60/user1/

# set the region
g.region s=4925000 e=593500 n=4927000 w=590000 res=30 -p

# use sqlite db-backend
db.connect driver=sqlite database=/geo/grassdb/spearfish60/PERMANENT/sqlite.db
db.connect -p


### prepare input files

# high resolution reference raster map will be
g.copy rast=landcover.30m,landcover_ref

# high resolution reference Class of Interest (to be assessed) will be
r.mapcalc "landcover_refcoi = if(landcover.30m == 51 || landcover.30m == 71 || landcover.30m == 81 || landcover.30m == 92, 2, null())"

# low resolution classification map 1 (to be assessed for accuracy) will be
g.region res=100 -pa
r.mapcalc "pareto_classification_rangeland_1 = if(vegcover == 2, 2, null())"

# prepare a 2nd classification map
g.copy
rast=pareto_classification_rangeland_1,pareto_classification_rangeland_2

# edit/change the 2nd map
d.rast.edit pareto_classification_rangeland_2 out=pareto_classification_rangeland_2_edited
g.remove rast=pareto_classification_rangeland_2
g.rename rast=pareto_classification_rangeland_2_edited,pareto_classification_rangeland_2


### extract omission and commission errors for pareto-optimal maps and
classifications

#### --- optionally SKIP the steps and use the "ONE-LINER" below --- ####
#
# step 1
#python pareto_1_vectorise_rasters.py reference_raster=landcover_ref reference_coi_rasters=landcover_refcoi classification_rasters=pareto_classification_rangeland_1,pareto_classification_rangeland_2 --o
#
# step 2
#python pareto_2_create_lowres_vector_grid.py highres=30 lowres=100 --o
#
# step 3
#python pareto_3_count_pixels_within_gridcells.py --v
#
# step 4
#python pareto_4_calculate_coi_percentages.py --v
#
# step 5
#python pareto_5_populate_thresholds_and_classifications.py lowres=100 --v
#
# step 6
#python pareto_6_populate_pareto_errors.py --v
#
# step 7
#python pareto_7_populate_classification_errors.py --v
#
# step 8
#python pareto_8_export_csv___spearfish60.py --o
#
#### --- optionally SKIP the steps and use the "ONE-LINER" below --- ####


# ONE-LINER (copy-paste carefully)
python pareto_1_vectorise_rasters.py reference_raster=landcover_ref \
  reference_coi_rasters=landcover_refcoi \
  classification_rasters=pareto_classification_rangeland_1,pareto_classification_rangeland_2 --o --q && \
python pareto_2_create_lowres_vector_grid.py highres=30 lowres=100 --o --q && \
python pareto_3_count_pixels_within_gridcells.py --q && \
python pareto_4_calculate_coi_percentages.py --q && \
python pareto_5_populate_thresholds_and_classifications.py lowres=100 --q && \
python pareto_6_populate_pareto_errors.py --q && \
python pareto_7_populate_classification_errors.py --q && \
python pareto_8_export_csv___spearfish60.py --q
# to remove pareto related vector files later, execute "g.mremove vect=pareto* -f"
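For completeness, the shell chaining above could also live in Python
itself; a hypothetical sketch that mimics the shell's "&&" behaviour
(stop on the first failing step):

```python
import subprocess

def run_chain(commands):
    """Run each command in turn; abort and return False on the first
    non-zero exit status, like the shell's '&&' chaining."""
    for cmd in commands:
        if subprocess.call(cmd) != 0:
            return False
    return True

# The real chain would list the eight step scripts, e.g.
#   run_chain([["python", "pareto_1_vectorise_rasters.py", "--q"], ...])
```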


### use the exported csv files to plot the Pareto Boundary (using the R
scripts within R)
-------------------------------------------------------------



P.S.

In my case I used the scripts for burned area maps. But this "boundary"
could be useful for all kinds of dichotomic maps and, theoretically,
even for maps with multiple classes. There are probably better/smarter
ways to implement all this.

This was my first contact with Python :-). All of it relates directly
to my questions about Python scripting posted in grass-user last month.


---
[1] Boschetti, L., Flasse, S.P., Brivio, P.A.: Analysis of the conflict
between omission and commission in low spatial resolution dichotomic
thematic products: The Pareto Boundary. Remote Sensing of Environment
91 (2004) 280-292


My notes from the paper:

  • A set of Pareto-optimal classified images, with the properties
defined in the previous paragraph, could be obtained starting from the
high-resolution reference map (our ‘true’ data).

  • Each optimal classification will have a pair of error rates (Oe,Ce)
that uniquely identifies a point in the bi-dimensional
omission/commission space Oe/Ce = {(Oe,Ce) ∈ R × R | Oe ∈ [0,1],
Ce ∈ [0,1]} (henceforth noted Oe/Ce).

  • The line joining all these points is the Pareto Boundary related to
the specific high-resolution reference map and to a specific
low-resolution pixel size.

  • As all the points belonging to the Pareto Boundary represent the
error level of an efficient classification, by the definition of
efficiency given in the previous paragraph it follows that the Pareto
Boundary divides the Oe/Ce space into two regions.

  • Any possible low-resolution thematic product can be associated only
with points in the region above, or at most on, the Pareto Boundary.

  • The point (0,0), with neither omission nor commission error, will
unfortunately lie in the unreachable region!

  • The terms ‘efficient’ or ‘optimal’ will be used as synonyms and will
always mean efficient (optimal) under Pareto’s definition.



[2] Python scripts and R code:
http://trac.osgeo.org/grass/attachment/ticket/804/pareto_boundary_grass_python_R_scripts.tar.gz

Python scripts:
- pareto_1_vectorise_rasters.py
- pareto_2_create_lowres_vector_grid.py
- pareto_3_count_pixels_within_gridcells.py
- pareto_4_calculate_coi_percentages.py
- pareto_5_populate_thresholds_and_classifications.py
- pareto_6_populate_pareto_errors.py
- pareto_7_populate_classification_errors.py
- pareto_8_export_csv__spearfish60.py

The steps implemented in the scripts are (with respect to my burned
area maps):

  1 Prepare low resolution grids
  2 Count burned (or Class of Interest) pixels in each low resolution grid-cell
  3 Convert cells to points (grid-cell -> centroid -> point)
  4 Calculate the percentage of burned (or Class of Interest) pixels for each low resolution grid-cell
  5 Add and populate threshold columns (txx) based on the burned pixel percentage
  6 Add classification results (update from the classification maps)
  7 Add and update columns with Oe, Ce (omission error, commission error)
  8 Export the Oe,Ce pairs as CSV
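The final export step needs nothing GRASS-specific; a sketch of it with
Python's csv module (function name, file name and the sample pairs are
made up):

```python
import csv

def export_pairs(pairs, path):
    """Write (Oe, Ce) pairs to a CSV file with a header row."""
    # newline="" keeps the csv module from doubling line endings on Windows
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Oe", "Ce"])
        writer.writerows(pairs)

export_pairs([(0.10, 0.30), (0.25, 0.15)], "pareto_errors.csv")
```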

R code [the 1st script is meant to become a function eventually;
currently everything is used "manually"!]:
- pareto_boundary.R
- plot_pareto_boundary_plus_domination_regions.R


