[GRASS-SVN] r51974 - grass-addons/grass7/imagery/i.segment

Mon Jun 4 14:15:02 PDT 2012

Author: momsen
Date: 2012-06-04 14:15:02 -0700 (Mon, 04 Jun 2012)
New Revision: 51974

Modified:
   grass-addons/grass7/imagery/i.segment/outline
   grass-addons/grass7/imagery/i.segment/parse_args.c
Log:
Updated pseudocode/outline. Removed/summarized comments. Lines 120-240 are the most interesting: update of the segmentation algorithm steps.

Modified: grass-addons/grass7/imagery/i.segment/outline
===================================================================

--- grass-addons/grass7/imagery/i.segment/outline	2012-06-04 18:49:30 UTC (rev 51973)
+++ grass-addons/grass7/imagery/i.segment/outline	2012-06-04 21:15:02 UTC (rev 51974)
@@ -1,222 +1,128 @@
 This is the draft pseudocode for the region growing segmentation algorithm.  More information, references, requirements, etc are at the wiki:
 http://grass.osgeo.org/wiki/GRASS_GSoC_2012_Image_Segmentation
 
-TODO: Need to consider the size of image vs. size of memory.
-/* MM: There is no mechanism in GRASS to get the size of (free) system memory. Such a mechanism would need to be different for each supported platform. The current solution in GRASS is to either do row-by-row processing or let the user decide how much memory to use. For this module I would recommend the latter, let the user decide. */
+For anyone interested in background discussion, Rev 51907 includes original comments and discussion between EM, MM, and ML.  All comments were combined (and new threads started!) after that revision.
 
+TODO: Memory management: allow the user to input how much RAM can be used. Compare r.watershed and r.cost for the two current options; initial recommendations are to follow r.watershed.
+
 TODO: Are any parts here potentially useful as library functions?  Are any of these tasks already done by existing library functions?
 
+General notes:
+Avoid use of global variables (makes multithreading easier).
 
-I plan to keep many of these lines (or answers to the questions) as the comments in the code.
 
-Are there any preferences/styles for file layout?  Looking at the i.smap file structure, my first thought is to break each of the major sections into its own file.
-/* MM: Breaking down the code into fns and keeping these in separate files is a good idea and common for more complex modules. */
+files:
 
+iseg.h - declare structures and functions
 
-/****************************************************************************
- *
- * MODULE:       i.segment
- * AUTHOR(S):    Eric Momsen <eric.momsen at gmail com>
- * PURPOSE:      Provide short description of module here...
- * COPYRIGHT:    (C) 2012 by Eric Momsen, and the GRASS Development Team
- *
- *               This program is free software under the GNU General Public
- *               License (>=v2). Read the COPYING file that comes with GRASS
- *               for details.
- *
- *****************************************************************************/
+Structures:
 
-(for my reference, order for headers)
-1. Core system headers (stdio.h, ctype.h, ...)
-2. Headers for non-core system components (X11, libraries).
-3. Headers for core systems of the package being compiled (grass/gis.h, grass/glocale.h, ...)
-4. Headers for the specific library/program being compiled (geodesic.h, ...)
+files: input and output data (segmentation files and/or RAM storage)
 
-#include <grass/gis.h>
-#include <grass/imagery.h>
-#include <grass/glocale.h>
+functions: parameters and function pointers
 
-/* MM: grass/config.h is included by grass/gis.h */
 
-
-int main(int argc, char *argv[])
-{
-
-Declare Structures
-
-What GRASS global variables do I need (where to write temporary rasters, current region, others?)
-/* MM: Try to avoid global variables. Temporary files are usually created by using the file name returned by G_tempfile() which is an absolute path. See e.g. r.cost for an example of G_tempfile() and the segment library to temporarily store raster maps. The current region should not be changed (although some i.* modules do so). */
-
 /****************************************************/
 /******** Parse the input parameters ****************/
 /****************************************************/
 
-raster or image group or image subgroup --> map names (and number of maps?)
-/*ML: Here you just need name of group and subgroup, what goes into these is handled by i.group*/
+Using just an imagery group right now. Could add an option to allow a single raster as input (user friendly). Currently no plan for subgroups; they are not often used.
 
-seeds:  All pixels or Vector/points (optional, default = all pixels)
+seeds: All pixels or user-defined points (optional, default = all pixels). User-defined points will initially be given as a raster map; an option to allow a vector map could be added later (user friendly).
 
 constraint (vector linear/polygon) (optional)
 
 segmentation algorithm (only region growing accepted to start)
 
+pixel neighbors: for a particular pixel, use 4 neighbors or 8 neighbors? (flag; not set = 4 neighbors)
+
+Later: memory usage  (r.watershed uses -m flag + memory parameter)
+
+
+
 /* Algorithm parameters */
 
 similarity threshold
 
 how quickly the threshold should be reduced
 
-Minimum pixel size of the segments (optional, default 1)
-/* MM: Rather minimum number of cells per segment I guess. Pixel size refers to resolution. */
+Minimum number of pixels per segment (optional, default 1)
 
 Later:
-Weights for each band in the image group
 weights for shape and compactness
-Color space? (RGB, HSI, L*u*v*, L*a*b*)
-/*ML: Again, I don't think that you should think in terms of color space. As Ned mentioned feature space is probably the most appropriate and which variables define that space is up to the user by defining which maps go into the relevant group. IMHO it should also be up to the user to ensure that the different maps in the chosen group (i.e. the different variables defining the feature space) can be used together and are of the same order of magnitude*/
-/*EM: OK, I certainly agree that feature space is the most important concept.  But I think this could be considered for the last few weeks, If there are some common conversions that are used, it could help smooth the workflow.  (As with stats, maybe this is a seperate module that is called by setting a flag from this one... */
-/*ML: I'm not sure I really understand what this is about: distance in feature space will be an important element of a region-growing algorithm, so nothing that can be added later. Please elaborate on what you mean to do with color space, here ?*/
-/*EM: I'm basing these comments on "Color image segmentation: advances and prospects" by H.D. Cheng*, X.H. Jiang, Y. Sun, Jingli Wang.  They make some strong statements that using standard color spaces (which I think are similar to landsat bands) as unable to deal with shadows - the magnitude of the color values change with the intensity, so a shadow looks like a different color because the color is muted, not because it is a different color.  Apparently using the non-linear conversion (to get L*u*v*) breaks the correlation of each band to the intensity, and helps segmentation algorithms put together contiguous objects that are partially in shadow.  ! But then they go onto say that non-linear color spaces have their own problems.  So the part that would be added later is to do the preprocessing to convert the color space inside the module.  From your other comments - I expect this should be lower priority and/or included in a seperate module, since it is a different task.*/
-/* MM: If the input variables can be anything, e.g. NDVI, principal components, elevation, etc., then non-linear conversion does IMHO not make sense. Also, color space can be more than RGB (3 bands), e.g. Landsat bands 1, 2, 3, 4, 5, 7 (6 bands). I would regard non-linear conversion as optional pre-processing done in a different module. */
 
-/* output parameters */
-need name for new raster and vector map
-default to use: inputname_segmented and inputname_segment_stats  ??
-Allow user to input alternate names?
-/*ML: General practice in GRASS is not to suggest output names*/
+No plans for implementation:
+layer weights, color spaces, color transformations: these should be done by the user prior to running i.segment.
 
---overwrite -o (optional, to overwrite the named output maps)
-/*ML: no need for this, the parser includes this automagically*/
 
-Option to just validate input and exit with messages?
-/*ML: Again, this is not standard GRASS practice, so I wouldn't worry about it*/
-/* MM: Maybe for user-defined seeds: at least one seed point must be within the current region */
-/*ML: Just to be clear: I didn't mean that no input validation should happen (it definitely should), but it is not standard practice in GRASS to allow a dry test run of a module, just to see results of validation*/
+/* output parameters */
+need name for new raster map
 
--e  Maybe an option for basic vs. extended output statistics 
-/*ML: I think we need to discuss whether the calculation of statistics should be part of the actual segmentation module, or whether putting that task into a separate module might be a better idea, i.e. a module which would take as input a (vector?) map of the segments and then calculate statistics. This would also allow to use segments created elsewhere, but still access the statistics for further analysis. I'm currently more inclined towards the separate module option.*/
-/*EM: From a user perspective, my impression is that statistics will be wanted >50% of the time.  Is this correct?  If so, I would want the front end to allow calculating statistics without having to run another module.  If it is OK to have a flag in this module to call the statistics module from the front end, I don't have a preference yet for splitting statistics into a seperate module.  As for input to the statistics module - it seemed the output of the segmentation is a raster map, the input to statistics is this raster map, with the output from statistics is the vector map with attribute table.*/
-/*ML: IMHO at this stage, KISS should be your principle, i.e. write several modules that do one task and do it right. User community is quite diverse in GRASS and ranges from CLI-aficionados doing a lot of scripting to people looking for the mega-GUI that also make coffee. It is quite normal GRASS practice to break workflow into several separate modules. This has a lot of advantages (if a module bails out, you don't lose the whole workflow; you can launch several parts of the workflow in parallel; etc). Each module has its automatic GUI. If time allows at the end, you could look into a customized GUI that combines these modules. Note, however, that there now is a model builder in GRASS, that allows to combine several modules graphically.
-Concering input/output, I would once again simplify the modules: segmentation output = raster; statistics module input and output = vector. The user can always run r.to.vect between the two.
-As you will have gathered by now, my personal idea of good practice is to keep steps as simple as possible and use external glue to stick them together if necessary. In my experience this decreases maintenance work and increases flexibility. It does, I agree, lower user-friendliness when no one provides some pre-fabbed external glueing.*/
-/* MM: I fully agree with ML */
+Later:
+maybe add some glue to allow the user to automatically launch r.to.vect and then the stats module.
 
-Should user input allowed memory usage, or is there a way to find out what is reasonable?
-/*ML: Sounds like a reasonable idea. Look at r.watershed for inspiration (-m flag + memory parameter)*/
 
 /************************************/
 /******** Input Validation **********/
 /************************************/
 
-For each validation step: if fail, output warning and set fail flag.
-Would it be polite to output successful validation tests too?
-/*Again, not standard practice, so don't worry*/
+Exit at first failure:
 
-confirm can read input raster(s) (and vector(s))
-/* MM: this is automatically done by GRASS libraries */
-
-confirm selected algorithm is implemented
-/*ML: As long as you only list available algorithms as options for the algorithm selection parameter, this should be taken care of automagically by the parser*/
-
 confirm input algorithm parameters are in correct range
 
-If output maps exist
-	if -o
-		msg, existing maps will be overwriten
-	else
-		fail
-/*ML: Again, --o is handled by parser, nothing for you to worry about*/
+Todo: confirm that NULLs returned by Rast_get_row() are handled in a way the later processing steps can deal with (NULLs should be skipped).
 
+GRASS library functions already handle basic checks (overwrite, conversion from source data resolution to region resolution, etc.)
 
-Will run calculations at region settings (I assume this is the proper way to do things, though I have seen a few modules that don't.)  So need to confirm some things for the region:
-/*ML: Yes, you should definitely use region settings !*/
+End of validation checks
 
-Boundary:
 
-Do I need to be concerned that the region boundary could be in the middle of a pixel? Just include any pixels that are >50% in region lines?
-/*ML: No, either pixel size or region boundary are adapted to ensure that region extents are always a full multiple of pixel size.*/
-/* MM: This is all done automatically by the GRASS libs. The module uses whatever Rast_get_row() returns */
+/************************************/
+/******** Open maps **********/
+/************************************/
 
+Open the input and output maps.
 
-Resolution
-Again, this could be required as preprocessing and/or second priority.
-/*ML: If your speaking of pixel size, then this is part of the region settings. If you mean scale of segmentation, then this should be part of the algorithm parameters*/
-/*EM: This check was for pixel size.*/
+Todo: Currently using the segment library. Need to implement the -m option to allow storing the data in RAM.
 
-But if the resolution of the input raster and current region settings don't match - are there straightforward modules that could be applied?  If yes, take action and output a message of what change was made.
-/*ML: AFAIK, raster maps are automatically resampled to current region resolution. Again, this is standard GRASS procedure and probably does not have to be handled by your module(s)*/
-/* MM: I agree with ML. Standard practice is that the user decides on the region settings and modules adher to these */
+input map structure:
+ - each cell is an array, length to match number of maps in the input group
+ - TODO: how to store optional vector boundaries
+output map structure:
+ - also an array:
+	- segment assignment
+	- candidate flag: has it been checked during this iteration loop?
+	- assigned flag: has it been assigned to a segment (only if number of seeds < number of pixels)
+ 
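A minimal sketch of the per-cell status flags, assuming the one-byte bit-flag approach described for r.watershed in the earlier (now deleted) comment. The names `FLAG_CANDIDATE`, `FLAG_ASSIGNED`, `OutputCell` and the helper functions are hypothetical; in the module these cells would be stored via the segment library or in RAM.

```c
#include <assert.h>

/* Hypothetical status bits, one byte per cell. */
#define FLAG_CANDIDATE 0x01u /* not yet checked during the current iteration */
#define FLAG_ASSIGNED  0x02u /* already belongs to a segment (needed only with seeds) */

/* One cell of the output structure described above (hypothetical layout). */
typedef struct {
    int segment_id;      /* segment assignment */
    unsigned char flags; /* combination of FLAG_* bits */
} OutputCell;

static void flag_set(unsigned char *flags, unsigned int bit)
{
    *flags = (unsigned char)(*flags | bit);
}

static void flag_clear(unsigned char *flags, unsigned int bit)
{
    *flags = (unsigned char)(*flags & ~bit);
}

static int flag_get(unsigned char flags, unsigned int bit)
{
    return (flags & bit) != 0;
}

/* With seeds, a cell may only be added to a growing segment if it has not
 * been assigned yet; without seeds every cell starts as its own segment. */
static int can_be_grown_into(unsigned char flags, int have_seeds)
{
    return !have_seeds || !flag_get(flags, FLAG_ASSIGNED);
}
```

Packing both flags into one byte keeps the per-cell overhead small, which matters if the flags are stored alongside the segment assignment in a temporary file.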
 
-
-/* check input raster */
-Should the program recognize a mask, and only segment the unmasked area?  Will this cause a problem if the mask is "odd" shaped and/or disjoint?
-/*ML: Use of the mask (in GRASS a mask is defined by the existence of a raster map named MASK) would be very helpful, so yes, your module(s) should take an existing mask into account.*/
-/* MM: done automatically by Rast_get_row(). You will need to check for NULL values anyway and skip these cells. NULL values can have various reasons, e.g. a MASK or NULL in the original raster, see e.g. Landsat tiles */
-
-Does it make sense to require the area to be contiguous?
-/*ML: No, I don't think so. If you think about top-down multi-scalar segmentation, i.e. first coarse segmentation, then further segmentation of selected larger segments, non-contiguous areas make sens. Ex: Segment into coarse segments that allow to distinguish vegetation-covered areas from others, then segment these vegetation-covered areas further in order to distinguish types of vegetation.*/
-/*EM: This is different from my understanding, and sounds more like classification.  Wouldn't each further segmentation be handled as unique segements, and later grouped by the classification module into specific types of vegetation?  So if the area isn't contiguous, each island could be processed independently in an outer loop on the algorithm.*/
-/*ML: Exactly, this is why I said that IMHO areas should not be required to be contiguous. The classification argument was just an example to show why. ;-) */
-/*EM: OK! So a check for continueity could be used to lower memory requirements - if there are discontinuous areas, each can be processed sequentially. */
-/* MM: memory requirements are a separate issue and depend on the method you want to use to load raster maps and store temporary results */
-
-Check for null cells?  If found, what should be done?
-/* MM: skip these */
-
-
-End of validation checks
-if fail flag, exit with failure
-
 /*******************************************/
 /************* preprocessing ? *************/
 /*******************************************/
 
 Any preprocessing?
 
-If vector borders are provided, do we need to convert them to rasters?  Would lines and polygon's be treated the same?
+If complete polygons are given as boundaries:
+1. run segmentation for each polygon, mask rest of map
+2. run segmentation with polygons as mask (to segment anything not included in a polygon)
+
+I think this will apply to vector lines only; initially it was framed as being for either polygons or lines.
+
 /*ML: vector to raster conversion is probably necessary. Pixels crossed by a line (polygon boundary or not) have to become part of a segment boundary.*/
 /*EM: hmm, OK, something else for discussion: These pixels that are on a vector line, should they eventually be included in one of the adjacent segments?  Is "segment boundary" just the edge pixels of the segment, or are they not included in any segment?*/
 /*ML: Here is where a difference comes into play between lines that are boundaries of polygons and lines as linear features. In my eyes pixels that are on boundary lines of polygons should be part of the segments that are internal to that boundary. Linear features would have to be treated differently. During discussions with colleagues we did have some difficulties finding actual use cases for linear features. Maybe we can start with only polygon features and, if the use case of a linear feature comes up, try to integrate that then?*/
 /*EM: But for polygons covering the entire map, there is a segment on either side of the polygon line.  If the line crosses the pixel, what should be done... It looks like this will not be a problem for multi-scalar segmentation, the polygons generated in a high level segmentation will be exactly between pixels.  This will only be an issue for polygons generated elsewhere, smoothed, at different resolution, etc.*/
 /* MM: You can not know where the polygons are coming from, therefore you have to consider all cases or, better, come up with a general solution. You will need to clone (substantial) parts of the v.to.rast module if you want to rasterize polygons/boundaries. If you do not rasterize, you will need to check for a boundary/line whenever you evaluate a neighbor. This could be sped up a bit by selecting all boundaries/lines crossing the current 3x3 neighborhood. The spatial selection of vector features is fast, but doing that for every cell/3x3 neighborhood can substantially slow down the module. You will also need to check if the boundary is actually part of an area (not an invalid boundary). Then you will need to check if the focus cell is inside the area, if not, if the neighbor is inside the area. Even though some spatial information gets lost by rasterization, I tend to recommend rasterization. In any case, taking into account boundaries/lines can easily become the bulk of the code, the most complex part of the code, and the most time-consuming component of the module. */
+/* EM: Left the above discussion... unresolved. One thought: instead of storing the output as a raster, maybe it should first be converted to a map (graph) data structure, with edges representing the neighbor relationships. Once we have that map, we could use the vector map to delete edges crossing the borders. This is done once; afterwards we never calculate neighbors, only check for edges. This will be a very large memory structure to start with, but it will get smaller as the segmentation continues. */
 
-If polygons constraints, check if all pixels are inside of polygons?
-/*ML: What do you mean by "all pixels" ?*/
-/*EM: 100% of the pixels in the raster input map.  But I supposed there will be the boundary pixels.  But my question here is more about what should be done if the input polygon constraints only enclose a portion of the map.  Should it be considered as a mask, and all pixels outside of the polygons are excluded from calculations? */
-/*ML: In my eyes mask and input seed regions are different things. You might have some areas for which you already have data (i.e. seed polygons), but this does not mean that you do not want to segment other areas. So, my answer to your question would be no.*/
-/*EM: OK, that makese sense.  So first all pixels inside of polygons can be checked, and finally all remaining pixels are segmented.  This final step will mask what is in the polygons, so they can't be merged.*/
 
-later: if allow polygons as seeds, get the centroids to use as seeds.
-/*ML: Innocent question: can't you use all pixels in the polygon as seeds ?*/
-/*EM: Yes, if the polygons are input as strict borders, then all pixels in the polygon will be seeds.  but IF the polygon's centroids are only being used to define a sparse set of seeds:  The basic workflow is to require the user to do the preprocessing and give points/centroids vector map as input.  This preprocessing ("later" = if time permits) step would include the step of extracting centroids from an existing polygon vector map inside of the module instead of making the user create a new vector map.*/
-/*ML: I think you should offer the option of inputting a vector map as seed together with the option of which types of features from that input map should be used. No need to do any extraction pre-processing. (See the use of the G_OPT_V_TYPE standard option, and further treatment of the choices made by the user, for example in d.vect or v.distance).*/
-/*EM: sounds good*/
-/* MM: Maybe easier: only use raster maps as seeds, users would need to run r.to.vect first if seeds come from a vector map. */
-/*ML: sounds like a very good idea to me: seed input only in the form of raster*/
-
-
 /*******************************************/
 /************ Processing ********************/
 /*******************************************/
 
+notes:
+If seeds < number of pixels, candidate pixels must meet the additional requirement of not yet being assigned to a segment.
 
-How to deal with tiling areas that are larger then fit in memory?  I assume I don't want disk I/O for every single pixel neighbor check, but also the size of the map may be large.  Don't want edge affects, and need awareness of all neighbors.  Maybe I/O for checking borders isn't too costly, can process one tile at a time, with disk I/O for just the borders.  Do 1 time step at each tile?
-/*ML: as Markus M told you, the segment library is probably your best choice for this.*/
-/* MM: when you use the segment library, you do not have to worry about the segment library's tile borders. It works fine for modules using the segment library, e.g. r.cost, r.walk, r.watershed */
 
-Data structure for candidate segments and already checked segments?
-In java, I'd think of a linked list, as elements are moved from one to the other, the overall memory requirements would be fixed.  BUT we might not have the entire map in memory.  Should we have a raster map with 1 for candidates and 0 for those that have already been checked on this iteration?
-/* In r.watershed a flag is used to indicate the status of each cell. This flag is of size Byte, i.e. 8 bit and each bit can be set/unset to indicate a certain status. This flag could be kept in memory or to be really safe also be stored in a separate structure using the segment library, maybe also together with the temporary output raster */
-
-To consider later: if we have point seeds (not all pixels) we need to also have a 3rd data structure, pixels not yet assigned to a region.  Will this process be different enough to have a different loop...or just have two different neighborhood select functions?  In this version, can two regions merge with each other, or only with unassigned seeds?
-/*ML: Not sure I understand all the issues related to seeds (I think that fixed boundaries should probably be a first priority), but I would guess that if the user has defined these seed regions as being of separate type, then they should probably not be merged.*/
-/* MM: as mentioned above, you could use a flag to indicate the status of a cell. When you start with a set of seeds < number of cells, you will grow regions from these seeds. In this case there is only a limited number of cells to be processed at each step. These cells could be stored in a FIFO list. Alternatively, you could use a minimum heap to store these cells with minimum defined by similarity (D below) and pick the cell with the highest similarity as the next cell to process. Just some ideas, to be refined when the code evolves I guess. */
-
-How to find irregular neighbors for irregular shaped segments?  If we have line constraints, the neighbor selection should not cross the borders.
-/*ML: As you will be working in raster, neighbourhood can be defined at pixel⁻level and so there are no "irregular" neighbors. The question of diagonal neighborhood is obviously open. How does eCognition handle this ?*/
-/*EM: Looks like it is an option to be selected by the user.  According to one power point presentation (not from eCognition, but how to use eCognition software: Normally use 4 neightbor, go to 8 neighbor only if pixels size is similar to feature size. */
-/* MM: I recommend to make it an option and test results of 4 vs 8 neighbors */
-
 If we have polygon constraints, use an outer for loop to process the image one polygon at a time. (Need to check if all pixels are included in a polygon; otherwise process all those pixels last.)
 
 /*
@@ -227,30 +133,27 @@
 
 /* Similarity threshold T(t): as t increases, the similarity threshold is lowered. SPRING used T(t) = T(0) * alpha^t, where T(0) > 0, t = 0, 1, 2, ... and alpha < 1 */
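The schedule T(t) = T(0) * alpha^t can be sketched as follows. `threshold_at()` is a hypothetical helper, and the check `alpha > 0` is an added assumption not stated above; repeated multiplication is used instead of `pow()` so the sketch needs no math library.

```c
#include <assert.h>

/* T(t) = T(0) * alpha^t, with T(0) > 0 and (assumed) 0 < alpha < 1. */
static double threshold_at(double t0, double alpha, int t)
{
    double T = t0;

    assert(t0 > 0.0 && alpha > 0.0 && alpha < 1.0 && t >= 0);
    for (int i = 0; i < t; i++)
        T *= alpha; /* one reduction step per time step */
    return T;
}
```

With T(0) = 8 and alpha = 0.5 the thresholds for t = 0, 1, 2, 3 are 8, 4, 2, 1.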
 
-/* MM: the following pseudocode should be converted to C code as soon as possible, since this is the core segmentation algorithm */
-
 For t  (until no merges are made?)
 
-	initialize candidate regions data structure (each region will be checked once on each pass)
-	save mean value vector and neighboring regions (Not sure why this needs to be calculated/saved ahead of time ??  Maybe SPRING has created a map data structure of what regions are neighbors? )
-/*ML: if you want to compare are region with neighbors in your while loop, maybe having a predefined neighborhood matrix makes it more efficient, instead of identifying neighbors at each step. However, if neighborhood is a defined number of neighboring pixels, then I don't see it making a difference.*/
-
-	While candidate region set is not empty (first pass this equals the seeds):
-		Compare Ri with neighbors (Question: should neighbors include or exclude those regions that were already matched?  Seems eCognition excludes all regions that have already been checked once on the iteration.)
-		/* MM: idea from network graph theory: if a nighbor is already matched but the new D is smaller
-		 * than the old D, reassign the neighbor because the algorithm found a better match */
-		If it exists, Rk is best neighbor if smallest D of all neighbors and and D < T.
-		Check Rk's neighbors.
-		IF (Ri is Rk's best neighbor) (so the agree, both are best match for each other)
-			merge
-			update segment values (mean)
-			remove from candidate region set. (give all "small" regions a chance to merge with best neighbor before growing larger regions)
+	Set candidate flag to true/1 for all pixels
+	
+	For each pixel that candidate flag is true:
+		function: find segment neighbors (Now we have list of pixels in current focus segment (Ri) and a list of neighbors)
+		Calculate similarity between Ri and neighbors
+		If it exists, Rk is the neighbor with the smallest D of all neighbors, provided D < T.
+		function: find segment neighbors of Rk
+		Calculate similarity between Rk and its neighbors
+		IF (Ri is Rk's best neighbor) (so they agree, both are best match for each other)
+			merge Ri and Rk: (probably as function?)
+				update segment values for all pixels in Ri+Rk (mean)
+				set candidate flag to false for all pixels in Ri+Rk
 			select next Ri
 			/* MM: I guess it will be important how to select the next candidate
 			 * (see above, FIFO or some kind of sorting) */
+			/* EM: I don't think the order matters, since the algorithm only accepts mutually best neighbors. */
 		Else
-			remove Ri from candidate region
-			Use Rk as next Ri
+			set candidate flag to false for all pixels in Ri
+			Use Rk as next Ri  /* Note: this is the eCognition technique. It seems to be a bit faster, since the pixels belonging to Rk are already known. */
 	loop
 	
 	Were any merges made for this time step?
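A rough sketch of one pass of the mutually-best-neighbor loop. It assumes a hypothetical `RegionGraph` (adjacency matrix plus per-region mean and pixel count) in place of the raster/segment-library storage, a single band, and squared Euclidean distance compared against T^2. It is meant to illustrate the control flow (candidate flags, the Rk-as-next-Ri step, size-weighted mean on merge), not how i.segment stores its data.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REG 16 /* hypothetical fixed capacity for the sketch */
#define NBANDS 1

typedef struct {
    int n;                        /* number of regions */
    bool alive[MAX_REG];          /* false once merged into another region */
    bool adj[MAX_REG][MAX_REG];   /* adj[i][k]: regions i and k share a border */
    int size[MAX_REG];            /* pixels per region */
    double mean[MAX_REG][NBANDS]; /* mean value vector per region */
} RegionGraph;

/* Squared Euclidean distance between region means in feature space. */
static double dist_sq(const RegionGraph *g, int a, int b)
{
    double s = 0.0;
    for (int j = 0; j < NBANDS; j++) {
        double d = g->mean[a][j] - g->mean[b][j];
        s += d * d;
    }
    return s;
}

/* Most similar living neighbor of r with D < T, or -1 if there is none. */
static int best_neighbor(const RegionGraph *g, int r, double T)
{
    int best = -1;
    double best_d = T * T;
    for (int k = 0; k < g->n; k++) {
        if (k == r || !g->alive[k] || !g->adj[r][k])
            continue;
        double d = dist_sq(g, r, k);
        if (d < best_d) {
            best_d = d;
            best = k;
        }
    }
    return best;
}

/* Merge Rk into Ri: size-weighted mean, Ri inherits Rk's borders. */
static void merge(RegionGraph *g, int i, int k)
{
    int n = g->size[i] + g->size[k];
    for (int j = 0; j < NBANDS; j++)
        g->mean[i][j] = (g->mean[i][j] * g->size[i] + g->mean[k][j] * g->size[k]) / n;
    g->size[i] = n;
    for (int m = 0; m < g->n; m++) {
        if (g->adj[k][m] && m != i)
            g->adj[i][m] = g->adj[m][i] = true;
        g->adj[k][m] = g->adj[m][k] = false;
    }
    g->alive[k] = false;
}

/* One pass at threshold T; returns the number of merges made. */
static int merge_pass(RegionGraph *g, double T)
{
    bool candidate[MAX_REG];
    int merges = 0;

    for (int r = 0; r < g->n; r++)
        candidate[r] = g->alive[r];
    for (int r = 0; r < g->n; r++) {
        int ri = r;
        while (candidate[ri]) {
            int rk = best_neighbor(g, ri, T);
            if (rk < 0) {                       /* no neighbor with D < T */
                candidate[ri] = false;
                break;
            }
            if (best_neighbor(g, rk, T) == ri) { /* mutual best neighbors */
                merge(g, ri, rk);
                candidate[ri] = candidate[rk] = false;
                merges++;
                break;
            }
            candidate[ri] = false;
            ri = rk;                             /* continue from Rk */
        }
    }
    return merges;
}
```

For example, four regions in a row with means 10, 11, 20, 21 and T = 5 merge into two regions (means 10.5 and 20.5) in the first pass; a second pass with the same T makes no merges, which is the stopping condition asked about above.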
@@ -259,24 +162,136 @@
 
 Force a merge of regions that are below minimum size threshold (just merge with most similar neighbor, no comparison with similarity threshold)
 
+/*****************************************/
+/******Function: Find Segment Neighbors **/
+/*****************************************/
 
+1 1 2 3 4
+1 9 9 9 5
+9 9 9 6 5
+7 7 7 7 9
+
+If the current segment being checked is 9
+
+Desired results:
+If no seeds:  (can merge unassigned pixels and other segments)
+	For diagonals:
+		1, 2, 3, 4, 5, 6, 7
+	else
+		1, 2, 3, 5, 6, 7
+	
+else (starting from seeds, so only want single unassigned pixels as neighbors, no merges with other segments allowed)
+	for diagonals:
+		2, 3, 4, 6
+	else
+		2, 3, 6
+
+
+Method 1: (using "rasters")
+
+Input could be single pixel or list of pixels.
+
+Put input in "to be checked" stack
+Put input in "current segment" list
+put input in "don't check" list
+empty "neighbor" list
+While "to be checked" stack isn't empty:
+	pop
+	find pixel neighbors
+	with neighbors
+		if in "don't check" list
+			do nothing
+		else
+			put in "don't check" list
+			if segment ID = current segment ID
+				add to "current segment" list
+				add to "to be checked" stack
+			else
+				add to "neighbor" list
+	next neighbor
+loop...
+
+return: neighbor list, pixels in segment list
+
+neighbor list will be a listing of pixels that are neighbors?  Include segment numbers?  Only include unique segments?
+Maybe the most complete return would be a structure array, structure to include the segment ID and a list of points in it?  But the list of points would NOT be inclusive - just the points bordering the current segment...
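Method 1 can be sketched as a stack-based flood fill over the example grid above. This is an illustration only: the fixed-size arrays and the `find_segment_neighbors()` name are hypothetical, and only the first four offsets are used for 4-neighbor connectivity. Only pixels of the current segment are pushed on the stack, so the fill does not spread into neighboring segments.

```c
#include <assert.h>
#include <stdbool.h>

#define ROWS 4
#define COLS 5
#define MAX_ID 16

/* Segment IDs from the example above. */
static const int seg[ROWS][COLS] = {
    {1, 1, 2, 3, 4},
    {1, 9, 9, 9, 5},
    {9, 9, 9, 6, 5},
    {7, 7, 7, 7, 9},
};

/* First four offsets: N, S, W, E; last four: diagonals. */
static const int dr[8] = {-1, 1, 0, 0, -1, -1, 1, 1};
static const int dc[8] = {0, 0, -1, 1, -1, 1, -1, 1};

/* Flood fill from (r0, c0) over pixels with the same segment ID.
 * Sets is_neighbor[id] for every other ID touching the segment and
 * returns the number of pixels in the segment. conn must be 4 or 8. */
static int find_segment_neighbors(int r0, int c0, int conn, bool is_neighbor[MAX_ID])
{
    bool checked[ROWS][COLS] = {{false}}; /* the "don't check" list */
    int stack_r[ROWS * COLS], stack_c[ROWS * COLS], top = 0;
    int id = seg[r0][c0], count = 0;

    assert(conn == 4 || conn == 8);
    for (int i = 0; i < MAX_ID; i++)
        is_neighbor[i] = false;

    checked[r0][c0] = true;
    stack_r[top] = r0;
    stack_c[top] = c0;
    top++;
    while (top > 0) {
        top--;
        int r = stack_r[top], c = stack_c[top];
        count++; /* every popped pixel belongs to the current segment */
        for (int n = 0; n < conn; n++) {
            int nr = r + dr[n], nc = c + dc[n];
            if (nr < 0 || nr >= ROWS || nc < 0 || nc >= COLS || checked[nr][nc])
                continue;
            checked[nr][nc] = true;
            if (seg[nr][nc] == id) { /* same segment: keep growing */
                stack_r[top] = nr;
                stack_c[top] = nc;
                top++;
            } else {                 /* different segment: record neighbor */
                is_neighbor[seg[nr][nc]] = true;
            }
        }
    }
    return count;
}
```

Starting at row 1, column 1 (the 9 segment), this reports 1, 2, 3, 5, 6, 7 with 4 neighbors and 1 through 7 with 8 neighbors; in both cases the segment has 6 pixels, since the isolated 9 in the bottom-right corner is not connected to it.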
+
+
+Method 2: (if we build a map data structure at the start of the program)
+Using current pixel's segment ID's edges, return neighbors
+
+
+
+Existing GRASS functions don't seem to have this "neighbor of an area" concept.  Am I missing something, should something be adapted, should I design this as a library function, or just write it as a function for this program?
+
+r.neighbors:
+is looking at the neighborhood that is a specified distance from one pixel
+
+r.watershed
+looks at the 4 or 8 adjacent neighbors.  (can use this as basis for Find Pixel Neighbors function)
+
+r.buffer
+My current understanding: it needs to know the min/max row and column in which the feature is found; it then scans that entire bounding box, and for each pixel that is part of the feature, checks distances around it. Looking up the min/max for every segment number seems time consuming, and a data structure remembering which pixels are in each segment would be needed to use this approach.
+
+
+/*****************************************/
+/******Function: Find pixel Neighbors ****/
+/*****************************************/
+
+will use a function pointer, chosen from the input (the diagonal flag), to select 4 or 8 neighbors
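A minimal sketch of the function-pointer idea, assuming neighbors are returned as (row, col) pairs and that pixels outside the region are simply dropped. The typedef, `find_four_neighbors()`, `find_eight_neighbors()` and `select_neighbor_fn()` are hypothetical names; the `diagonal` argument stands in for the module's diagonal flag.

```c
#include <assert.h>

/* Hypothetical signature: write the in-bounds neighbors of (row, col) into
 * nbr and return how many were written. */
typedef int (*find_neighbors_fn)(int row, int col, int nrows, int ncols,
                                 int nbr[8][2]);

static int collect(int row, int col, int nrows, int ncols,
                   const int (*off)[2], int noff, int nbr[8][2])
{
    int n = 0;

    assert(nrows > 0 && ncols > 0);
    for (int k = 0; k < noff; k++) {
        int r = row + off[k][0], c = col + off[k][1];

        if (r >= 0 && r < nrows && c >= 0 && c < ncols) {
            nbr[n][0] = r;
            nbr[n][1] = c;
            n++;
        }
    }
    return n;
}

static int find_four_neighbors(int row, int col, int nrows, int ncols,
                               int nbr[8][2])
{
    static const int off[4][2] = { {-1, 0}, {1, 0}, {0, -1}, {0, 1} };
    return collect(row, col, nrows, ncols, off, 4, nbr);
}

static int find_eight_neighbors(int row, int col, int nrows, int ncols,
                                int nbr[8][2])
{
    static const int off[8][2] = { {-1, -1}, {-1, 0}, {-1, 1}, {0, -1},
                                   {0, 1}, {1, -1}, {1, 0}, {1, 1} };
    return collect(row, col, nrows, ncols, off, 8, nbr);
}

/* Chosen once after parsing, so the inner loops never test the flag. */
static find_neighbors_fn select_neighbor_fn(int diagonal)
{
    return diagonal ? find_eight_neighbors : find_four_neighbors;
}
```

Usage: `find_neighbors_fn find_pixel_neighbors = select_neighbor_fn(diagonal->answer);` then call `find_pixel_neighbors(row, col, nrows, ncols, nbr)` wherever neighbors are needed.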
+
+
+/*****************************************/
+/******Function: calculate similarity ****/
+/*****************************************/
+
+Initially only Euclidean Distance
+
+NOTE: this is not a distance function of the coordinates at all!
+
+sqrt of the sum, over all input raster maps, of the squared differences between the two pixels' values
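A minimal sketch of the spectral Euclidean distance, assuming each pixel's values across the input rasters are already in a `double` array. Names are hypothetical. Since the segmentation only compares the distance against a threshold, the sketch compares squared values, which gives the same answer for non-negative thresholds and avoids calling `sqrt()` for every candidate pair.

```c
#include <assert.h>

/* Squared spectral Euclidean distance between two pixels: a and b hold the
 * pixel values in each of the nbands input rasters (not coordinates). */
static double spectral_distance_sq(const double *a, const double *b, int nbands)
{
    double sum = 0.0;

    assert(nbands > 0);
    for (int i = 0; i < nbands; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;   /* sqrt(sum) is the Euclidean distance itself */
}

/* Hypothetical merge test: distance <= threshold, compared in squared form
 * (valid because both sides are non-negative). */
static int similar_enough(const double *a, const double *b, int nbands,
                          double threshold)
{
    assert(threshold >= 0.0);
    return spectral_distance_sq(a, b, nbands) <= threshold * threshold;
}
```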
+
+
 /****************************************/
 /************ Output ********************/
 /****************************************/
 
+renumber segments to have a sequential (1,2,3...) numbering?
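A sketch of the optional renumbering, assuming a row-major `int` array of segment IDs with a known maximum ID, and that IDs of 0 or below mean "no segment" (an assumption, not something the outline specifies). IDs are relabeled 1, 2, 3, ... in order of first appearance; the return value is the segment count that the final G_message could report. `renumber_segments()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Relabel segment IDs in place so they run 1, 2, 3, ... in row-major order
 * of first appearance. IDs <= 0 are left alone (assumed "no segment").
 * Returns the number of distinct segments. */
static int renumber_segments(int *seg, int npix, int max_id)
{
    int *map = calloc((size_t)max_id + 1, sizeof *map);  /* old ID -> new ID, 0 = unseen */
    int next = 0;

    assert(map != NULL);
    for (int i = 0; i < npix; i++) {
        int id = seg[i];

        if (id <= 0)
            continue;
        assert(id <= max_id);
        if (map[id] == 0)
            map[id] = ++next;
        seg[i] = map[id];
    }
    free(map);
    return next;
}
```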
 
-output raster (with segment ID as raster data) is written as we go?  Or maybe it would better to have in a temp map, and write a fresh one at the end (so segment ID numbers are continuous?)
-/* MM: you need to write the output raster at the end because Rast_put_row() expects rows to be written out in order: Rast_put_row() does not take row as argument */
+output raster: convert the segmentation file or RAM data structure to a GRASS raster using Rast_put_row()
 
+G_message: total number of segments made.
 
-output vector and generate statistics
-(existing GRASS module to create polygons for each segment from the raster map)
-/*ML: for this and the following - as already mentioned, maybe the generation of statistics should be done in a separate module*/
-/* MM: I agree */
+/*******************************/
+/********** tidy up ************/
+/*******************************/
 
+free memory, delete temp files
+
+
+exit - success!
+
+
+
+###############################
+###############################
+
+Statistics for the segmentation will be calculated in a separate module
+
+Segmentation output = raster
+statistics module input and output = vector. The user will have to run r.to.vect between the two.
+
+Name?
+i.seg.stats
+i.segment.stats
+r.stats.seg
+
+-e  Maybe an option for basic vs. extended output statistics 
+
+
 calculate statistics to be saved in data table for the vector map
 
-one vector map of segments per hierarchy level with a series of attributes (not all of these attributes should probably be calculated directly be the segmentation module)
+one vector map of segments per hierarchy level with a series of attributes (I think this request would be handled by running the module for each hierarchy level?  Or do we need to have an attribute for hierarchy level stored with a single vector map?)
 
 spectral attributes:
 per spectral band: mean, min, max, skewness
@@ -293,14 +308,3 @@
 depending on segmentation algorithm: raster map indicating for each pixel the probability of belonging to the segment it was put into, i.e. some measure of reliability of results  (For region growing - should this be the similarity measure when it was merged?  Or similarity measure of the pixel compared to the average?)
 /*ML: Not sure, but I would think that similarity between pixel and average of region it belongs to might be a good choice. Am not a specialist in statistics, but maybe it is possible to translate this into some form of probability of really "belonging" to that region (cf i.maxlik)*/
 /* MM: I guess here it is important to not confuse classification with segmentation */
-
-/*******************************/
-/********** tidy up ************/
-/*******************************/
-
-free memory, delete temp files
-
-output to screen (timing, messages - how much needs to be done in my program and how much is handled by GRASS infrastructure?)
-/* MM: In general, keep the amount of messages low and don't be too technical (see r.terraflow for a bad example). OTOH, liberal use of G_debug() is very welcome! At the end of the module, the number of segments created would be nice to have. Timing is done automatically by the GUI, otherwise it can be done on CLI with time i.segment ... */
-
-exit - success!

Modified: grass-addons/grass7/imagery/i.segment/parse_args.c
===================================================================
--- grass-addons/grass7/imagery/i.segment/parse_args.c	2012-06-04 18:49:30 UTC (rev 51973)
+++ grass-addons/grass7/imagery/i.segment/parse_args.c	2012-06-04 21:15:02 UTC (rev 51974)
@@ -15,12 +15,22 @@
 {
     /* reference: http://grass.osgeo.org/programming7/gislib.html#Command_Line_Parsing */
 
-    struct Option *group, *subgroup, *seeds, *output, *method, *threshold;	/* Establish an Option pointer for each option */
+    struct Option *group, *seeds, *output, *method, *threshold;	/* Establish an Option pointer for each option */
     struct Flag *diagonal;	/* Establish a Flag pointer for each option */
 
+	/* for the opening files portion */
+    struct Ref Ref;		/* group reference list */
+    int *in_fd, *seg_in_fd, seg_out_fd;
+    RASTER_MAP_TYPE data_type;
+    int n, row, nrows, ncols, srows, scols, seg_in_mem;
+    void *inbuf;
+    const char *in_file[10], *out_file;	/* max 10 rasters in imagery group, until we figure out how to allocate this dynamically */
+
+
+
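A sketch of replacing the fixed `in_file[10]` with an array sized from the number of maps in the group (`Ref.nfiles` in the code below). The `seg_band_%d` name pattern is a placeholder only; in GRASS module code a unique temporary name would normally come from `G_tempfile()`. `alloc_file_names()` and `free_file_names()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate one file name per input map instead of a fixed-size array.
 * The "seg_band_%d" pattern is a placeholder for a real temp-file name. */
static char **alloc_file_names(int nfiles)
{
    char **names = malloc((size_t)nfiles * sizeof *names);

    assert(names != NULL);
    for (int n = 0; n < nfiles; n++) {
        int len = snprintf(NULL, 0, "seg_band_%d", n);  /* length needed */

        names[n] = malloc((size_t)len + 1);
        assert(names[n] != NULL);
        snprintf(names[n], (size_t)len + 1, "seg_band_%d", n);
    }
    return names;
}

static void free_file_names(char **names, int nfiles)
{
    for (int n = 0; n < nfiles; n++)
        free(names[n]);
    free(names);
}
```

With this, a group of any size works, and the names are released in the "tidy up" step.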
     group = G_define_standard_option(G_OPT_I_GROUP);
 
-    subgroup = G_define_standard_option(G_OPT_I_SUBGROUP);
+	/* deleted the subgroup option, but it still appears in the input form */
 
     /* OK to require the user to create a group?  Otherwise later add an either/or option to give just a single raster map... */
 
@@ -38,11 +48,6 @@
     seeds->description = _("Optional raster map with starting seeds.");
 
     output = G_define_standard_option(G_OPT_R_OUTPUT);
-    //seems API handles this part ?
-    //~ output->key = "output";
-    //~ output->type = TYPE_STRING;
-    //~ output->required = YES;
-    //~ output->description = _("Name of output raster map.");
 
     //TODO: when put in a new raster map, error message:
     //~ Command 'd.rast map=testing at samples' failed
@@ -75,34 +80,26 @@
 	("Use 8 neighbors (3x3 neighborhood) instead of the default 4 neighbors for each pixel.");
 
 
-    //~ G_debug(1, "testing debug!");
-    //~ When put this in, get an error (only when DEBUG is set, if not set, it runs fine)
-    //~ 
-    //~ Error box:
-    //~ Unable to fetch interface description for command 'i.segment'.
-    //~ Details: D1/1: testing debug!
-
-    //~ G_debug(1, "For the option <%s> you chose: <%s>",
-    //~ input->description, input->answer);
-    //~ 
-    //~ G_debug(1, "For the option <%s> you chose: <%s>",
-    //~ seeds->description, seeds->answer);
-    //~ 
-    //~ G_debug(1, "For the option <%s> you chose: <%s>",
-    //~ output->description, output->answer);
-    //~ 
-    //~ G_debug(1, "For the option <%s> you chose: <%s>",
-    //~ method->description, method->answer);
-    //~ 
-    //~ G_debug(1, "For the option <%s> you chose: <%s>",
-    //~ threshold->description, threshold->answer);
-    //~ 
-    //~ G_debug(1, "The value of the diagonal flag is: %d", diagonal->answer);
-
-
     if (G_parser(argc, argv))
 	exit(EXIT_FAILURE);
 
+    G_debug(1, "For the option <%s> you chose: <%s>",
+	    group->description, group->answer);
+
+    G_debug(1, "For the option <%s> you chose: <%s>",
+	    seeds->description, seeds->answer ? seeds->answer : "(none)");
+
+    G_debug(1, "For the option <%s> you chose: <%s>",
+	    output->description, output->answer);
+
+    G_debug(1, "For the option <%s> you chose: <%s>",
+	    method->description, method->answer);
+
+    G_debug(1, "For the option <%s> you chose: <%s>",
+	    threshold->description, threshold->answer);
+
+    G_debug(1, "The value of the diagonal flag is: %d", diagonal->answer);
+
     /* Validation */
 
     /* use checker for any of the data validation steps!? */
@@ -127,17 +124,10 @@
 
 
     /* Open Files (file segmentation) */
-    G_verbose_message("Checking image (sub)group...");
+    G_verbose_message("Checking image group...");
     /* references: i.cost and http://grass.osgeo.org/programming7/segmentlib.html */
 
 
-    struct Ref Ref;		/* subgroup reference list */
-    int *in_fd, *seg_in_fd, seg_out_fd;
-    RASTER_MAP_TYPE data_type;
-    int n, row, nrows, ncols, srows, scols, seg_in_mem;
-    void *inbuf;
-    const char *in_file[10], *out_file;	/* max 10 rasters in imagery group, until figure out how to do this dynamically */
-
     /* int buf[NCOLS]; */
     /* c question... will using void data type be the right way, until I know what the data type is? */
     /* that was in the developer's manual... but would void *buf also work? */
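On the `void` question: a variable cannot have type `void`, but `void *buf` is the usual C way to hold a row whose element type is only known at run time, with the element size chosen from the map type. GRASS's raster library does this itself (`Rast_allocate_buf()` returns `void *`, and `Rast_cell_size()` gives the size of one cell of a given type). Note that `sizeof(data_type)` in the `segment_format()` call measures the `RASTER_MAP_TYPE` variable, not a cell of that type. The standalone sketch below mirrors the idea with a hypothetical enum standing in for CELL_TYPE/FCELL_TYPE/DCELL_TYPE (CELL is `int`, FCELL `float`, DCELL `double`).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CELL_TYPE, FCELL_TYPE and DCELL_TYPE. */
typedef enum { MY_CELL, MY_FCELL, MY_DCELL } cell_type;

/* Size of one cell of the given type (what Rast_cell_size() provides). */
static size_t cell_size(cell_type t)
{
    switch (t) {
    case MY_CELL:  return sizeof(int);
    case MY_FCELL: return sizeof(float);
    case MY_DCELL: return sizeof(double);
    }
    return 0;
}

/* A row buffer whose element type is only known at run time. */
static void *alloc_row(cell_type t, int ncols)
{
    void *buf = malloc((size_t)ncols * cell_size(t));

    assert(buf != NULL);
    return buf;
}

/* Read element col as a double, whatever type is stored. */
static double get_value(const void *buf, cell_type t, int col)
{
    switch (t) {
    case MY_CELL:  return ((const int *)buf)[col];
    case MY_FCELL: return ((const float *)buf)[col];
    case MY_DCELL: return ((const double *)buf)[col];
    }
    return 0.0;
}
```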
@@ -146,18 +136,17 @@
 
     /* ****** open the input rasters ******* */
 
-    /* i.smap/openfiles.c  lines 17-23 checked if subgroup had maps, does API handles the checks?
-       can no subgroup be entered, just a group? */
+    /* i.smap/openfiles.c  lines 17-23 checked if subgroup had maps, does the API handle the checks? */
 
-    if (!I_get_subgroup_ref(group->answer, subgroup->answer, &Ref))
+    if (!I_get_group_ref(group->answer, &Ref))
 	G_fatal_error(_
-		      ("Unable to read REF file for subgroup <%s> in group <%s>"),
-		      subgroup->answer, group->answer);
+		      ("Unable to read REF file for group <%s>"),
+		      group->answer);
 
     if (Ref.nfiles <= 0)
 	G_fatal_error(_
-		      ("Subgroup <%s> in group <%s> contains no raster maps"),
-		      subgroup->answer, group->answer);
+		      ("Group <%s> contains no raster maps"),
+		      group->answer);
 
     /* open input group maps for reading */
 
@@ -209,7 +198,7 @@
     seg_in_fd = G_malloc(Ref.nfiles * sizeof(seg_out_fd));	/* need sizeof( integer ) */
     G_verbose_message("Creating temporary data files...");
     for (n = 0; n < Ref.nfiles; n++) {
-	seg_in_fd[n] = creat(&in_file[n], 0666);
+	seg_in_fd[n] = creat(in_file[n], 0666);
 	if (segment_format(seg_in_fd[n], nrows, ncols, srows, scols, sizeof(data_type)) != 1)	/* TODO: this data_type should be from each map */
 	    G_fatal_error("can not create temporary file");
 	close(seg_in_fd[n]);	/* why close when we just reopen again?  Different access mode between creat and open ? */
@@ -224,7 +213,7 @@
     /* Open and initialize all segment files */
     G_debug(1, "Initializing temporary data files...");	/* program dies sometime after this point, and before line 234 */
     for (n = 0; n < Ref.nfiles; n++) {
-	seg_in_fd[n] = open(&in_file[n], 2);	/* TODO: second parameter here is different in many places... */
+	seg_in_fd[n] = open(in_file[n], 2);	/* TODO: second parameter here is different in many places... */
 	if (segment_init(&files->bands_seg[n], seg_in_fd[n], seg_in_mem) != 1)
 	    G_fatal_error("can not initialize temporary file");
     }
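On the "why close when we just reopen" question: POSIX defines `creat(path, mode)` as `open(path, O_WRONLY | O_CREAT | O_TRUNC, mode)`, so that descriptor is write-only. Presumably the file is reopened because `segment_init()` must read back the header written by `segment_format()`. The literal `2` passed to `open()` is `O_RDWR` on Linux and most Unix systems, but POSIX does not fix the numeric value, so the named constant is safer. A standalone sketch of the same creat/close/open sequence with plain POSIX calls (`create_then_reopen()` is a hypothetical helper):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create (or truncate) path, write a header, then reopen it read/write.
 * The creat() descriptor is write-only, so it could not be read from.
 * Returns an O_RDWR descriptor, or -1 on error. */
static int create_then_reopen(const char *path, const char *header)
{
    int fd;
    size_t len;

    assert(path != NULL && header != NULL);
    fd = creat(path, 0666);
    if (fd < 0)
        return -1;
    len = strlen(header);
    if (write(fd, header, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    close(fd);
    return open(path, O_RDWR);  /* instead of the bare literal 2 */
}
```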