[mapserver-commits] r10989 - trunk/docs/en/development/rfc

Sat Feb 12 13:30:17 EST 2011

Author: tamas
Date: 2011-02-12 10:30:17 -0800 (Sat, 12 Feb 2011)
New Revision: 10989

Modified:
   trunk/docs/en/development/rfc/ms-rfc-68.txt
Log:
Rework in MS-RFC-68

Modified: trunk/docs/en/development/rfc/ms-rfc-68.txt
===================================================================

--- trunk/docs/en/development/rfc/ms-rfc-68.txt	2011-02-12 04:39:22 UTC (rev 10988)
+++ trunk/docs/en/development/rfc/ms-rfc-68.txt	2011-02-12 18:30:17 UTC (rev 10989)
@@ -13,205 +13,232 @@
 :Id: $Id$
 
 Description: This RFC proposes an implementation for creating a new data
-provider (CONNECTIONTYPE=CLUSTER) which provides an option to combine multiple features
-from multiple layers to single (aggregated) features.
+providers (CONNECTIONTYPE=COMBINE and CONNECTIONTYPE=CLUSTER) which provide an option 
+to combine features from multiple layers into a single layer and to cluster multiple 
+features from a layer to single (aggregated) features based on their relative positions.
 
-Background
-~~~~~~~~~~
+1. Overview
+-----------
 
 In order to make the maps perspicuous at a given view, we may require to limit
 the number of the features rendered at neighbouring locations which would normally
 overlap each other. Currently there's no such mechanism in MapServer which would
 prevent from the symbols to overlap based on their relative locations. In a feasible
 solution we should provide rendering the isolated symbols as is, but create new 
-(combined) features for those symbols that would overlap in a particular scale.
-The combined features can the be labeled based on their aggregate attributes.
+(clustered) features for those symbols that would overlap in a particular scale.
+We would like to support clustering features from multiple layers as well. For this
+reason we implement a separate layer data source (CONNECTIONTYPE=COMBINE) that represents
+features from multiple layers in a single layer. This single layer is then used as the
+data source of another layer (CONNECTIONTYPE=CLUSTER), which performs the clustering
+process and provides the clustered features as its output.
+The following example shows how the layers are chained together to provide the final result:
 
-General principles of the solution
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+::
 
-This addition will require a mechanism to compare the features from one or more
-layers (called as 'source layers') and form clusters based on their relative positions
-in the current scale. The combined features (clusters) are rendered at averaged
-positions calculated from the related features and contain aggregated attributes 
-(like the count of the features belonging to this cluster). This aggregation layer 
-will be implemented as a new layer type (with CONNECTIONTYPE=CLUSTER).
+  LAYER
+    CONNECTIONTYPE CLUSTER
+    NAME cluster
+    PROCESSING "SOURCELAYER=combine"
+    ...
+  END
+  LAYER
+    CONNECTIONTYPE COMBINE
+    NAME combine
+    PROCESSING "SOURCELAYER=layer1,layer2"
+    ...
+  END
+  LAYER
+    CONNECTIONTYPE OGR
+    NAME layer1
+    ...
+  END
+  LAYER
+    CONNECTIONTYPE SHAPE
+    NAME layer2
+    ...
+  END
+  
+The order of the layers doesn't affect the result of the data processing, but it is suggested to set the
+visibility of the source layers or the combine layer to false in order to avoid duplicated features on the map.
 
-Each source layer will be marked with the following processing option:
+2. Combining features from multiple layers
+------------------------------------------
 
-::
+This functionality will be implemented as a separate layer data source (CONNECTIONTYPE=COMBINE) which will
+operate in the following way:
 
-  PROCESSING "TARGETLAYER=[name of the aggregate layer]"
+1) In LayerOpen it will open all of the source layers specified in the SOURCELAYER processing option.
+2) The LayerWhichShapes call is simply delegated to the underlying layers.
+3) LayerNextShape will iterate through the source layers and call LayerNextShape on each to fetch its shapes.
+   The layer index is assigned to the tileindex of the returned shapes. When we finish retrieving the
+   shapes from one layer we start retrieving the features from the next source layer.
+4) LayerGetShape will select the source layer based on the tile index and call LayerGetShape of this layer.
 
+The combine layer and the source layers must have the same geometry type, otherwise an error is generated in
+the LayerOpen call.
 
-This will ensure that the source layers can be identified during the clustering process. The vtable
-of the source layers will also be modified by overriding the functions implemented by the
-cluster data provider. This will allow the cluster provider to have enough control which features
-should be rendered from the source layer and from the aggregate layer.
+2.1 Handling the layer attributes (items)
+-----------------------------------------
 
-The aggregate (cluster) layer will use futher processing options to control the clustering
-operation, like:
+In general the source layers must provide the attributes that are required when rendering the combine layer. However,
+the underlying data may contain further attributes, which are not used when fetching the data from the original source.
+When all attributes are requested (in query operations), the combine layer will provide only some aggregated
+attributes (like the layer name or the group name of the source layer the feature belongs to).
+The set of items can be overridden manually (and further attributes can be exposed) by using the following
+processing option:
 
 ::
 
-  PROCESSING "CLUSTERMAXDISTANCE=20"  # the maximum distance allowed between the features without forming a cluster
-  PROCESSING "CLUSTERMAXCOUNT=100"    # the maximum number of the features in a cluster 
+  PROCESSING "ITEMS=itemname1,itemname2,itemname3"
 
+At this stage of the development, the driver will expose the following additional attributes:
 
-The desired functionality will in fact require to split the drawing process into 2 phases.
+1) Combine:SourceLayerName - The name of the source layer the feature belongs to
+2) Combine:SourceLayerGroup - The group of the source layer the feature belongs to
 
-1) Data preprocessing phase
-2) Rendering phase
+2.2 Projections
+---------------
 
-Data preprocessing phase
-~~~~~~~~~~~~~~~~~~~~~~~~
+It is suggested to use the same projection for the combine layer and the source layers. The layer provider
+will nevertheless support transforming the feature positions between the source layers and the combine layer.
 
-The preprocessing phase would be triggered when MapServer starts accessing the cluster layer or any 
-of the source layers (either when drawing or querying) in a WhichItems call. In this phase the source layers 
-(marked with the 'TARGETLAYER' processing option) will be preprocessed. The features of the current extent
-are retrieved from the original data source and passed through a clustering process with the following steps: 
+2.3 Handling classes and styles
+-------------------------------
 
+We can define the symbology and labelling of the combine layers in the same way as any other layer by specifying 
+the classes and styles. In addition we will also support the STYLEITEM AUTO option for the combine layer, which
+is essential if we want to display the features in the same way as with the source layers. The source layers 
+may also use the STYLEITEM AUTO setting if the underlying data source provides that.
+
+2.4 Query processing
+--------------------
+
+Queries on the combine layer will behave the same as for other layers. All of the source
+layers are kept open as long as the combine layer is open. This allows single pass queries to
+work if the source layer supports them.
+
+3. Clustering the features
+--------------------------
+
+This functionality will be implemented as a new layer data source (CONNECTIONTYPE=CLUSTER) which will
+operate in the following way:
+
+In the LayerWhichShapes call we start a preprocessing phase. The features of the current extent
+are retrieved from the source layer and passed through a clustering process with the following steps: 
+
 1) For each feature we create a tentative cluster and create the aggregate attributes 
    (like the feature count and the average position)
 2) We will retrieve all the neighbouring shapes (that has already been retrieved earlier) by using a quadtree
    (implemented in maptree.c) and then update a feature counts and the average positions at each intersecting cluster.
 3) In a second turn we evaluate the tentative clusters based on their feature count and the offset of the 
    average position related to the initial position.
-4) From the best ranking clusters we create new features and add them to the 'features' collection of the aggregate layer
+4) From the best ranking clusters we create new features and add them to the 'features' collection of the cluster layer
    (as inline features)
-5) The features from remaining tentative clusters (containing individual features only) will be added to the 
-   features collection of their source layers.
+   
+The preprocessed features are served from the 'features' collection in the same way as for inline layers.
+   
+The cluster layer will use further processing options to control the clustering operation, such as:
 
-Rendering phase
-~~~~~~~~~~~~~~~
+::
 
-The rendering phase will be implemented in the same way as it stands for the inline layers now, assuming that the
-'features' collection has already been populated.
+  PROCESSING "CLUSTERMAXDISTANCE=20"  # the maximum distance allowed between the features without forming a cluster
+  PROCESSING "CLUSTERMAXCOUNT=100"    # the maximum number of the features in a cluster
 
-Projections
-~~~~~~~~~~~
+3.1 Handling the feature attributes (items)
+-------------------------------------------
 
-It is suggested to use the same projection of the aggregate layer and the source layers. The clustering process
-will anyway support transforming the feature positions between the layers. The clustering process itself is
-happening in the projection of the aggregate layer.
+The items handling approach of the cluster layer will be the same as described for the combine layer.
+At this stage of the development the driver will expose the following additional attributes:
 
-Handling the feature attributes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+1) Cluster:FeatureCount - The number of the features belonging to this cluster
 
-By default we will retrieve all items from the original data sources during the clustering process. msLayerWhichItems will
-also be modified to always get all items for the cluster layer and the source layers as well.
-We should however mention that the set of the items can manually be overridden by using a processing option as follows:
+When the source layer is a combine layer, the related additional attributes
+(Combine:SourceLayerName and Combine:SourceLayerGroup) will also be available.
 
-::
+3.2 Projections
+---------------
 
-  PROCESSING "ITEMS=itemname1,itemname2,itemname3"
+It is suggested to use the same projection for the cluster layer and the source layers. The cluster layer data provider
+will nevertheless support transforming the feature positions between the layers. The clustering process itself
+takes place in the projection of the cluster layer.
 
-In this case the user must manually select all of those items which would normally be required when rendering the layer.
+3.3 Handling classes and styles
+-------------------------------
 
-Query processing
-~~~~~~~~~~~~~~~~
+Since all of the features (including the clusters and the individual features) are served from the same layer,
+it is important to support the STYLEITEM AUTO option for the cluster layer as well. This will ensure that the
+individual features get the same look as they have in the source layer. The STYLEITEM AUTO
+implementation will also assign the clustered features to the local class definitions.
 
-According to the TEMPLATE parameter of the aggregate layer we will provide 2 query modes:
- 
-1) If the TEMPLATE parameter of the aggregate layer is not set, then the query operations will be redirected
-   to the data sources and the query would retrieve the features provided by the original sources. No features 
-   would be retrieved from the aggregate layer in this case.
-2) If the TEMPLATE parameter of the aggregate layer is set, then only the combined features and the individual features
-   would be retrieved (as shown on the map).
+3.4 Query processing
+--------------------
+
+The user may select whether the query is performed on the cluster layer or the underlying source layer (by
+setting the TEMPLATE parameter for either of these layers). This doesn't require any further implementation.
    
 When drawing the query map, the background is drawn by using the copy to the original layer which would in fact
 copy the inline features as well, so there's no need to restart the clustering process during the rendering.    
 
-Handling classes and styles
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+3.5 Preserving the features between rendering sessions
+------------------------------------------------------
 
-For each source feature msShapeGetClass will be called to identify and preserve their classindex throughout 
-the clustering process (features with classindex = -1 will be omitted).
-The aggregate layer can be classified based on their supported attributes. In addition to the feature count
-attribute we intend to support further attributes, like the classname, layername and groupname in case if the 
-features belong to the same class, layer or layergroup actually.
-We don't intend to support STYLEITEM AUTO which would normally require to store the classes along with 
-the features. However in most cases we can work around this limitation by retrieving the feature style 
-as an attribute and then use STYLEITEM "[style attribute]" when configuring the source layers (as per MS-RFC-61).
-
-Preserving the features between rendering sessions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
 We intend to preserve the generated features as long as possible. We consider to rebuild the features in the 
 WhichItems call only when the map extent is changing (the parameters will be stored in the DATA section 
-of the cluster layer)
+of the cluster layer). We may, however, prevent the inline features from being written in writeLayer.
 
-Using multiple aggregate layers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+4. Implementation Details
+-------------------------
 
-It is supported to use multiple aggregate layers in the same map, which would allow to create separate
-sets of the combined features for particular groups of layers. In this case the features would not be 
-negotiated between these partitions would be allowed to overlap. 
-However we won't support to set multiple TARGETLAYER for the same source layer which would require 
-merging the features coming from the different cluster sets.
-
-Implementation Details
-~~~~~~~~~~~~~~~~~~~~~~
-
 In order to implement this enhancement the following changes should be made in the MapServer codebase:
-
-1) Modify msInitializeVirtualTable (in maplayer.c) to call the current method 
-   (renamed as msInitializeVirtualTableOriginal), and then override the vtable based on the existence 
-   of the TARGETLAYER processing option. Having these 2 functions in place, the cluster layer provider 
-   will easily switch to the source vtable during the preprocessing.
    
-2) Modify msLayerWhichItems (in maplayer.c) to retrieve all items for these layers
-   
-3) Expose the functions treeNodeCreate and treeAddShapeId from maptree.c to allow using the quadtree
+1) Expose the functions treeNodeCreate and treeAddShapeId from maptree.c to allow using the quadtree
    implementation during the clustering process.
-   
-4) Modify the lexer to interpret this new connection type.
+2) Modify the lexer to interpret the connection types (COMBINE and CLUSTER).
+3) Implement mapcombine.c containing the code of the combine layer data source.
+4) Implement mapcluster.c containing the code of the cluster layer data source.
 
-5) Implement mapcluster.c containing the code of the cluster layer data source.
+4.1 Files affected
+------------------
 
-Files affected
-~~~~~~~~~~~~~~
-
 The following files will be modified/created by this RFC:
 
 ::
 
   maptree.c
   maptree.h
-  maplayer.c
   maplexer.l
   mapserver.h
   Makefile.vc
   Makefile.in
-  mapcluster.c  (new)
+  mapcombine.c (new)
+  mapcluster.c (new)
 
-MapScript Issues
-~~~~~~~~~~~~~~~~
+4.2 MapScript Issues
+--------------------
 
 There's no need to modify the MapScript interface within the scope of this RFC.
 
-Backwards Compatibilty Issues
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+4.3 Backwards Compatibility Issues
+----------------------------------
 
 This change provides a new functionality with no backwards compatibility issues being considered.
 
-Further Considerations
-~~~~~~~~~~~~~~~~~~~~~~
+4.4 Further Considerations
+--------------------------
 
-Should we provide a dedicated keyword for TARGETLAYER instead of a processing key?
 Should we prevent from writing the inline features in writeLayer?
 
-Bug ID
-~~~~~~
+5. Bug ID
+---------
 
-The ticket for RFC-68 can be found here.
+The ticket for RFC-68 (containing implementation code) can be found here.
 
-Bug XXXX_
+Bug 3674_
  
-.. _XXXX: http://trac.osgeo.org/mapserver/ticket/XXXX 
+.. _3674: http://trac.osgeo.org/mapserver/ticket/3674 
 
-Voting history
-~~~~~~~~~~~~~~
+6. Voting history
+-----------------
 
 None
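
Reading aid for section 2 of the revised RFC above (not part of the committed file): the combine layer's iteration can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the proposed C implementation. The names `Shape`, `ListLayer` and `CombineLayer` are hypothetical stand-ins for MapServer's shape and layer structures, and the methods mirror LayerWhichShapes, LayerNextShape and LayerGetShape only loosely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shape:
    index: int           # shape id within its own source layer
    tileindex: int = -1  # set by the combine layer to the source layer's position


class ListLayer:
    """Hypothetical in-memory source layer holding `count` shapes."""

    def __init__(self, name: str, count: int) -> None:
        self.name = name
        self._count = count
        self._cursor = 0

    def which_shapes(self) -> None:
        # Restart iteration over this layer's shapes.
        self._cursor = 0

    def next_shape(self) -> Optional[Shape]:
        if self._cursor >= self._count:
            return None  # this layer is exhausted
        shape = Shape(index=self._cursor)
        self._cursor += 1
        return shape

    def get_shape(self, index: int) -> Shape:
        if not 0 <= index < self._count:
            raise IndexError(index)
        return Shape(index=index)


class CombineLayer:
    """Sketch of the combine behaviour: chain the source layers in order."""

    def __init__(self, sources: List[ListLayer]) -> None:
        self._sources = sources
        self._current = 0

    def which_shapes(self) -> None:
        # Delegate to every underlying layer, then start at the first one.
        for source in self._sources:
            source.which_shapes()
        self._current = 0

    def next_shape(self) -> Optional[Shape]:
        # Drain the current source; when it is exhausted, move to the next.
        while self._current < len(self._sources):
            shape = self._sources[self._current].next_shape()
            if shape is not None:
                shape.tileindex = self._current  # remember the originating layer
                return shape
            self._current += 1
        return None

    def get_shape(self, tileindex: int, index: int) -> Shape:
        # The tile index selects the source layer, the shape index the feature.
        shape = self._sources[tileindex].get_shape(index)
        shape.tileindex = tileindex
        return shape
```

Because the tile index records the originating layer, a later random-access lookup can be routed to the right source without searching all of them.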