[mapserver-commits] r9079 - branches/branch-5-4/docs/development/rfc
svn at osgeo.org
Thu Jun 4 00:01:26 EDT 2009
Author: sdlime
Date: 2009-06-04 00:01:26 -0400 (Thu, 04 Jun 2009)
New Revision: 9079
Modified:
branches/branch-5-4/docs/development/rfc/ms-rfc-52.txt
Log:
Updated...
Modified: branches/branch-5-4/docs/development/rfc/ms-rfc-52.txt
===================================================================
--- branches/branch-5-4/docs/development/rfc/ms-rfc-52.txt 2009-06-04 04:00:23 UTC (rev 9078)
+++ branches/branch-5-4/docs/development/rfc/ms-rfc-52.txt 2009-06-04 04:01:26 UTC (rev 9079)
@@ -4,7 +4,7 @@
:Date: 2009/03/08
:Authors: Steve Lime
:Contact: sdlime at comcast.net
-:Last Edited: 2009/03/08
+:Last Edited: 2009/06/03
:Status: Draft
:Version: MapServer 6.0
:Id:
@@ -16,10 +16,10 @@
Presently MapServer supports a very flexible query mechanism that utilizes two
passes through the data. This works by caching a list of feature IDs (pass one)
-and then a second pass through the features for presentation (template, drawing,
-or retrieval via MapScript. The obvious problem is the performance hit incurred
-from the second pass. The real pain is that the msLayerGetShape() function, as
-implemented, provides random access to the data which can be very expensive for
+and then a second pass through the features for presentation: templated output,
+drawing, or retrieval via MapScript. The obvious problem is the performance hit
+incurred from the second pass. The real pain is that the msLayerGetShape() function,
+as implemented, provides random access to the data which can be very expensive for
certain drivers.
Technical Solution
@@ -30,7 +30,8 @@
a true single-pass, you wouldn't have to go back to the original driver twice.
However, it could lead to large memory consumption with even moderately-sized
datasets. Multiple clients accessing services at the same time would only
-compound the problems.
+compound the problems. Some testing has confirmed this method to be no faster
+and probably even a bit slower than option 3 below.
2. Another solution would be to fold much of the query pre- and post-processing
code into the msLayerWhichShapes() and msLayerNextShapes() functions so that
@@ -38,7 +39,7 @@
has led us to conclude that a true single pass is not possible in some cases.
For example, GML requires a result set envelope be written out before writing
individual features. There's no way to get that initial envelope without a
-pass through the features. It's simply not worth the investment in time...
+pass through the features.
3. A final solution would be to change how the msLayerGetShape() function
behaves. We propose changing the behavior of that function to provide random
@@ -49,42 +50,124 @@
Under this last solution data drivers would need to do two things:
- * update the population shapeObj index property (long int) with a value that
- will allow msLayerGetShape() to randomly access a result
+ * update the population of the shapeObj index property (long int) with a value
+ that will allow msLayerGetShape() to randomly access a result
* update the driver-specific version of msLayerGetShape() to retrieve a shape
- from the results created in msLayerWhichShapes()
+ from the results created in msLayerWhichShapes()
The query functions would need to:
* not close the layer when finished with a query (we assume that users will
- want to do something with the results)
+ want to do something with the results)
* allow msLayerWhichItems() to retrieve ALL items so that the retrieved
- shapes are presentation-ready (draw, template, or ...)
+ shapes are presentation-ready (draw, template, or ...)
+The presentation functions:
+
+ * refrain from calling msLayerOpen(), msLayerWhichItems(), msLayerWhichShapes()
+ since that has already been done in the query functions
+
+This solution has been piloted in the single-pass sandbox with very promising
+results. In some cases queries run orders of magnitude faster. One positive side
+effect is that primary keys need not be used to retrieve features from the result
+set. It is the driver's responsibility to provide data to uniquely identify a
+row in the result set.
+
Backwards Compatibility Issues
------------------------------------------------------------------------------
-This solution preserved 95% of the current functionality. Because we are proposing
+This solution preserves 95% of the current functionality. However, we are proposing
to change the behaviour of msLayerGetShape() to be specific to query processing.
While this has always been the intentional use, there's nothing to stop a user
-from using that method (in MapScript) in other ways.
+from using that method (in MapScript) in other ways. This would have to be well
+documented. That said, a typical MapScript query and process results operation
+would be unchanged.
-A typical MapScript query and process results operation would be unchanged.
-
One casualty would be the query save/read functions. Since the processing of
a set of results would be specific to a dataset result handle it won't be
-possible to get back to a result once a layer is ultimately closed.
+possible to get back to a result once a layer is ultimately closed. A proposed
+solution to this problem is presented next.
+Query File Support
+------------------------------------------------------------------------------
+Query files have provided a means of saving the results of a query operation for
+use in subsequent map production. The series of indexes gathered during a query
+are written to disk and read later to be used to access the data a feature at a
+time. With the proposed changes this simply won't work with RDBMS data sources.
+It becomes necessary to instead recreate the result set by re-executing the
+query. The problem is that there's no easy way to serialize query parameters.
+
+I propose creating a new object, a queryObj, to store the various parameters
+associated with MapServer queries. It might look something like:
+
+typedef struct {
+ int type; /* By rect, point, shape, attribute, etc... Types match the query functions. */
+ int qlayer; /* used by all functions */
+
+ rectObj *rect; /* used by msQueryByRect() */
+
+ char *qitem; /* used by msQueryByAttribute() */
+ char *qstring; /* used by msQueryByAttribute() */
+
+ ...and so on...
+} queryObj;
+
+A single queryObj would hang off a mapObj and the mapObj would be the sole parameter
+passed to the various query methods. MapServer C code, primarily the CGI and OGC
+interfaces, would simply populate the appropriate queryObj members and call the correct
+query function.
+
+MapScript would be unchanged. The wrappers for the various query functions need only
+use the user supplied parameters to populate the queryObj and then call the query
+function. The queryObj would be immutable.
+
+By storing all the information in a single store it should be easily serialized to
+disk. When read, the reconstituted queryObj could then be used to re-execute the
+appropriate query. The msSaveQuery() and msLoadQuery() function signatures would
+remain "as is" although the internals would change.
+
Files Impacted
------------------------------------------------------------------------------
-
* driver files: changes to shape fetching code
-* maptemplate.c: don't open/close a layer
+* maptemplate.c: don't open/close a layer when presenting results
-* mapgml.c: don't open/close a layer
+* mapgml.c: don't open/close a layer when presenting results
* mapdraw.c: don't open/close a layer if drawing a query map
-* maputil.c: refactor msLayerWhichItems()
+* maplayer.c: refactor msLayerWhichItems()
+
+* mapquery.c: re-work msSaveQuery() and msLoadQuery(), change query functions
+ to take a lone mapObj as input, add msInitQuery() and msFreeQuery() functions
+
+* mapserv.c: populate map->query before calling query functions
+
+* mapwxs.c (various): populate map->query before calling query functions
+
+* mapfile.c: leverage msInitQuery() and msFreeQuery() functions
+
+* mapserver.h: define queryObj, add to mapObj
+
+* mapscript (various): update map/layer query methods to populate a queryObj
+
+* others? (mapcopy.c for one)
+
+Although a number of files are impacted, the changes in general are relatively simple.
+
+Unknowns
+--------------------------------------------------------------------------------
+To date only shapefiles, PostGIS and Oracle drivers have been tested with this new
+scheme, all with positive results. Even shapefiles showed a performance improvement
+simply due to incurring the overhead of opening files just once. It's not clear how
+OGR, SDE and raster queries will be impacted. I hope the owners of those drivers can
+comment further.
+
+MapServer has supported a rather obscure query method called queryByIndex() as basically
+a wrapper to msLayerGetShape(). This change may render that method obsolete but more
+checking needs to be done.
+
+Voting History
+--------------------------------------------------------------------------------
+None
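
The "Query File Support" text added in this revision proposes serializing the query parameters and re-executing the query on load, but leaves the on-disk format open. The following is only a rough sketch of that round trip, not MapServer code: demo_query, demo_rect, demo_save_query() and demo_load_query() are hypothetical names, the struct holds just the members shown in the RFC excerpt (with the rect stored by value for simplicity), and the length-prefixed text format is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not MapServer's actual types. */
enum { DEMO_QUERY_BY_RECT = 1, DEMO_QUERY_BY_ATTRIBUTE = 2 };

typedef struct { double minx, miny, maxx, maxy; } demo_rect;

typedef struct {
  int type;        /* which query function to re-run */
  int qlayer;      /* layer index */
  demo_rect rect;  /* used by rect queries */
  char *qitem;     /* used by attribute queries, may be NULL */
  char *qstring;   /* used by attribute queries, may be NULL */
} demo_query;

/* Write a string as "<length>:<bytes>\n"; NULL is written as length -1. */
static int write_string(FILE *fp, const char *s) {
  if (!s) return fprintf(fp, "-1:\n") < 0 ? -1 : 0;
  return fprintf(fp, "%zu:%s\n", strlen(s), s) < 0 ? -1 : 0;
}

/* Read a string written by write_string(); the caller frees *out. */
static int read_string(FILE *fp, char **out) {
  long len;
  *out = NULL;
  if (fscanf(fp, "%ld", &len) != 1 || fgetc(fp) != ':') return -1;
  if (len >= 0) {
    *out = malloc((size_t)len + 1);
    if (!*out) return -1;
    if (fread(*out, 1, (size_t)len, fp) != (size_t)len) goto fail;
    (*out)[len] = '\0';
  }
  if (fgetc(fp) != '\n') goto fail;
  return 0;
fail:
  free(*out);
  *out = NULL;
  return -1;
}

/* Serialize every query parameter; returns 0 on success, -1 on error. */
int demo_save_query(FILE *fp, const demo_query *q) {
  assert(fp && q);
  if (fprintf(fp, "%d %d\n", q->type, q->qlayer) < 0) return -1;
  /* %.17g is enough digits to round-trip a double exactly. */
  if (fprintf(fp, "%.17g %.17g %.17g %.17g\n", q->rect.minx, q->rect.miny,
              q->rect.maxx, q->rect.maxy) < 0) return -1;
  if (write_string(fp, q->qitem) != 0) return -1;
  return write_string(fp, q->qstring);
}

/* Rebuild a query from disk; the caller frees qitem and qstring. */
int demo_load_query(FILE *fp, demo_query *q) {
  assert(fp && q);
  memset(q, 0, sizeof *q);
  if (fscanf(fp, "%d %d", &q->type, &q->qlayer) != 2) return -1;
  if (fscanf(fp, "%lf %lf %lf %lf", &q->rect.minx, &q->rect.miny,
             &q->rect.maxx, &q->rect.maxy) != 4) return -1;
  if (read_string(fp, &q->qitem) != 0) return -1;
  if (read_string(fp, &q->qstring) != 0) {
    free(q->qitem);
    q->qitem = NULL;
    return -1;
  }
  return 0;
}
```

After a successful load, a caller would presumably switch on the type member and re-run the matching query function against the reopened layer, which is what lets the saved file survive the layer being closed.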