[Mapserver-dev] Query efficiency

Thu Mar 4 14:02:00 EST 2004

Hi Frank: I tend to favor this approach in the short term too. We could put
some limits in place to control the maximum cache size, I suppose. A maximum
number of features would be one way. How easy is it to get the cumulative size
(in KB) of a linked list of shapeObj's? The nice thing is that the code to
manage a feature list already exists.
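
To make the size question concrete, here is a rough sketch of how the
cumulative size of a feature list could be estimated. The struct layouts below
are simplified stand-ins that I am assuming for illustration (they are not
copied from the MapServer headers), and the names shape_size_bytes() and
feature_list_size_bytes() are hypothetical. It only counts the list node, the
parts, the vertices and the attribute strings, so treat the result as an
estimate rather than exact allocator usage.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the real types (assumed layout, for illustration). */
typedef struct { double x, y; } pointObj;
typedef struct { int numpoints; pointObj *point; } lineObj;
typedef struct {
    int numlines;      /* number of parts */
    lineObj *line;     /* array of parts */
    int numvalues;     /* number of attribute values */
    char **values;     /* attribute values as strings */
} shapeObj;
typedef struct featureNode {
    shapeObj shape;
    struct featureNode *next;
} featureNode;

/* Approximate heap footprint of one shape: parts, vertices, attribute strings. */
static size_t shape_size_bytes(const shapeObj *shape)
{
    size_t total = 0;
    int i;

    for (i = 0; i < shape->numlines; i++)
        total += sizeof(lineObj)
               + (size_t)shape->line[i].numpoints * sizeof(pointObj);

    for (i = 0; i < shape->numvalues; i++) {
        total += sizeof(char *);
        if (shape->values[i] != NULL)
            total += strlen(shape->values[i]) + 1; /* include the terminator */
    }
    return total;
}

/* Walk the list once, counting nodes and summing the per-shape estimate. */
static size_t feature_list_size_bytes(const featureNode *head, size_t *count)
{
    size_t total = 0, n = 0;

    for (; head != NULL; head = head->next) {
        total += sizeof(featureNode) + shape_size_bytes(&head->shape);
        n++;
    }
    if (count != NULL)
        *count = n;
    return total;
}
```

Dividing the returned byte count by 1024 gives the size in KB. The walk is
linear in the number of vertices, so a running total kept up to date as shapes
are added would probably be cheaper than recomputing it.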

I spoke to why queries can't be processed in one pass in some of the other
emails, so I won't go into details unless you really want them. The bottom line
is that a query result set can be used in lots of ways: through templates, in
MapScript, to make maps (QueryMap) and even to direct other queries
(mode=FeatureQuery).

Steve

Stephen Lime
Data & Applications Manager

Minnesota DNR
500 Lafayette Road
St. Paul, MN 55155
651-297-2937

>>> Frank Warmerdam <warmerdam at pobox.com> 3/3/2004 2:01:05 PM >>>
Steve Lime wrote:
> Sean: Queries work by first generating a candidate result set and then
> operating on that result set within MapServer (applying classes etc...).
> Queries cannot be completely executed in the underlying RDBMS (as the
> code sits). So there's this disjoint relationship between the result set
> and the database. The fix would be to enable all queries in a
> vendor-specific way and then maintain access to the result set using the
> msLayerNextShape() function.
>
> The current code gives us very consistent results between datasources
> because the same algorithms (good or bad) are used for everything.
> Unfortunately it doesn't let us tap into the power of the database
> except for attribute queries.

Steve,

My first pass opinion is that all the results of a query should be
held in memory as shapeObj's and that memory cache reused for subsequent
parts of the query operation.  This would ensure consistent behaviour for all
drivers, but eliminate the extra pass queries that occur now and that can have
pretty awful performance characteristics in some cases.

The obvious downside to this approach is that the memory cache of
shapeObj's could potentially be large.  Even large enough to potentially
bring the system to its knees in a worst case.  However, rather than moving
a lot more logic down into the datasources, or trying to cache to disk, I think
it would be better to just provide better tools for the query, and to provide
some sort of "maxresults" option to control the number of shapeObj's that will
be collected as a query result.
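
As a rough illustration of the "maxresults" idea, the sketch below caps the
number of shapes an in-memory result cache will accept.  Everything in it is
hypothetical: the one-field shapeObj is only a stand-in for the real
structure, and resultCache, cache_init(), cache_add() and cache_free() are
names made up for this example, not existing MapServer functions.

```c
#include <stdlib.h>

/* Stand-in for the real shape structure (assumption for illustration). */
typedef struct { long index; } shapeObj;

typedef struct cacheNode {
    shapeObj shape;
    struct cacheNode *next;
} cacheNode;

typedef struct {
    cacheNode *head, *tail;
    int numresults;   /* shapes currently held */
    int maxresults;   /* upper bound; <= 0 means no limit */
} resultCache;

static void cache_init(resultCache *cache, int maxresults)
{
    cache->head = cache->tail = NULL;
    cache->numresults = 0;
    cache->maxresults = maxresults;
}

/* Returns 1 if the shape was stored, 0 if the cache is full,
   -1 if memory could not be allocated. */
static int cache_add(resultCache *cache, const shapeObj *shape)
{
    cacheNode *node;

    if (cache->maxresults > 0 && cache->numresults >= cache->maxresults)
        return 0;

    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;

    node->shape = *shape;   /* plain copy; a real shape would need a deep copy */
    node->next = NULL;
    if (cache->tail != NULL)
        cache->tail->next = node;
    else
        cache->head = node;
    cache->tail = node;
    cache->numresults++;
    return 1;
}

/* Release every node and reset the cache to empty. */
static void cache_free(resultCache *cache)
{
    cacheNode *node = cache->head;

    while (node != NULL) {
        cacheNode *next = node->next;
        free(node);
        node = next;
    }
    cache->head = cache->tail = NULL;
    cache->numresults = 0;
}
```

A caller reading shapes from a datasource could stop fetching as soon as
cache_add() returns 0, so the limit would bound both memory use and the time
spent reading shapes.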

This sort of change would be quite simple, and very fast for cases where the
result set isn't gargantuan.

However, there are still aspects of the query architecture as it stands
that I don't really understand.  I'm not sure why the shapes from a result
set can't be fully processed on the first pass.

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent