MS RFC 22a: Feature cache for long running processes and query processing (C

Tamas Szekeres szekerest at GMAIL.COM
Wed Jun 13 19:45:49 EDT 2007


2007/6/13, Steve Lime <Steve.Lime at dnr.state.mn.us>:
> >
> > I would also allow to specify the operation parameters as feature items,
> > like
> > METADATA
> >     "distance"  "[distanceitem]"
> > END
>
> I don't think I like attribute binding here. For one thing you don't know what type
> distance is.
>

OK, I won't force that possibility.

> >
> > It seems I cannot avoid implementing the query capability in this
> > provider so as to keep the implementation obvious and update this RFC
> > accordingly. This feature selection could occur before adding the
> > features to the cache so only the relevant features would be
> > cached. I would not rely on the resultcache approach in this case,
> > actually my internal cache would be the resultcache.
>
> That's what I figured. So the resultcache and sub-structures would cease
> to exist correct? This will result in a fairly big change in the MapScript API
> wouldn't it?
>

This proposal is an addition, not a replacement. Each of the layers
(the inline layers as well) can be queried in the existing way and
the resultcache is filled accordingly. We could continue to use the
drawquery approach as it stands. However, the features referenced in
the resultcache might also be served from the cache of the inline
layer, which eliminates the need for the two-phase data source access.
In addition, with this new approach a brand new layer can also be
defined to display the relevant features, and therefore every kind of
symbology that a layer provides can be used to highlight those
features.
I don't think we should have to change the MapScript API too much.
Only the collection of the inner layers should be made available in
the API.
But indeed the way of thinking about solving problems might also
change so as to utilize the new capabilities. This approach is
declarative rather than procedural, and in this regard it can be a
more obvious way to describe a data processing sequence. Setting up
the right configuration should be enough, instead of writing a lot of
code. I guess it's quite beneficial for CGI based applications that
have to solve complex issues. The only issue is to find a convenient
way to modify that configuration using the URL parameters.

> > If we would want to retain more features in the cache (so that the
> > user would pick up the items one by one in some later queries) we
> > should nest 2 inline layers. The inner layer would retain all of the
> > features that the outer layer would get one by one later.
> > I would implement the following selection modes (that would cover most
> > of the use cases we have)
> > 1. querybyattributes
> > 2. querybylayer
> > 3. querybypreviousquery (using the features in the cache as the
> > selection features)
>
> Add a querybyshape and that about covers it.

I consider querybyshape a special case of querybylayer where the
query layer has one shape. That layer can be another inline layer with
that shape added externally.
However, I won't object to adding a querybyshape as well. We can
possibly set the WKT of that shape in the PROCESSING definition.
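
To illustrate why querybyshape can be seen as querybylayer with a
one-shape selection layer, here is a minimal sketch. It is not
MapServer code: the types and functions (sketch_shape, sketch_layer,
query_by_layer, query_by_shape) are hypothetical, and each shape is
reduced to its bounding box to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: a "shape" is reduced to its bounding box. */
typedef struct { double minx, miny, maxx, maxy; } sketch_shape;

/* Hypothetical stand-in for a layer: just an array of shapes. */
typedef struct {
    const sketch_shape *shapes;
    size_t count;
} sketch_layer;

static int shapes_overlap(const sketch_shape *a, const sketch_shape *b) {
    return a->minx <= b->maxx && b->minx <= a->maxx &&
           a->miny <= b->maxy && b->miny <= a->maxy;
}

/* Query by layer: flag every target shape that overlaps any shape of
 * the selection layer. hits must have room for target->count entries.
 * Returns the number of flagged shapes. */
static size_t query_by_layer(const sketch_layer *target,
                             const sketch_layer *selection, int *hits) {
    size_t i, j, n = 0;
    assert(target && selection && hits);
    for (i = 0; i < target->count; i++) {
        hits[i] = 0;
        for (j = 0; j < selection->count; j++) {
            if (shapes_overlap(&target->shapes[i], &selection->shapes[j])) {
                hits[i] = 1;
                n++;
                break;
            }
        }
    }
    return n;
}

/* Query by shape: wrap the single shape in a one-element selection
 * layer and reuse the query-by-layer code path. */
static size_t query_by_shape(const sketch_layer *target,
                             const sketch_shape *shape, int *hits) {
    sketch_layer single;
    single.shapes = shape;
    single.count = 1;
    return query_by_layer(target, &single, hits);
}
```

A separate querybyshape mode would then mostly be a convenience for
the user, since both calls end up in the same loop.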

> >
> > Yes, that's one of the biggest questions I have. Actually I would
> > retrieve all of the items, the WhichItems option seems to be a bit
> > problematic, since it's not too obvious which layers should
> > participate in the WhichItems operation. Presumably the layer which is
> > involved in the rendering.
>
> Really, msLayerWhichItems only exists because of limitations in ESRI C
> API at the time we generalized things. They didn't support (or I couldn't
> figure out) how to do a "select *". I would support having a feature always
> be the same regardless of query or drawing. In the long run that does
> simplify life. I'm not quite sure of the ramifications of this though. One thing
> for sure would be that in cases where the code needs to reference an item
> value (e.g. CLASSITEM) we normally do that by index (hence the classitemindex
> member). The code would have to be changed to populate those values
> as processing starts, e.g.:
>
> if(lp->classitem && lp->classitemindex == -1) lp->classitemindex = msGetItemIndex(lp->items, lp->classitem);
>
> With attribute binding that's easy to do. Logical expressions get a bit hairy but
> it is doable. Regardless, I think this RFC should cover these types of changes.
>

That's right, I'll update the RFC in a few days. One other
drawback of this solution is the additional memory requirement of the
items, but I don't think it's fundamental.
In the future we can possibly make some enhancements by adding the
option to retrieve only the relevant items of the outer layer (which
participates in the drawing).
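
If every feature always carries all of its items, the index of a
referenced item (such as CLASSITEM) has to be looked up by name as
processing starts. A minimal sketch of that lookup follows. The struct
and helpers (item_layer, find_item_index, resolve_classitem) are
hypothetical simplifications, not the real layerObj or MapServer
functions, and the name comparison is case sensitive only for brevity.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal layer: only the members this sketch needs. */
typedef struct {
    const char **items;     /* names of all items carried by a feature */
    int numitems;
    const char *classitem;  /* item named by CLASSITEM, or NULL */
    int classitemindex;     /* -1 until resolved */
} item_layer;

/* Return the position of name in items, or -1 if it is not present. */
static int find_item_index(const char **items, int numitems,
                           const char *name) {
    int i;
    for (i = 0; i < numitems; i++)
        if (strcmp(items[i], name) == 0) return i;
    return -1;
}

/* Resolve the index once, as processing starts.
 * Returns 0 on success, -1 if the named item does not exist. */
static int resolve_classitem(item_layer *lp) {
    assert(lp != NULL);
    if (lp->classitem && lp->classitemindex == -1) {
        lp->classitemindex =
            find_item_index(lp->items, lp->numitems, lp->classitem);
        if (lp->classitemindex == -1) return -1;
    }
    return 0;
}
```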

>
> >> Do you see the cache being tunable? For example, trimming features if they
> >> have not been accessed in a while.
> > Not at this time.
>
> Does this approach limit possibilities in the future?
>

No. But this one requires a different caching provider, which retains
the shapes of subsequent queries. Recall that this time I only set out
to deal with the 2 pass query problem by retaining the shapes of the
last query (or adding the last query to the existing items). In the
current approach the user would know exactly what shapes the cache
contains. This cache is considered a result cache.
In some later release we can implement that kind of caching
provider side by side. That provider should be placed close to the
original data provider.
I can imagine this provider being implemented in a tile based approach like:
1. The shapes within the tiles relevant to the query extent are
retrieved from the data source if those shapes aren't already in
the cache.
2. The tile extents are stored in a linked list; the hash keys of the
shapes referenced by the tiles are also stored in this list.
3. One shape might be involved in multiple tiles, so a refcount is
also stored in the hashtable along with the shapes.
4. The time of the last WhichShapes or GetShape is also stored along
with the shapes. The shapes of the tiles not retrieved for a long time
are cleared.
5. To prevent shape duplication during the NextShape operation, the
shapes with the same time information are skipped.
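
The steps above could be sketched roughly as follows. This is only an
illustration under several assumptions: all names (tile_cache,
cache_add_tile, cache_scan, cache_expire) are hypothetical, a small
fixed array stands in for the hashtable, a grid position stands in for
the tile extent, shapes are represented by their ids only, stamps are
non-negative, and the ids within one tile are unique.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SHAPES 64       /* fixed table standing in for a hashtable */
#define MAX_TILE_SHAPES 8

typedef struct {
    long id;                /* shape key */
    int refcount;           /* cached tiles referencing the shape (step 3) */
    long last_visit;        /* stamp of the last scan that returned it */
} cached_shape;

typedef struct cached_tile {
    int col, row;           /* grid position standing in for the extent */
    long ids[MAX_TILE_SHAPES];
    int nids;
    long last_access;       /* stamp of the last scan touching the tile */
    struct cached_tile *next;
} cached_tile;

typedef struct {
    cached_shape shapes[MAX_SHAPES];
    int nshapes;
    cached_tile *tiles;     /* linked list of cached tiles (step 2) */
} tile_cache;

static cached_shape *find_shape(tile_cache *c, long id) {
    int i;
    for (i = 0; i < c->nshapes; i++)
        if (c->shapes[i].id == id) return &c->shapes[i];
    return NULL;
}

/* Steps 1-3: register a tile whose shapes were just read from the
 * data source. Returns 0 on success, -1 if there is no room. */
static int cache_add_tile(tile_cache *c, int col, int row,
                          const long *ids, int n, long now) {
    cached_tile *t;
    int i, missing = 0;
    assert(c && ids && n >= 0 && n <= MAX_TILE_SHAPES);
    for (i = 0; i < n; i++)
        if (!find_shape(c, ids[i])) missing++;
    if (c->nshapes + missing > MAX_SHAPES) return -1;
    t = (cached_tile *)calloc(1, sizeof(*t));
    if (!t) return -1;
    t->col = col; t->row = row; t->nids = n; t->last_access = now;
    for (i = 0; i < n; i++) {
        cached_shape *s = find_shape(c, ids[i]);
        if (!s) {           /* first tile to reference this shape */
            s = &c->shapes[c->nshapes++];
            s->id = ids[i]; s->refcount = 0; s->last_visit = -1;
        }
        s->refcount++;
        t->ids[i] = ids[i];
    }
    t->next = c->tiles;
    c->tiles = t;
    return 0;
}

/* Steps 4-5: visit the tiles in a grid range, stamp them, and return
 * each shape id once; shapes already carrying this stamp are skipped. */
static int cache_scan(tile_cache *c, int col0, int row0, int col1, int row1,
                      long stamp, long *out, int max) {
    cached_tile *t;
    int i, n = 0;
    for (t = c->tiles; t; t = t->next) {
        if (t->col < col0 || t->col > col1 ||
            t->row < row0 || t->row > row1) continue;
        t->last_access = stamp;
        for (i = 0; i < t->nids; i++) {
            cached_shape *s = find_shape(c, t->ids[i]);
            if (s->last_visit == stamp) continue;
            if (n == max) return n;
            s->last_visit = stamp;
            out[n++] = s->id;
        }
    }
    return n;
}

/* Step 4: drop tiles not accessed since cutoff and release the shapes
 * whose refcount drops to zero. Returns the number of tiles removed. */
static int cache_expire(tile_cache *c, long cutoff) {
    cached_tile **pp = &c->tiles;
    int removed = 0, i;
    while (*pp) {
        cached_tile *t = *pp;
        if (t->last_access >= cutoff) { pp = &t->next; continue; }
        for (i = 0; i < t->nids; i++) {
            cached_shape *s = find_shape(c, t->ids[i]);
            if (--s->refcount == 0) *s = c->shapes[--c->nshapes];
        }
        *pp = t->next;
        free(t);
        removed++;
    }
    return removed;
}
```

A shape shared by two tiles stays in the table until both tiles have
expired, which is what the refcount is for.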

>
> I love the possibilities but I worry a bit about getting users to understand it. I definitely think
> we should separate parameters from metadata even if it's just another named hash block.
>
Good documentation with use cases would help with this. We should keep
the overall concept consistent, so use one approach to specify the
provider related parameters. I have no problem with the PROCESSING
approach, in which the provider would store the preprocessed values in
a layerinfo structure during the LayerOpen call.
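
As a rough sketch of that idea, the provider could parse "KEY=value"
style PROCESSING strings once, when the layer is opened, and keep the
results in its private structure. Everything named here is an
assumption made for illustration: the cache_layerinfo struct, the
helper functions, and the DISTANCE and QUERY_MODE keys are not part
of the RFC or of MapServer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical provider-private state, filled once at layer open. */
typedef struct {
    double distance;        /* from "DISTANCE=<number>" (assumed key) */
    char query_mode[32];    /* from "QUERY_MODE=<name>" (assumed key) */
} cache_layerinfo;

/* Return the value part of "KEY=value" if the directive starts with
 * "KEY=", otherwise NULL. */
static const char *directive_value(const char *directive, const char *key) {
    size_t n = strlen(key);
    if (strncmp(directive, key, n) == 0 && directive[n] == '=')
        return directive + n + 1;
    return NULL;
}

/* Preprocess the directives into info; unknown directives are ignored.
 * Returns 0 on success, -1 on a malformed or oversized value. */
static int layerinfo_from_processing(const char *const *processing,
                                     int count, cache_layerinfo *info) {
    int i;
    assert(processing && info);
    info->distance = 0.0;
    info->query_mode[0] = '\0';
    for (i = 0; i < count; i++) {
        const char *v;
        if ((v = directive_value(processing[i], "DISTANCE")) != NULL) {
            char *end;
            info->distance = strtod(v, &end);
            if (end == v || *end != '\0') return -1;   /* not a number */
        } else if ((v = directive_value(processing[i], "QUERY_MODE")) != NULL) {
            if (strlen(v) >= sizeof(info->query_mode)) return -1;
            strcpy(info->query_mode, v);
        }
    }
    return 0;
}
```

Parsing once at open time means the query and drawing code only reads
typed values and never has to touch the raw strings again.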


Best regards,

Tamas


