MS RFC 22a: Feature cache for long running processes and query processing (Call for comments)
Tamas Szekeres
szekerest at GMAIL.COM
Tue Jun 12 12:58:17 EDT 2007
Steve,
Thanks for your interest. I have many directions and alternatives in
mind, and any feedback helps me choose among the possible variations.
It is also useful for deciding whether to tackle this problem this way
at all.
>
> With a nested layer, personally I would just re-use the LAYER keyword. That might open up deeper nesting (e.g. raw feature => simplify => buffer) and save us a keyword.
>
I agree absolutely. In addition, we should allow multiple nested
layers to be specified, because a provider might need to use several
layers as sources for some operations. These nested layers could be
represented as an array of layers.
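As a rough illustration of the array idea, the C sketch below gives each layer an array of nested source layers. That supports both deep chains (raw feature => simplify => buffer) and operations that need several sources. The type and function names (`nestedLayer`, `nestedLayerDepth`, `nestedLayerSourceCount`) are hypothetical and are not MapServer's `layerObj` members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layer: each layer may own an array of nested
   (source) layers, so both chains and fan-in are possible. */
typedef struct nestedLayer {
    const char *name;
    struct nestedLayer **children; /* array of nested source layers */
    int numchildren;
} nestedLayer;

/* Depth of the processing chain rooted at `layer` (a leaf counts as 1). */
int nestedLayerDepth(const nestedLayer *layer)
{
    int i, deepest = 0;
    assert(layer != NULL);
    for (i = 0; i < layer->numchildren; i++) {
        int d = nestedLayerDepth(layer->children[i]);
        if (d > deepest)
            deepest = d;
    }
    return deepest + 1;
}

/* Number of leaf layers, i.e. the original data sources feeding the chain. */
int nestedLayerSourceCount(const nestedLayer *layer)
{
    int i, count = 0;
    assert(layer != NULL);
    if (layer->numchildren == 0)
        return 1;
    for (i = 0; i < layer->numchildren; i++)
        count += nestedLayerSourceCount(layer->children[i]);
    return count;
}
```

A chain like buffer(simplify(raw)) has depth 3 and one source, while an operation over two layers has depth 2 and two sources.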
> I would support re-doing the current inline features using whatever structure is used for a feature cache. We only need one way of storing a collection of features.
>
Moreover, I don't see much difference between the existing inline
layer provider and my proposed provider. My provider would accept
features from nested layers in addition to allowing features to be
added to the cache externally. I would suggest extending the existing
implementation instead of duplicating it. I would like to see whether
the "features" and "currentfeature" members could be made internal
to the provider (by placing them into the layerinfo). In addition, a
new member (AddFeature) would in principle need to be added to the
layervtable for adding features to the provider cache of the inline
layer.
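A minimal sketch of what this could look like, assuming simplified stand-in types: the cached features and the cursor live in a provider-private struct hung off `layerinfo`, and the vtable gains an `AddFeature` entry. All names here (`inlineLayerInfo`, `LayerAddFeature`, `featureSketch`, `layerSketch`) are hypothetical; the real layer vtable has a different and much larger set of members.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_SUCCESS 0
#define SKETCH_DONE 1
#define SKETCH_FAILURE -1

/* Hypothetical stand-in for a cached feature (a real one would be a shapeObj). */
typedef struct { long id; double minx, miny, maxx, maxy; } featureSketch;

/* Provider-private state, kept in layerinfo instead of on the layer itself. */
typedef struct {
    featureSketch *features;  /* the cache */
    int numfeatures;
    int capacity;
    int currentfeature;       /* cursor advanced by NextShape */
} inlineLayerInfo;

typedef struct layerSketch layerSketch;

/* Only the two vtable members relevant here; AddFeature is the proposed one. */
typedef struct {
    int (*LayerNextShape)(layerSketch *layer, featureSketch *out);
    int (*LayerAddFeature)(layerSketch *layer, const featureSketch *feature);
} layerVTableSketch;

struct layerSketch {
    void *layerinfo;
    layerVTableSketch vtable;
};

static int inlineAddFeature(layerSketch *layer, const featureSketch *feature)
{
    inlineLayerInfo *info = layer->layerinfo;
    if (info->numfeatures == info->capacity) {
        /* grow the cache geometrically */
        int newcap = info->capacity ? info->capacity * 2 : 8;
        featureSketch *grown = realloc(info->features, newcap * sizeof *grown);
        if (grown == NULL)
            return SKETCH_FAILURE;
        info->features = grown;
        info->capacity = newcap;
    }
    info->features[info->numfeatures++] = *feature;
    return SKETCH_SUCCESS;
}

static int inlineNextShape(layerSketch *layer, featureSketch *out)
{
    inlineLayerInfo *info = layer->layerinfo;
    if (info->currentfeature >= info->numfeatures)
        return SKETCH_DONE;
    *out = info->features[info->currentfeature++];
    return SKETCH_SUCCESS;
}

/* Attach the private cache and install the vtable entries. */
int inlineLayerInit(layerSketch *layer)
{
    inlineLayerInfo *info;
    assert(layer != NULL);
    info = calloc(1, sizeof *info);
    if (info == NULL)
        return SKETCH_FAILURE;
    layer->layerinfo = info;
    layer->vtable.LayerNextShape = inlineNextShape;
    layer->vtable.LayerAddFeature = inlineAddFeature;
    return SKETCH_SUCCESS;
}

void inlineLayerFree(layerSketch *layer)
{
    inlineLayerInfo *info = layer->layerinfo;
    if (info != NULL) {
        free(info->features);
        free(info);
    }
    layer->layerinfo = NULL;
}
```

Because the cache is private to the provider, the same AddFeature entry could be used both by an outer layer pushing features in and by the provider filling itself from a nested layer.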
> Questions:
>
> I'd like some specifics on how you'd implement, for example, a buffer provider. I assume that would use the nested layer approach with the inner layer providing features to a buffer process.
> How do you see configuration of a capability like that? For example, a typical buffer operation would have a distance, perhaps units and maybe a corner tolerance.
Any provider-specific information would go into the METADATA section
of that layer. In this case, the definition could look like this:
LAYER
  CONNECTIONTYPE BUFFEROPERATION
  METADATA
    "distance" "23"
    "units" "meters"
    ....
  END
  LAYER
    CONNECTION "any other layer"
  END
END
I would also allow the operation parameters to be taken from feature
items, for example:
METADATA
  "distance" "[distanceitem]"
END
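One way the provider might resolve such a parameter, sketched in C under the assumption that a value wrapped in square brackets names a feature item and anything else is a literal number. `resolveNumericParam` and its argument layout are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Resolve a numeric operation parameter read from METADATA.
   "[name]" -> look up item `name` among the current feature's values;
   anything else -> parse as a literal number.
   Returns 0 on success, -1 if the item is unknown or the text is not a number. */
int resolveNumericParam(const char *value,
                        const char **itemnames, const char **itemvalues,
                        int numitems, double *result)
{
    const char *text = value;
    size_t len;
    char *end;
    int i;

    assert(value != NULL && result != NULL);
    len = strlen(value);
    if (len >= 2 && value[0] == '[' && value[len - 1] == ']') {
        text = NULL;
        for (i = 0; i < numitems; i++) {
            /* match the name between the brackets exactly */
            if (strlen(itemnames[i]) == len - 2 &&
                strncmp(itemnames[i], value + 1, len - 2) == 0) {
                text = itemvalues[i];
                break;
            }
        }
        if (text == NULL)
            return -1; /* no such item on this feature */
    }
    *result = strtod(text, &end);
    if (end == text || *end != '\0')
        return -1; /* not a (complete) number */
    return 0;
}
```

Since the bracketed form depends on the feature, it would have to be resolved per feature inside NextShape/GetShape rather than once when the layer is opened.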
The simplest implementation of this buffer provider would allow only
a nested layer specification and forward every vtable function to the
inner layer. The operation would then be applied to the features
retrieved by the NextShape and GetShape calls.
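To make the forwarding idea concrete, here is a hedged sketch in which the buffer layer's NextShape delegates to the inner layer and then applies the operation to the returned feature. The types are simplified stand-ins, and expanding the bounding box by the distance is only a placeholder for a real geometric buffer (which would more likely come from a library such as GEOS).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal types; a real provider would work on shapeObj. */
typedef struct { double minx, miny, maxx, maxy; } rectSketch;
typedef struct { long id; rectSketch bounds; } shapeSketch;

typedef struct layerSketch layerSketch;
typedef int (*nextShapeFn)(layerSketch *layer, shapeSketch *out);

struct layerSketch {
    nextShapeFn NextShape;
    layerSketch *inner;       /* nested source layer, NULL for a leaf */
    double distance;          /* from METADATA "distance" */
    const shapeSketch *shapes; /* leaf-only data for this sketch */
    int numshapes;
    int current;
};

/* Leaf provider: serve shapes from an in-memory array. */
int leafNextShape(layerSketch *layer, shapeSketch *out)
{
    if (layer->current >= layer->numshapes)
        return 1; /* done */
    *out = layer->shapes[layer->current++];
    return 0;
}

/* Buffer provider: forward the call to the inner layer, then apply the
   operation to the feature it returned. Bounding-box expansion is a
   placeholder for a real buffer. */
int bufferNextShape(layerSketch *layer, shapeSketch *out)
{
    int status;
    assert(layer->inner != NULL);
    status = layer->inner->NextShape(layer->inner, out);
    if (status != 0)
        return status; /* done or error: pass through unchanged */
    out->bounds.minx -= layer->distance;
    out->bounds.miny -= layer->distance;
    out->bounds.maxx += layer->distance;
    out->bounds.maxy += layer->distance;
    return 0;
}
```

GetShape would follow the same pattern: fetch from the inner layer by index, then apply the operation.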
>
> How would the core know when to use the cache and when to populate it? It wasn't immediately obvious to me although I probably just missed it. Presumably the cache (for long running processes) would be built up over time. How is that managed? I mean, different users are working off the same feature cache so I could imagine two requests close to one another with a gap in the middle. How do those missing features get into the cache efficiently?
>
For now I would treat this provider as an inline layer that can
preserve the features of a previous query by either:
1. retrieving the features from the given source using a specific
search option (adding them to the cache), or
2. serving the features from the cache (disconnected mode).
These options might be controlled by metadata values.
LayerWhichShapes is the function that would trigger a cache reload if
needed. Originally I proposed reloading whenever the search rectangle
changes.
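A small sketch of the reload decision, assuming the mode is read from a metadata value into a hypothetical `cacheMode` enum and that rectangles are compared exactly. `featureCache`, `reloadFromSource` and `cacheWhichShapes` are illustrative names; the reload itself is stubbed out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double minx, miny, maxx, maxy; } rectSketch;

/* The two modes described above (hypothetical names). */
typedef enum { CACHE_FETCH, CACHE_DISCONNECTED } cacheMode;

typedef struct {
    cacheMode mode;
    int loaded;           /* has the cache been filled at least once? */
    rectSketch lastrect;  /* search rect used for the last fill */
    int reloads;          /* number of fills, for illustration only */
} featureCache;

static int sameRect(const rectSketch *a, const rectSketch *b)
{
    return a->minx == b->minx && a->miny == b->miny &&
           a->maxx == b->maxx && a->maxy == b->maxy;
}

/* Stub standing in for querying the source layer and adding the results
   to the cache; here it only records that a reload happened. */
static void reloadFromSource(featureCache *cache, const rectSketch *rect)
{
    cache->lastrect = *rect;
    cache->loaded = 1;
    cache->reloads++;
}

/* WhichShapes decides whether the cache must be refilled: in fetch mode a
   reload happens when the search rect changes; in disconnected mode the
   existing cache is always served. */
int cacheWhichShapes(featureCache *cache, const rectSketch *rect)
{
    assert(cache != NULL && rect != NULL);
    if (cache->mode == CACHE_DISCONNECTED)
        return 0; /* serve whatever is cached */
    if (!cache->loaded || !sameRect(&cache->lastrect, rect))
        reloadFromSource(cache, rect);
    return 0;
}
```

Reloading on any rect change is the simplest rule; it does not yet address Steve's question about filling only the missing gap between two nearby requests.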
> Since one-pass query support is a desired outcome I think the RFC should address that specifically. That is, what modifications would need to be made to a query routine (e.g. msQueryUsingShape), the template output code and the query map code. We, at least I, need to know how this will manifest itself with regard to queries.
>
It seems I cannot avoid implementing the query capability in this
provider if the implementation is to stay straightforward, so I will
update this RFC accordingly. The feature selection could happen
before the features are added to the cache, so that only the relevant
features would be cached. I would not rely on the resultcache approach
in this case; in effect, my internal cache would be the resultcache.
If we wanted to retain more features in the cache (so that the user
could pick up the items one by one in later queries), we should nest
two inline layers. The inner layer would retain all of the features,
and the outer layer would retrieve them one by one later.
I would implement the following selection modes, which should cover
most of our use cases:
1. querybyattributes
2. querybylayer
3. querybypreviousquery (using the features in the cache as the
selection features)
All of these options might be specified simultaneously and would be
applied when populating the cache.
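A hedged sketch of selecting features while populating the cache, so that only relevant features are stored. It assumes that when several options are given a feature must pass all of them (the text does not say whether they combine with AND or OR), and it uses bounding-box overlap in place of a real geometry test. All names (`selectionSketch`, `selectFeatures`, the `owner` attribute) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { double minx, miny, maxx, maxy; } rectSketch;
typedef struct { long id; const char *owner; rectSketch bounds; } featSketch;

/* Bounding-box overlap stands in for a real geometry predicate. */
static int overlaps(const rectSketch *a, const rectSketch *b)
{
    return a->minx <= b->maxx && b->minx <= a->maxx &&
           a->miny <= b->maxy && b->miny <= a->maxy;
}

static int overlapsAny(const featSketch *f, const featSketch *set, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (overlaps(&f->bounds, &set[i].bounds))
            return 1;
    return 0;
}

/* Selection options; NULL / 0 means "not specified". */
typedef struct {
    const char *owner;        /* querybyattributes: owner == value */
    const featSketch *layer;  /* querybylayer: selection features */
    int nlayer;
    int previousquery;        /* querybypreviousquery: use current cache */
} selectionSketch;

/* Select from `source` the features passing every specified option and
   write them to `result`, which then becomes the new cache contents.
   `previous` is the current cache, used by querybypreviousquery.
   Returns the number of selected features. */
int selectFeatures(const featSketch *source, int nsource,
                   const selectionSketch *sel,
                   const featSketch *previous, int nprevious,
                   featSketch *result, int capacity)
{
    int i, n = 0;
    assert(sel != NULL);
    for (i = 0; i < nsource && n < capacity; i++) {
        const featSketch *f = &source[i];
        if (sel->owner != NULL && strcmp(f->owner, sel->owner) != 0)
            continue;
        if (sel->layer != NULL && !overlapsAny(f, sel->layer, sel->nlayer))
            continue;
        if (sel->previousquery && !overlapsAny(f, previous, nprevious))
            continue;
        result[n++] = *f; /* only relevant features reach the cache */
    }
    return n;
}
```

Writing into a separate `result` array keeps the previous cache intact while it is still being used as the selection set for querybypreviousquery.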
> Another query question. Right now the code requests two different versions of a feature in the two-stage system. The first contains just enough information to confirm membership in the result set (typically enough attributes to evaluate a class expression; rendering does this too). The second, via msLayerGetShape, retrieves all attributes for presentation. Presumably you wouldn't want to bother with that first step, which would eliminate the need for msLayerWhichItems as it sits, but that will require something else in its place. Might want to consider losing msLayerWhichItems altogether, that is, get all items all the time. That would require just-in-time item -> item index determination. Doable, but it impacts a lot of code. I'm curious what authors of various providers think. For shapefiles we grab 'em all anyway so no biggie. I suppose for PostGIS and Oracle Spatial users can explicitly choose what they want in sub-selects to avoid returning everything. Not sure about SDE.
>
Yes, that's one of the biggest questions I have. I would actually
retrieve all of the items; the WhichItems option seems a bit
problematic, since it is not obvious which layers should participate
in the WhichItems operation (presumably the layer involved in the
rendering).
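If all items are always retrieved, the just-in-time item -> item index determination Steve mentions could be done lazily and memoized by whoever references an item (for example a class expression). A minimal sketch, assuming a layer's item list stays fixed while the reference is in use; `itemRef`, `resolveItemIndex` and `itemValue` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A reference to an item by name; the index is resolved on first use.
   Assumes the layer's item list does not change afterwards. */
typedef struct {
    const char *name;
    int index; /* -1 until resolved */
} itemRef;

int resolveItemIndex(itemRef *ref, const char **items, int numitems)
{
    int i;
    assert(ref != NULL);
    if (ref->index >= 0)
        return ref->index; /* already resolved */
    for (i = 0; i < numitems; i++) {
        if (strcmp(items[i], ref->name) == 0) {
            ref->index = i;
            return i;
        }
    }
    return -1; /* unknown item */
}

/* Value of the referenced item for one feature, or NULL if unknown.
   `values` is aligned with `items` because every item is retrieved. */
const char *itemValue(itemRef *ref, const char **items, int numitems,
                      const char **values)
{
    int i = resolveItemIndex(ref, items, numitems);
    return i >= 0 ? values[i] : NULL;
}
```

The cost is one linear name search per reference instead of a WhichItems pass, at the price of always transferring every attribute.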
There are also further implementation issues with the various vtable
functions. For example, the GetAutoStyle method would set up the
styling of the inner layer, which would then have to be copied back to
the outer layer.
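A sketch of how an outer layer might forward GetAutoStyle down the chain and copy the result back, assuming simplified types and that the first nested layer with a non-NULL GetAutoStyle is the one to ask. These names and this signature are hypothetical and do not match MapServer's actual vtable.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int r, g, b; double size; } styleSketch;

typedef struct layerSketch layerSketch;
struct layerSketch {
    /* NULL if the provider cannot derive styles from its data */
    int (*GetAutoStyle)(layerSketch *layer, long record, styleSketch *style);
    layerSketch *inner; /* nested source layer, NULL for a leaf */
};

/* Forward GetAutoStyle to the first nested layer that implements it.
   The inner layer fills a temporary copy, which is copied back into the
   outer layer's style only on success. */
int chainGetAutoStyle(layerSketch *layer, long record, styleSketch *outer)
{
    styleSketch tmp;
    layerSketch *l;
    assert(layer != NULL && outer != NULL);
    for (l = layer->inner; l != NULL; l = l->inner) {
        if (l->GetAutoStyle != NULL) {
            tmp = *outer; /* start from the outer layer's defaults */
            if (l->GetAutoStyle(l, record, &tmp) != 0)
                return -1;
            *outer = tmp; /* copy the inner styling back */
            return 0;
        }
    }
    return -1; /* no nested provider supports auto styling */
}

/* Example leaf provider: colours features by record parity. */
int leafGetAutoStyle(layerSketch *layer, long record, styleSketch *style)
{
    (void)layer;
    style->r = (record % 2) ? 255 : 0;
    style->g = 0;
    style->b = (record % 2) ? 0 : 255;
    return 0;
}
```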
>
> I can't see requiring users to define a cache layer (unless you'd fall back on two-pass system?) explicitly. Seems like there should be some layer magic behind the scenes (as with embeddable scalebars).
>
Since I've proposed a new data provider, it should definitely belong
to a layer. Because I'm aiming for a more general approach to building
complex data processing chains, I'm not much interested in a
cache-only solution. At the very least, the user would have to
configure the operation somehow, which requires some additional
configuration anyway. We could probably hook into the vtable of the
original layer (as in my former RFC 22 solution), but that seemed a
bit complicated ;-).
> Do you see the cache being tunable? For example, trimming features if they have not been accessed in a while.
Not at this time.
>
> I'm curious about a particular use case that I know folks are looking for support with. User identifies a parcel (by attribute, parcel id or by clicking on it), the selected feature is buffered (after selection), and finally it is used to query the parcel data to select all intersecting parcels (to generate a mailing list). How might something like that benefit from this approach?
>
Here is my suggestion in this case:
LAYER
  NAME "mailing list"
  CONNECTION "parcels"
  METADATA
    "querylayerindex" "0" # points to the inner layer
    "geometryop" "overlap"
  END
  LAYER
    CONNECTIONTYPE BUFFEROPERATION
    METADATA
      "distance" "23"
      "units" "meters"
    END
    LAYER
      CONNECTION "parcels"
      METADATA
        "queryitem" "parcelid"
        "querystring" "23"
      END
    END
  END
END
LAYER
  NAME "parcels"
  CONNECTIONTYPE POSTGIS
  ...
END
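The data flow of this configuration, sketched in C with all three stages inlined into one hypothetical function (`mailingList`). It assumes the distance is expressed in the same units as the coordinates and again uses bounding-box expansion and overlap as stand-ins for real buffer and intersection operations; in the proposal each stage would be a separate nested layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SELECTED 16

typedef struct { double minx, miny, maxx, maxy; } rectSketch;
typedef struct { const char *parcelid; rectSketch bounds; } parcelSketch;

static int overlaps(const rectSketch *a, const rectSketch *b)
{
    return a->minx <= b->maxx && b->minx <= a->maxx &&
           a->miny <= b->maxy && b->miny <= a->maxy;
}

/* Writes the indexes of the parcels on the mailing list to `result`
   and returns how many there are. */
int mailingList(const parcelSketch *parcels, int nparcels,
                const char *parcelid, double distance,
                int *result, int capacity)
{
    rectSketch zones[MAX_SELECTED];
    int nzones = 0, nresult = 0, i, j;
    assert(parcels != NULL && parcelid != NULL);

    /* Innermost layer: "queryitem" / "querystring" attribute selection. */
    for (i = 0; i < nparcels && nzones < MAX_SELECTED; i++)
        if (strcmp(parcels[i].parcelid, parcelid) == 0)
            zones[nzones++] = parcels[i].bounds;

    /* BUFFEROPERATION layer: grow each selected shape by `distance`. */
    for (j = 0; j < nzones; j++) {
        zones[j].minx -= distance;
        zones[j].miny -= distance;
        zones[j].maxx += distance;
        zones[j].maxy += distance;
    }

    /* Outer "mailing list" layer: "geometryop" "overlap" against the
       buffered shapes of the nested layer ("querylayerindex" "0"). */
    for (i = 0; i < nparcels && nresult < capacity; i++)
        for (j = 0; j < nzones; j++)
            if (overlaps(&parcels[i].bounds, &zones[j])) {
                result[nresult++] = i;
                break;
            }
    return nresult;
}
```

With a parcel 23 spanning (0,0)-(10,10) and a distance of 23, a neighbour starting at x = 30 falls inside the buffered zone, while a parcel at (100,100) does not.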
If you wanted to query by points instead of attributes, you would
probably have to set up an additional inline layer and add the
selection shape to it:
LAYER
  NAME "mailing list"
  CONNECTION "parcels"
  METADATA
    "querylayerindex" "0" # points to the inner layer
    "geometryop" "overlap"
  END
  LAYER
    CONNECTIONTYPE BUFFEROPERATION
    METADATA
      "distance" "23"
      "units" "meters"
    END
    LAYER
      CONNECTION "parcels"
      METADATA
        "querylayerindex" "0"
        "geometryop" "overlap"
      END
      LAYER
        FEATURE
          .....
        END
      END
    END
  END
END
LAYER
  NAME "parcels"
  CONNECTIONTYPE POSTGIS
  ...
END
Since I'm thinking of using inline layers to implement the cache,
there's no need to add a CONNECTIONTYPE for these layers.
Best regards,
Tamas
More information about the mapserver-dev mailing list