MS RFC 22a: Feature cache for long running processes and query processing (C

Wed Jun 13 10:41:13 EDT 2007

More comments inline...

>>> On 6/12/2007 at 11:58 AM, in message
<f3b73b7d0706120958m7ddbab7ah97054f1ea8d8f718 at mail.gmail.com>, "Tamas Szekeres"
<szekerest at gmail.com> wrote:
> Steve,
> 
> Thanks for the interest. There are a lot of directions and
> alternatives in my mind, and any feedback helps me choose among the
> possible variations. It is also useful for deciding whether to tackle
> this problem this way at all.
> 
>>
>> With a nested layer, personally I would just re-use the LAYER keyword. That 
>> might open up deeper nesting (e.g. raw feature => simplify => buffer) and save 
>> us a keyword.
>>
> 
> I agree, absolutely. In addition, we should allow multiple nested
> layers to be specified, because a provider might have to use several
> layers as sources in some operations. These nested layers might be
> represented as an array of layers.
> 
>> I would support re-doing the current inline features using whatever structure 
>> is used for a feature cache. We only need one way of storing a collection of 
>> features.
>>
> Moreover, I don't see much difference between the existing inline
> layer provider and my proposed provider. My provider would accept
> features from other layers in addition to allowing features to be
> added to the cache externally. I would suggest extending the
> existing implementation instead of duplicating it. I would
> like to see whether "features" and "currentfeature" could be made internal
> to the provider (by placing them into the layerinfo). In addition,
> a new member (AddFeature) would in principle have to be added to the
> layervtable for adding features to the provider cache of the
> inline layer.
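To make the layerinfo idea above concrete, here is a minimal C sketch, not MapServer code. The type names (sketchShape, inlineLayerInfo) and the functions inlineLayerAddFeature and inlineLayerNextShape are hypothetical stand-ins for the proposed AddFeature vtable member and the existing NextShape entry point; the sketch only shows the feature array and the currentfeature cursor living in provider-private state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a shape; a real shapeObj
   carries full geometry and attribute values. */
typedef struct {
  long id;      /* feature identifier */
  double x, y;  /* a single point stands in for the geometry */
} sketchShape;

/* Provider-private state kept in the layer's layerinfo, so the feature
   list and the read cursor no longer live on the layer itself. */
typedef struct {
  sketchShape *features; /* growable array of cached features */
  int numfeatures;
  int capacity;
  int currentfeature;    /* cursor used by NextShape */
} inlineLayerInfo;

/* Hypothetical AddFeature vtable entry: appends a copy of the shape to
   the provider cache. Returns 0 on success, -1 on allocation failure. */
static int inlineLayerAddFeature(inlineLayerInfo *info, const sketchShape *shape) {
  assert(info != NULL && shape != NULL);
  if (info->numfeatures == info->capacity) {
    int newcap = info->capacity ? info->capacity * 2 : 8;
    sketchShape *grown = realloc(info->features, newcap * sizeof(sketchShape));
    if (grown == NULL) return -1;
    info->features = grown;
    info->capacity = newcap;
  }
  info->features[info->numfeatures++] = *shape;
  return 0;
}

/* Hypothetical NextShape: walks the cache using the private cursor.
   Returns 0 when a shape was produced, 1 when the cache is exhausted. */
static int inlineLayerNextShape(inlineLayerInfo *info, sketchShape *out) {
  assert(info != NULL && out != NULL);
  if (info->currentfeature >= info->numfeatures) return 1;
  *out = info->features[info->currentfeature++];
  return 0;
}
```

With the cursor inside layerinfo, each layer instance using this provider keeps its own read position.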
> 
>> Questions:
>>
>> I'd like some specifics on how you'd implement, for example, a buffer 
>> provider. I assume that would use the nested layer approach with the inner 
>> layer providing features to a buffer process.
> 
>> How do you see configuration of a capability like that? For example, a 
>> typical buffer operation would have a distance, perhaps units and maybe a 
>> corner tolerance.
> 
> Any provider-specific information would go into the METADATA section
> of that layer. In this case we could use the following definition,
> for example:
> 
> LAYER
>     CONNECTIONTYPE BUFFEROPERATION
>     METADATA
>         "distance"  "23"
>         "units"   "meters"
>         ....
>     END
>     LAYER
>         CONNECTION "any other layer"
>     END
> END
> 
> I would also allow the operation parameters to be bound to feature
> items, like
> METADATA
>     "distance"  "[distanceitem]"
> END

I don't think I like attribute binding here. For one thing you don't know what type
distance is.
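One way to address the type concern is to validate the conversion at the point where a bound value is resolved. The sketch below assumes a hypothetical helper, resolveNumericParam, that accepts either a literal ("23") or a bracketed item name ("[distanceitem]") and rejects values that do not parse as a number. It is an illustration under those assumptions, not part of the RFC.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resolves a METADATA parameter such as "distance"
   to a double. A literal ("23") is parsed directly; a bound value
   ("[distanceitem]") is looked up among the feature's item names/values.
   Returns 0 on success, -1 if the item is missing or not numeric. */
static int resolveNumericParam(const char *param,
                               char **itemnames, char **itemvalues, int numitems,
                               double *out) {
  const char *text = param;
  size_t len;
  char *end;
  double v;

  assert(param != NULL && out != NULL);
  len = strlen(param);
  if (len >= 2 && param[0] == '[' && param[len - 1] == ']') {
    /* Attribute binding: find the item whose name matches the text
       between the brackets exactly. */
    int i;
    text = NULL;
    for (i = 0; i < numitems; i++) {
      if (strlen(itemnames[i]) == len - 2 &&
          strncmp(itemnames[i], param + 1, len - 2) == 0) {
        text = itemvalues[i];
        break;
      }
    }
    if (text == NULL) return -1; /* no such item */
  }

  /* The item's type is not known up front, so validate the conversion:
     reject empty strings, trailing garbage and out-of-range values. */
  errno = 0;
  v = strtod(text, &end);
  if (end == text || *end != '\0' || errno == ERANGE) return -1;
  *out = v;
  return 0;
}
```

A per-feature failure still leaves open what the provider should do with that feature (skip it, or fall back to a default), which is part of the objection raised here.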

> The most trivial implementation of this buffer provider would allow
> only the nested layer specification and propagate every vtable
> function to the inner layer. The operation would be applied to the
> features retrieved by the NextShape and GetShape calls respectively.

Ok, makes sense.
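A minimal C sketch of the delegating buffer provider described above. Everything here is hypothetical: the vtable is cut down to NextShape and GetShape, a feature is reduced to its bounding box, and "buffering" just grows that box by the distance (a real provider would buffer the full geometry, for example through GEOS). A trivial array-backed provider stands in for the nested source layer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a feature is reduced to its bounding box. */
typedef struct { double minx, miny, maxx, maxy; } sketchRect;
typedef struct { long id; sketchRect bounds; } sketchFeature;

typedef struct sketchLayer sketchLayer;

/* A cut-down vtable with just the two reading entry points discussed. */
typedef struct {
  int (*NextShape)(sketchLayer *layer, sketchFeature *out);
  int (*GetShape)(sketchLayer *layer, sketchFeature *out, long id);
} sketchVTable;

struct sketchLayer {
  const sketchVTable *vtable;
  sketchLayer *inner;  /* nested LAYER, NULL for a source layer */
  double distance;     /* would come from METADATA "distance" */
  void *layerinfo;     /* provider-private state */
};

/* Trivial source provider over a fixed array (the nested layer). */
typedef struct { const sketchFeature *items; int count; int cursor; } arraySource;

static int arrayNextShape(sketchLayer *layer, sketchFeature *out) {
  arraySource *src = layer->layerinfo;
  if (src->cursor >= src->count) return 1;  /* done */
  *out = src->items[src->cursor++];
  return 0;
}

static int arrayGetShape(sketchLayer *layer, sketchFeature *out, long id) {
  arraySource *src = layer->layerinfo;
  int i;
  for (i = 0; i < src->count; i++)
    if (src->items[i].id == id) { *out = src->items[i]; return 0; }
  return 1;  /* not found */
}

static const sketchVTable arrayVTable = { arrayNextShape, arrayGetShape };

/* Stand-in for a real buffer: grow the box by d in every direction. */
static void bufferRect(sketchRect *r, double d) {
  r->minx -= d; r->miny -= d; r->maxx += d; r->maxy += d;
}

/* Buffer provider: forward to the inner layer, then apply the operation
   to whatever it returned. Status codes pass through unchanged. */
static int bufferNextShape(sketchLayer *layer, sketchFeature *out) {
  int status;
  assert(layer->inner != NULL);
  status = layer->inner->vtable->NextShape(layer->inner, out);
  if (status == 0) bufferRect(&out->bounds, layer->distance);
  return status;
}

static int bufferGetShape(sketchLayer *layer, sketchFeature *out, long id) {
  int status;
  assert(layer->inner != NULL);
  status = layer->inner->vtable->GetShape(layer->inner, out, id);
  if (status == 0) bufferRect(&out->bounds, layer->distance);
  return status;
}

static const sketchVTable bufferVTable = { bufferNextShape, bufferGetShape };
```

Because the outer layer only talks to the inner one through the vtable, the inner layer could itself be another operation, which is what makes deeper nesting (raw feature, simplify, buffer) possible.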

>>
>> How would the core know when to use the cache and when to populate it? It 
>> wasn't immediately obvious to me although I probably just missed it. 
>> Presumably the cache (for long running
>> processes) would be built up over time. How is that managed? I mean,
>> different users are working off the same feature cache so I could
>> imagine two requests close to one another with a gap
>> in the middle. How do those missing features get into the cache efficiently?
>>
> 
> For now I would treat this provider as an inline layer that can
> preserve the features of a previous query by either:
> 1. retrieving the features from the given source using a specific
> search option (adding the features to the cache), or
> 2. serving the features from the cache (disconnected mode).
> 
> These options might be controlled by some metadata values.
> LayerWhichShapes is the function that would trigger a cache reload
> when needed. Originally I proposed reloading when the search
> rect changes.
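The reload policy just described (reload when the search rect changes, otherwise serve from the cache) could look roughly like the following. cacheState, cacheWhichShapes and reloadFromSource are hypothetical names, and the source query is reduced to recording the rect and counting reloads.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double minx, miny, maxx, maxy; } sketchRect;

/* Hypothetical cache state for the inline/cache provider. */
typedef struct {
  int valid;             /* has the cache been populated at all? */
  int disconnected;      /* option 2: serve only from the cache */
  sketchRect cachedrect; /* search rect used for the last population */
  int reloads;           /* counts populations, for illustration */
} cacheState;

static int rectEquals(const sketchRect *a, const sketchRect *b) {
  return a->minx == b->minx && a->miny == b->miny &&
         a->maxx == b->maxx && a->maxy == b->maxy;
}

/* Stand-in for re-querying the source layer and refilling the cache. */
static void reloadFromSource(cacheState *c, const sketchRect *rect) {
  c->cachedrect = *rect;
  c->valid = 1;
  c->reloads++;
}

/* Hypothetical WhichShapes: reload when the search rect changes. In
   disconnected mode a populated cache is served as-is; an empty cache
   is still populated once so there is something to serve. */
static void cacheWhichShapes(cacheState *c, const sketchRect *rect) {
  assert(c != NULL && rect != NULL);
  if (c->disconnected && c->valid) return;
  if (!c->valid || !rectEquals(&c->cachedrect, rect))
    reloadFromSource(c, rect);
}
```

Exact rect equality is the simplest test; it does not address the "gap in the middle" case raised above, where a partly overlapping rect still triggers a full reload.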
> 
>> Since one-pass query support is a desired outcome I think the RFC should 
>> address it specifically. That is, what modifications would need to be made 
>> to a query routine (e.g. msQueryUsingShape), the template output code and
>> the query map code. We, or at least I, need to know how this will manifest
>> itself with regard to queries.
> 
> It seems I cannot avoid implementing the query capability in this
> provider, so to keep the implementation clear I will add it and update
> this RFC accordingly. The feature selection could occur before the
> features are added to the cache, so that only the relevant features
> would be cached. I would not rely on the resultcache approach in this
> case; in fact, my internal cache would be the resultcache.

That's what I figured. So the resultcache and its sub-structures would cease
to exist, correct? This will result in a fairly big change in the MapScript API,
wouldn't it?

> If we wanted to retain more features in the cache (so that the
> user could pick up the items one by one in later queries), we
> should nest two inline layers. The inner layer would retain all of the
> features that the outer layer would later get one by one.
> I would implement the following selection modes (which should cover most
> of our use cases):
> 1. querybyattributes
> 2. querybylayer
> 3. querybypreviousquery (using the features in the cache as the
> selection features)

Add a querybyshape and that about covers it.

> All of these options might be specified simultaneously and would be
> applied when populating the cache.
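A rough sketch of applying the selection modes while populating the cache, so that only matching features are ever stored. The flag names and populateCache are hypothetical; SEL_BY_SHAPE corresponds to the querybyshape mode suggested in the reply, and all geometric tests are bounding-box overlaps rather than true geometry operations. When several modes are set, a feature must pass all of them.

```c
#include <assert.h>
#include <string.h>

typedef struct { double minx, miny, maxx, maxy; } sketchRect;
typedef struct { long id; const char *value; sketchRect bounds; } sketchFeature;

/* Hypothetical selection-mode flags; several may be combined. */
enum {
  SEL_BY_ATTRIBUTES    = 1 << 0, /* querybyattributes: item == querystring */
  SEL_BY_LAYER         = 1 << 1, /* querybylayer: overlaps a selector feature */
  SEL_BY_PREVIOUSQUERY = 1 << 2, /* querybypreviousquery: overlaps a cached feature */
  SEL_BY_SHAPE         = 1 << 3  /* querybyshape */
};

typedef struct {
  int modes;
  const char *querystring;        /* for SEL_BY_ATTRIBUTES */
  const sketchFeature *selectors; /* for SEL_BY_LAYER */
  int numselectors;
  sketchRect shape;               /* for SEL_BY_SHAPE (bbox stand-in) */
} selection;

static int overlaps(const sketchRect *a, const sketchRect *b) {
  return a->minx <= b->maxx && b->minx <= a->maxx &&
         a->miny <= b->maxy && b->miny <= a->maxy;
}

static int overlapsAny(const sketchRect *r, const sketchFeature *set, int n) {
  int i;
  for (i = 0; i < n; i++) if (overlaps(r, &set[i].bounds)) return 1;
  return 0;
}

/* Filter while populating: only features passing every active mode are
   copied into newcache. prevcache holds the previous query's result.
   Returns the number of features cached (at most maxout). */
static int populateCache(const selection *sel,
                         const sketchFeature *source, int numsource,
                         const sketchFeature *prevcache, int numprev,
                         sketchFeature *newcache, int maxout) {
  int i, n = 0;
  assert(sel != NULL && newcache != NULL);
  for (i = 0; i < numsource && n < maxout; i++) {
    const sketchFeature *f = &source[i];
    if ((sel->modes & SEL_BY_ATTRIBUTES) &&
        (f->value == NULL || strcmp(f->value, sel->querystring) != 0)) continue;
    if ((sel->modes & SEL_BY_LAYER) &&
        !overlapsAny(&f->bounds, sel->selectors, sel->numselectors)) continue;
    if ((sel->modes & SEL_BY_PREVIOUSQUERY) &&
        !overlapsAny(&f->bounds, prevcache, numprev)) continue;
    if ((sel->modes & SEL_BY_SHAPE) && !overlaps(&f->bounds, &sel->shape)) continue;
    newcache[n++] = *f;
  }
  return n;
}
```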
> 
>> Another query question. Right now the code requests two different versions 
>> of a feature in the two-stage system. The first contains just enough 
>> information to confirm membership in the result set (typically enough
>> attributes to evaluate a class expression; rendering does this too). The
>> second, via msLayerGetShape, retrieves all attributes for presentation.
>> Presumably you wouldn't want to bother with that first step, which would
>> eliminate the need for msLayerWhichItems as it stands, but that will
>> require something else in its place. Might want to consider losing
>> msLayerWhichItems altogether, that is, get all items all the time.
>> That would require just-in-time item -> item index determination.
>> Doable, but it impacts a lot of code. I'm curious what authors of
>> various providers think. For shapefiles we grab 'em all anyway so no
>> biggie. I suppose for PostGIS and Oracle Spatial users can explicitly
>> choose what they want in sub-selects to avoid returning everything.
>> Not sure about SDE.
> 
> Yes, that's one of the biggest questions I have. Actually, I would
> retrieve all of the items; the WhichItems option seems a bit
> problematic, since it's not obvious which layers should
> participate in the WhichItems operation. Presumably it would be the layer
> involved in the rendering.

Really, msLayerWhichItems only exists because of limitations in the ESRI C
API at the time we generalized things. They didn't support (or I couldn't 
figure out how to do) a "select *". I would support having a feature always
be the same regardless of query or drawing. In the long run that does
simplify life. I'm not quite sure of the ramifications of this though. One thing
for sure is that in cases where the code needs to reference an item
value (e.g. CLASSITEM) we normally do that by index (hence the classitemindex
member). The code would have to be changed to populate those values 
as processing starts, e.g.:

if(lp->classitem && lp->classitemindex == -1) lp->classitemindex = msGetItemIndex(lp->items, lp->classitem);

With attribute binding that's easy to do. Logical expressions get a bit hairy but
it is doable. Regardless, I think this RFC should cover these types of changes.
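The just-in-time index determination could be as simple as the sketch below, assuming every feature carries all items. The sketchLayer struct and getItemIndex are hypothetical simplifications; they are not MapServer's layerObj or msGetItemIndex.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down layer: every feature carries all items, so
   the index of CLASSITEM is resolved once, just in time, instead of being
   fixed up front by msLayerWhichItems. */
typedef struct {
  char **items;       /* all item names returned by the provider */
  int numitems;
  char *classitem;    /* CLASSITEM from the mapfile, may be NULL */
  int classitemindex; /* -1 until resolved */
} sketchLayer;

/* Linear lookup; returns -1 when the item does not exist. */
static int getItemIndex(char **items, int numitems, const char *name) {
  int i;
  for (i = 0; i < numitems; i++)
    if (strcmp(items[i], name) == 0) return i;
  return -1;
}

/* Called as processing starts (drawing or querying alike). Returns 0 if
   the index is available or not needed, -1 if CLASSITEM is unknown. */
static int resolveClassItem(sketchLayer *lp) {
  assert(lp != NULL);
  if (lp->classitem == NULL) return 0;
  if (lp->classitemindex == -1)
    lp->classitemindex = getItemIndex(lp->items, lp->numitems, lp->classitem);
  return lp->classitemindex == -1 ? -1 : 0;
}
```

Item references inside logical expressions would need the same treatment for each referenced item, which is where it gets hairier.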

> In addition, there are some further implementation issues with the
> various vtable functions. For example, the GetAutoStyle method would
> set up the styling of the inner layer, which would then have to be
> copied back to the outer layer.
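For the GetAutoStyle issue, one option is for the wrapping provider to call the inner layer's GetAutoStyle and then copy the resulting class style back onto itself. This is a hypothetical sketch with a reduced style struct and a one-class-per-layer simplification; the function signature is simplified and is not MapServer's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced style object. */
typedef struct { int r, g, b; double size; } sketchStyle;

typedef struct sketchLayer sketchLayer;

/* Simplified signature: style the layer's single class for the given
   shape. Returns 0 on success. */
typedef int (*getAutoStyleFn)(sketchLayer *layer, long shapeid);

struct sketchLayer {
  getAutoStyleFn GetAutoStyle;
  sketchLayer *inner;     /* nested LAYER, NULL for a source layer */
  sketchStyle classstyle; /* style of this layer's (only) class */
};

/* Source provider: derives a style from the feature itself
   (an arbitrary stand-in rule: red channel = id mod 256). */
static int sourceGetAutoStyle(sketchLayer *layer, long shapeid) {
  layer->classstyle.r = (int)(shapeid % 256);
  layer->classstyle.g = 0;
  layer->classstyle.b = 0;
  layer->classstyle.size = 1.0;
  return 0;
}

/* Wrapping provider: the inner layer styles its own class, so the
   result has to be copied back to the outer layer afterwards. */
static int wrapperGetAutoStyle(sketchLayer *layer, long shapeid) {
  int status;
  assert(layer->inner != NULL && layer->inner->GetAutoStyle != NULL);
  status = layer->inner->GetAutoStyle(layer->inner, shapeid);
  if (status == 0) layer->classstyle = layer->inner->classstyle;
  return status;
}
```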
> 
>>
>> I can't see requiring users to explicitly define a cache layer (unless you'd 
>> fall back on the two-pass system?). Seems like there should be some layer magic 
>> behind the scenes (as with embeddable scalebars).
>>
> Since I've proposed a new data provider, it should definitely belong to a
> layer. Because I'm aiming at a more general approach for building
> complex data processing chains, I'm not much interested in a
> cache-only solution. At the very least the user has to configure the
> operation somehow, which requires some additional configuration. We could
> probably hook into the vtable of the original layer (as in my earlier
> RFC 22 solution), but that seemed a bit complicated ;-).

I'm just thinking that for something like a query it would be user friendly not to require
complicated mapfile configuration to support clicking on a feature.

>> Do you see the cache being tunable? For example, trimming features if they 
>> have not been accessed in a while.
> Not at this time.

Does this approach limit possibilities in the future?

>>
>> I'm curious about a particular use case that I know folks are looking for 
>> support with. User identifies a parcel (by attribute, parcel id or by 
>> clicking on it), the selected feature is buffered (after
>> selection), and finally it is used to query the parcel data to select
>> all intersecting parcels (to generate a mailing list). How might
>> something like that benefit from this approach?
> 
> Here is my suggestion in this case:
> 
> LAYER
>     NAME "mailing list"
>     CONNECTION "parcels"
>     METADATA
>         "querylayerindex"  "0"  #points to the inner layer
>         "geometryop" "overlap"
>     END
>     LAYER
>         CONNECTIONTYPE BUFFEROPERATION
>         METADATA
>             "distance" "23"
>             "units" "meters"
>         END
>         LAYER
>             CONNECTION "parcels"
>             METADATA
>                 "queryitem" "parcelid"
>                 "querystring" "23"
>             END
>         END
>     END
> END
> LAYER
>     NAME "parcels"
>     CONNECTIONTYPE POSTGIS
>     ...
> END
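To show the data flow of the nested configuration above, here is a hypothetical C sketch of its three steps: select the parcel by id (innermost layer), buffer it (BUFFEROPERATION layer), and select overlapping parcels (outer "mailing list" layer). Parcels are reduced to bounding boxes, so this only approximates the buffer and overlap tests; all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a parcel is an id plus its bounding box. */
typedef struct { double minx, miny, maxx, maxy; } sketchRect;
typedef struct { const char *parcelid; sketchRect bounds; } sketchParcel;

static int overlaps(const sketchRect *a, const sketchRect *b) {
  return a->minx <= b->maxx && b->minx <= a->maxx &&
         a->miny <= b->maxy && b->miny <= a->maxy;
}

/* Step 1, innermost layer: "queryitem"/"querystring" selection. */
static const sketchParcel *findParcel(const sketchParcel *p, int n, const char *id) {
  int i;
  for (i = 0; i < n; i++)
    if (strcmp(p[i].parcelid, id) == 0) return &p[i];
  return NULL;
}

/* Step 2, BUFFEROPERATION layer: grow the selection by the distance. */
static sketchRect bufferBounds(sketchRect r, double d) {
  r.minx -= d; r.miny -= d; r.maxx += d; r.maxy += d;
  return r;
}

/* Step 3, outer "mailing list" layer: "geometryop" "overlap" against the
   buffered selection. Writes indexes of matching parcels into out and
   returns how many matched, or -1 if the parcel id was not found. */
static int mailingList(const sketchParcel *parcels, int n, const char *id,
                       double distance, int *out, int maxout) {
  const sketchParcel *sel;
  sketchRect zone;
  int i, count = 0;

  assert(parcels != NULL && out != NULL);
  sel = findParcel(parcels, n, id);
  if (sel == NULL) return -1;
  zone = bufferBounds(sel->bounds, distance);
  for (i = 0; i < n && count < maxout; i++)
    if (overlaps(&zone, &parcels[i].bounds)) out[count++] = i;
  return count;
}
```

The selected parcel itself is part of the result, since both the innermost and the outer layer read "parcels".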
> 
> If you were to query by point instead of by attribute, you would probably
> have to set up an additional inline layer and add the selection shape to
> it:
> 
> LAYER
>     NAME "mailing list"
>     CONNECTION "parcels"
>     METADATA
>         "querylayerindex"  "0"  #points to the inner layer
>         "geometryop" "overlap"
>     END
>     LAYER
>         CONNECTIONTYPE BUFFEROPERATION
>         METADATA
>             "distance" "23"
>             "units" "meters"
>         END
>         LAYER
>             CONNECTION "parcels"
>             METADATA
>                 "querylayerindex" "0"
>                 "geometryop" "overlap"
>             END
>             LAYER
>                  FEATURE
>                       .....
>                  END
>             END
>         END
>     END
> END
> LAYER
>     NAME "parcels"
>     CONNECTIONTYPE POSTGIS
>     ...
> END
> 
> 
> Since I plan to implement the cache using inline layers, there's no
> need to add a CONNECTIONTYPE for these layers.

I love the possibilities, but I worry a bit about getting users to understand them. I definitely think
we should separate parameters from metadata, even if it's just another named hash block.

> Best regards,
> 
> Tamas