MS RFC 22a: Feature cache for long running processes and query processing (Call for comments)

Tue Jun 12 08:54:06 EDT 2007

Hi Steve and Tamas,
    This proposal is a good improvement for MapServer.
    For Oracle Spatial (and I believe for PostGIS too) the big
problem is the cost of a login/logoff for every feature. This cost appears
when the features are requested one by one: a login/logoff is done for
every one of them.
    I believe that a good improvement, and the first step (maybe for
5.0), is a way to retrieve all the features at once instead of one by
one, as Tamas suggested. This way just one connection is needed to
retrieve the features from the database. I don't know about shapefiles,
but for databases it will be very good. Daniel, do you remember the
example that I showed you when you came to the user meeting in Brazil
(in 2005)?
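    To make the cost concrete, here is a minimal Python sketch. It is not MapServer code: `FakeProvider`, `get_shape` and `get_shapes` are hypothetical names of mine, and the "connection" is only a counter that models one login/logoff cycle. It compares fetching a result set one feature at a time with fetching it in a single pass.

```python
class FakeProvider:
    """Hypothetical stand-in for a database-backed layer (not a MapServer API)."""

    def __init__(self, features):
        self._features = dict(features)  # feature id -> attributes
        self.logins = 0                  # number of login/logoff cycles so far

    def _connect(self):
        # Each call models one login/logoff against the database.
        self.logins += 1

    def get_shape(self, fid):
        # One-by-one access: a new connection for every feature.
        self._connect()
        return self._features[fid]

    def get_shapes(self, fids):
        # Bulk access: one connection for the whole result set.
        self._connect()
        return [self._features[fid] for fid in fids]


features = {i: {"id": i} for i in range(100)}
wanted = list(range(0, 100, 2))  # 50 features in the result set

one_by_one = FakeProvider(features)
rows_a = [one_by_one.get_shape(fid) for fid in wanted]

bulk = FakeProvider(features)
rows_b = bulk.get_shapes(wanted)

print(one_by_one.logins, bulk.logins)
```

    Both variants return the same rows, but the first one pays for 50 login/logoff cycles and the second one for a single cycle.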
    As Steve explained, some details need to be checked before
development starts. I have just a few questions. How big can the cache
be: is there any limit, for example on the number of columns? For how
long, or until what event, will the cache stay alive? I ask this because
some problems can appear with MapScript objects that can stay alive in
memory (as with Java MapScript and PHP in FastCGI mode).
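    As a rough illustration of what I mean by a limit and a lifetime, here is a small Python sketch of a bounded cache. Everything in it is an assumption for discussion, not something taken from the RFC: the class name `FeatureCache`, the two parameters `max_features` and `max_idle_seconds`, and the policy itself (drop the least recently used feature when the cache is full, and drop a feature that has not been accessed for too long). The clock is injectable only so the example is deterministic.

```python
from collections import OrderedDict
import time


class FeatureCache:
    """Hypothetical bounded cache; names and policy are illustrative only."""

    def __init__(self, max_features, max_idle_seconds, clock=time.monotonic):
        self.max_features = max_features
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock
        self._entries = OrderedDict()  # fid -> (shape, last access time)

    def put(self, fid, shape):
        self._entries[fid] = (shape, self._clock())
        self._entries.move_to_end(fid)
        # Enforce the size limit by dropping the least recently used entry.
        while len(self._entries) > self.max_features:
            self._entries.popitem(last=False)

    def get(self, fid):
        entry = self._entries.get(fid)
        if entry is None:
            return None
        shape, last_used = entry
        if self._clock() - last_used > self.max_idle_seconds:
            # The entry sat unused for too long: treat it as expired.
            del self._entries[fid]
            return None
        # Refresh the access time and mark the entry as most recently used.
        self._entries[fid] = (shape, self._clock())
        self._entries.move_to_end(fid)
        return shape

    def __len__(self):
        return len(self._entries)


now = [0.0]  # fake clock, in seconds
cache = FeatureCache(max_features=2, max_idle_seconds=60, clock=lambda: now[0])
cache.put(1, "parcel 1")
cache.put(2, "parcel 2")
cache.put(3, "parcel 3")   # exceeds the limit, so feature 1 is dropped

evicted = cache.get(1)     # None: pushed out by the size limit
now[0] = 30.0
fresh = cache.get(3)       # still cached; last access becomes t=30
now[0] = 80.0
stale = cache.get(2)       # None: idle for 80 s, more than 60 s
kept = cache.get(3)        # idle for 50 s, still cached
```

    With a policy like this a long-lived MapScript process could keep its cache from growing without bound, at the price of going back to the provider for features that were dropped.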
    For me, this will be a good improvement for the entire MapServer.
    Best regards.

Fernando Simon

Steve Lime wrote:
> Hi Tamas: Just took some time to read through this. I like it in general as it feels very MapServer-ish (hooking things together to do larger things). I do have some comments/questions.
>
> Comments:
>
> Linking layers has been done before (e.g. TILELAYERs) so it can work well. In that case we control visibility/queryability using the TYPE parameter. Might consider another TYPE to hide linked layers from certain types of processing.
>
> With a nested layer, personally I would just re-use the LAYER keyword. That might open up deeper nesting (e.g. raw feature => simplify => buffer) and save us a keyword.
>
> I would support re-doing the current inline features using whatever structure is used for a feature cache. We only need one way of storing a collection of features.
>
> Questions:
>
> I'd like some specifics on how you'd implement, for example, a buffer provider. I assume that would use the nested layer approach with the inner layer providing features to a buffer process. How do you see configuration of a capability like that? For example, a typical buffer operation would have a distance, perhaps units and maybe a corner tolerance.
>
> How would the core know when to use the cache and when to populate it? It wasn't immediately obvious to me although I probably just missed it. Presumably the cache (for long running processes) would be built up over time. How is that managed? I mean, different users are working off the same feature cache so I could imagine two requests close to one another with a gap in the middle. How do those missing features get into the cache efficiently?
>
> Since one-pass query support is a desired outcome I think the RFC should address that specifically. That is, what modifications would need to be made to a query routine (e.g. msQueryUsingShape), the template output code and the query map code. We, at least I, need to know how this will manifest itself with regards to queries.
>
> Another query question. Right now the code requests two different versions of a feature in the two-stage system. The first contains just enough information to confirm membership in the result set (typically enough attributes to evaluate a class expression - rendering does this too). The second, via msLayerGetShape, retrieves all attributes for presentation. Presumably you wouldn't want to bother with that first step, which eliminates the need for msLayerWhichItems as it sits, but that will require something else in its place. Might want to consider losing msLayerWhichItems altogether, that is, get all items all the time. That would require just-in-time item -> item index determination. Doable, but it impacts a lot of code. I'm curious what authors of various providers think. For shapefiles we grab 'em all anyway so no biggie. I suppose for PostGIS and Oracle Spatial users can explicitly choose what they want in sub-selects to avoid returning everything. Not sure about SDE.
>
> I can't see requiring users to explicitly define a cache layer (unless you'd fall back on the two-pass system?). Seems like there should be some layer magic behind the scenes (as with embeddable scalebars).
>
> Do you see the cache being tunable? For example, trimming features if they have not been accessed in a while.
>
> I'm curious about a particular use case that I know folks are looking for support with. User identifies a parcel (by attribute, parcel id or by clicking on it), the selected feature is buffered (after selection), and finally it is used to query the parcel data to select all intersecting parcels (to generate a mailing list). How might something like that benefit from this approach?
>
> Steve
>
>   
>>>> Tamas Szekeres <szekerest at GMAIL.COM> 06/10/07 1:22 PM >>>
>>>>         
> Developers,
>
> Currently the various query operations involve multiple accesses to
> the data providers, which may cause a significant performance impact
> depending on the provider. In the first phase all of the features in
> the given search area are retrieved and the indexes of the relevant
> shapes are stored in the result cache. In the second phase the
> features in the result cache are retrieved from the provider one by
> one. By retaining the shapes in memory we could eliminate the need
> for subsequent access to the providers and increase the overall
> performance of the query. Implementing the cache requires a
> transformation of the data between the data provider and the client.
> From this aspect it is desirable to provide a framework to implement
> this transformation at a higher level of abstraction.
>
> I've modified the original concept to eliminate the need to
> introduce a new category and structure type in the core mapserver.
> Currently the feature cache is being implemented as one additional data
> provider which uses another layer as the data source. This concept
> involves linking the layers to each other.
>
> This RFC intends to describe the general concept which can easily be
> turned into a full implementation with further error handling in time
> for the 5.0 release if the concept gets the required support to carry
> on.
>
> Any comments or ideas are highly appreciated.
>
> Best regards,
>
> Tamas
>
>