MS RFC 22a: Feature cache for long running processes and query processing (C

Tue Jun 12 10:57:27 EDT 2007

I would hope that msLayerGetShape isn't opening a new connection for each feature... msLayerOpen
should do that. It's still expensive to execute a bit of SQL for each feature though.

Steve

>>> On 6/12/2007 at 7:54 AM, in message <466E976E.10006 at univali.br>, Fernando Simon
<fsimon at univali.br> wrote:
> Hi Steve and Tamas,
>     This would be a good improvement for MapServer.
>     For Oracle Spatial (and I believe for PostGIS too) the big
> problem is the cost of login/logoff for every feature. This cost appears
> when features are requested one by one: the login/logoff is done for
> every one.
>     I believe that a good improvement, and a first step (maybe for
> 5.0), is a way to retrieve all the features at once instead of one by
> one, as Tamas suggested. That way just one connection is needed to
> retrieve the features from the database. I don't know about shapefiles,
> but for databases it will be very good. Daniel, do you remember the
> example that I showed you when you came to the user meeting in Brazil
> (in 2005)?
>     As Steve explained, some details need to be checked before
> development starts. I have just a few questions. What about the size of
> the cache: any limit, number of columns? How long, or until what event,
> will the cache persist? I ask because some problems can appear with
> MapScript objects that can stay alive in memory (as with Java MapScript
> and PHP/FastCGI mode).
>     For me, this will be a good improvement for all of MapServer.
>     Best regards.
> 
> Fernando Simon
> 
> Steve Lime wrote:
>> Hi Tamas: Just took some time to read through this. I like it in general as
>> it feels very MapServer-ish (hooking things together to do larger things). I
>> do have some comments/questions.
>>
>> Comments:
>>
>> Linking layers has been done before (e.g. TILELAYERs) so it can work well.
>> In that case we control visibility/queryability using the TYPE parameter.
>> Might consider another TYPE to hide linked layers from certain types of
>> processing.
>>
>> With a nested layer, personally I would just re-use the LAYER keyword. That
>> might open up deeper nesting (e.g. raw feature => simplify => buffer) and save
>> us a keyword.
>>
>> I would support re-doing the current inline features using whatever structure
>> is used for a feature cache. We only need one way of storing a collection of
>> features.
>>
>> Questions:
>>
>> I'd like some specifics on how you'd implement, for example, a buffer
>> provider. I assume that would use the nested layer approach with the inner
>> layer providing features to a buffer process. How do you see
>> configuration of a capability like that? For example, a typical buffer
>> operation would have a distance, perhaps units and maybe a corner tolerance.
>>
>> How would the core know when to use the cache and when to populate it? It
>> wasn't immediately obvious to me although I probably just missed it.
>> Presumably the cache (for long running processes) would be built up over
>> time. How is that managed? I mean, different users are working off the same
>> feature cache so I could imagine two requests close to one another with a gap
>> in the middle. How do those missing features get into the cache efficiently?
>>
>> Since one-pass query support is a desired outcome I think the RFC should
>> address that specifically. That is, what modifications would need to be made
>> to a query routine (e.g. msQueryUsingShape), the template output code and the
>> query map code. We, at least I, need to know how this will manifest itself
>> with regards to queries.
>>
>> Another query question. Right now the code requests two different versions
>> of a feature in the two-stage system. The first contains just enough
>> information to confirm membership in the result set (typically enough
>> attributes to evaluate a class expression - rendering does this too). The
>> second, via msLayerGetShape, retrieves all attributes for presentation.
>> Presumably you wouldn't want to bother with that first step, which eliminates
>> the need for msLayerWhichItems as it sits, but that will require something
>> else in its place. Might want to consider losing msLayerWhichItems
>> altogether, that is, get all items all the time. That would require
>> just-in-time item -> item index determination. Doable, but it impacts a lot
>> of code. I'm curious what authors of various providers think. For shapefiles
>> we grab 'em all anyway so no biggie. I suppose for PostGIS and Oracle Spatial
>> users can explicitly choose what they want in sub-selects to avoid returning
>> everything. Not sure about SDE.
>>
>> I can't see requiring users to define a cache layer (unless you'd fall back
>> on two-pass system?) explicitly. Seems like there should be some layer magic
>> behind the scenes (as with embeddable scalebars).
>>
>> Do you see the cache being tunable? For example, trimming features if they
>> have not been accessed in a while.
>>
>> I'm curious about a particular use case that I know folks are looking for
>> support with. User identifies a parcel (by attribute, parcel id or by
>> clicking on it), the selected feature is buffered (after selection), and
>> finally it is used to query the parcel data to select all intersecting
>> parcels (to generate a mailing list). How might something like that benefit
>> from this approach?
>>
>> Steve
>>
>>   
>>>>> Tamas Szekeres <szekerest at GMAIL.COM> 06/10/07 1:22 PM >>>
>>>>>         
>> Developers,
>>
>> Currently the various query operations involve multiple accesses to the
>> data providers, which may cause a significant performance impact depending
>> on the providers. In the first phase all of the features in the given
>> search area are retrieved and the indexes of the relevant shapes are
>> stored in the result cache. In the second phase the features in the
>> result cache are retrieved from the provider one by one. By retaining the
>> shapes in memory we could eliminate the need for the subsequent
>> access to the providers and increase the overall performance of the
>> query. Implementing the cache requires a transformation of the data
>> between the data provider and the client. From this aspect it is
>> desirable to provide a framework to implement this transformation at a
>> higher level of abstraction.
>>
>> I've modified the original concept to eliminate the need for
>> introducing a new category and structure type in the core MapServer.
>> Currently the feature cache is being implemented as one additional data
>> provider which uses another layer as the data source. This concept
>> involves linking the layers to each other.
>>
>> This RFC intends to describe the general concept which can easily be
>> turned into a full implementation with further error handling in time
>> for the 5.0 release if the concept gets the required support to carry
>> on.
>>
>> Any comments or ideas are highly appreciated.
>>
>> Best regards,
>>
>> Tamas
>>
>>
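
For readers following the thread, the two-phase query Tamas describes and the one-pass alternative that retains shapes in memory can be sketched roughly as below. This is only an illustration in Python, not MapServer code: CountingProvider, two_pass_query and one_pass_query are hypothetical names, and the sketch assumes the first pass already returns full attributes, which the current two-stage system does not do.

```python
class CountingProvider:
    """Hypothetical stand-in for a data provider; counts round trips."""

    def __init__(self, features):
        self.features = dict(features)  # shape index -> attributes
        self.calls = 0

    def search(self, predicate):
        """One round trip: return (index, attributes) for matching features."""
        self.calls += 1
        return [(i, f) for i, f in self.features.items() if predicate(f)]

    def get_shape(self, index):
        """One round trip per feature, like fetching shapes one by one."""
        self.calls += 1
        return self.features[index]


def two_pass_query(provider, predicate):
    # Phase 1: keep only the indexes of matching shapes (the result cache).
    result_cache = [i for i, _ in provider.search(predicate)]
    # Phase 2: go back to the provider once per cached index.
    return [provider.get_shape(i) for i in result_cache]


def one_pass_query(provider, predicate):
    # Retain the shapes from the first pass, so no second access is needed.
    feature_cache = dict(provider.search(predicate))
    return list(feature_cache.values())
```

With N matching features, the two-pass version makes N + 1 provider calls while the one-pass version makes one, at the cost of holding the matching shapes in memory.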