MS RFC 22a: Feature cache for long running processes and
query processing (C
Steve Lime
Steve.Lime at DNR.STATE.MN.US
Tue Jun 12 10:57:27 EDT 2007
I would hope that msLayerGetShape isn't opening a new connection for each feature... msLayerOpen
should do that. It's still expensive to execute a bit of SQL for each feature though.
Steve
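
To make the cost described above concrete, here is a minimal sketch, not MapServer code. It uses Python's built-in sqlite3 module as a stand-in for a spatial database, and the table and function names (parcels, fetch_one_by_one, fetch_in_batch) are hypothetical. Both functions share one already-open connection; the difference is one SQL statement per feature versus a single statement for the whole result set.

```python
import sqlite3


def make_db():
    # In-memory stand-in for a database-backed layer (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany(
        "INSERT INTO parcels VALUES (?, ?)",
        [(i, f"owner{i}") for i in range(1, 6)],
    )
    return conn


def fetch_one_by_one(conn, ids):
    # One statement per feature, all on the same open connection.
    rows = []
    statements = 0
    for fid in ids:
        cur = conn.execute("SELECT id, owner FROM parcels WHERE id = ?", (fid,))
        rows.append(cur.fetchone())
        statements += 1
    return rows, statements


def fetch_in_batch(conn, ids):
    # A single statement that returns every requested feature.
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, owner FROM parcels WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, list(ids)).fetchall(), 1


conn = make_db()
ids = [1, 3, 5]
one_by_one, n_single = fetch_one_by_one(conn, ids)
batch, n_batch = fetch_in_batch(conn, ids)
```

Both calls return the same rows; only the number of statements sent to the database differs, and that difference grows with the size of the result set.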
>>> On 6/12/2007 at 7:54 AM, in message <466E976E.10006 at univali.br>, Fernando Simon
<fsimon at univali.br> wrote:
> Hi Steve and Tamas,
> This would be a good improvement for MapServer.
> For Oracle Spatial (and I believe for PostGIS too) the big
> problem is the cost of login/logoff for every feature. This cost appears
> when the features are requested one by one: the login/logoff is done for
> every one.
> I believe that a good improvement, and the first step (maybe for
> 5.0), is a way to retrieve all the features at once instead of one by one,
> as Tamas suggested. That way just one connection is needed to retrieve the
> features from the database. I don't know about shapefiles, but for databases
> it will be very good. Daniel, do you remember the example that I showed
> you when you came to the user meeting in Brazil (in 2005)?
> As Steve pointed out, some details need to be checked before development
> starts. I have just a few questions. What is the size of the cache? Is
> there any limit on it, or on the number of columns? For how long, or until
> what event, will the cache stay alive? I ask because some problems can
> appear with MapScript objects that can stay alive in memory (as with
> JavaMapscript and PHP/FastCGI mode).
> For me, this will be a good improvement for all of MapServer.
> Best regards.
>
> Fernando Simon
>
> Steve Lime wrote:
>> Hi Tamas: Just took some time to read through this. I like it in general as
>> it feels very MapServer-ish (hooking things together to do larger things). I
>> do have some comments/questions.
>>
>> Comments:
>>
>> Linking layers has been done before (e.g. TILELAYERs) so it can work well.
>> In that case we control visibility/queryability using the TYPE parameter.
>> Might consider another TYPE to hide linked layers from certain types of
>> processing.
>>
>> With a nested layer, personally I would just re-use the LAYER keyword. That
>> might open up deeper nesting (e.g. raw feature => simplify => buffer) and save
>> us a keyword.
>>
>> I would support re-doing the current inline features using whatever structure
>> is used for a feature cache. We only need one way of storing a collection of
>> features.
>>
>> Questions:
>>
>> I'd like some specifics on how you'd implement, for example, a buffer
>> provider. I assume that would use the nested layer approach with the inner
>> layer providing features to a buffer process. How do you see
>> configuration of a capability like that? For example, a typical buffer
>> operation would have a distance, perhaps units and maybe a corner tolerance.
>>
>> How would the core know when to use the cache and when to populate it? It
>> wasn't immediately obvious to me although I probably just missed it.
>> Presumably the cache (for long running processes) would be built up over
>> time. How is that managed? I mean, different users are working off the same
>> feature cache so I could imagine two requests close to one another with a gap
>> in the middle. How do those missing features get into the cache efficiently?
>>
>> Since one-pass query support is a desired outcome I think the RFC should
>> address that specifically. That is, what modifications would need to be made
>> to a query routine (e.g. msQueryUsingShape), the template output code and the
>> query map code. We, at least I, need to know how this will manifest itself
>> with regards to queries.
>>
>> Another query question. Right now the code requests two different versions
>> of a feature in the two-stage system. The first contains just enough
>> information to confirm membership in the result set (typically enough
>> attributes to evaluate a class expression - rendering does this too). The
>> second, via msLayerGetShape, retrieves all attributes for presentation.
>> Presumably you wouldn't want to bother with that first step, which eliminates
>> the need for msLayerWhichItems as it sits, but that will require something
>> else in its place. Might want to consider losing msLayerWhichItems
>> altogether, that is, get all items all the time. That would require
>> just-in-time item -> item index determination. Doable, but it impacts a lot
>> of code. I'm curious what authors of various providers think. For shapefiles
>> we grab 'em all anyway so no biggie. I suppose for PostGIS and Oracle Spatial
>> users can explicitly choose what they want in sub-selects to avoid returning
>> everything. Not sure about SDE.
>>
>> I can't see requiring users to define a cache layer explicitly (unless you'd
>> fall back on the two-pass system?). Seems like there should be some layer magic
>> behind the scenes (as with embeddable scalebars).
>>
>> Do you see the cache being tunable? For example, trimming features if they
>> have not been accessed in a while.
>>
>> I'm curious about a particular use case that I know folks are looking for
>> support with. User identifies a parcel (by attribute, parcel id or by
>> clicking on it), the selected feature is buffered (after selection), and
>> finally it is used to query the parcel data to select all intersecting
>> parcels (to generate a mailing list). How might something like that benefit
>> from this approach?
>>
>> Steve
>>
>>
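
One possible reading of the "tunable cache" question above (trimming features that have not been accessed in a while) is a least-recently-used policy. The sketch below is only an illustration under that assumption; it is not part of the RFC, and the FeatureCache class and its max_features setting are hypothetical names.

```python
from collections import OrderedDict


class FeatureCache:
    """Hypothetical shape cache that trims the least recently used entries."""

    def __init__(self, max_features):
        self.max_features = max_features
        self._shapes = OrderedDict()  # shape id -> shape, oldest access first

    def get(self, shape_id):
        if shape_id not in self._shapes:
            return None  # cache miss: caller would go back to the provider
        self._shapes.move_to_end(shape_id)  # mark as most recently used
        return self._shapes[shape_id]

    def put(self, shape_id, shape):
        self._shapes[shape_id] = shape
        self._shapes.move_to_end(shape_id)
        # Trim from the least recently used end until within the limit.
        while len(self._shapes) > self.max_features:
            self._shapes.popitem(last=False)

    def ids(self):
        return list(self._shapes)


cache = FeatureCache(max_features=2)
cache.put(1, "parcel-1")
cache.put(2, "parcel-2")
cache.get(1)              # touch shape 1 so shape 2 becomes the oldest
cache.put(3, "parcel-3")  # exceeds the limit, so shape 2 is trimmed
remaining = cache.ids()
```

A time-based expiry or a memory-size limit would be equally plausible tuning knobs; the thread does not say which, if any, is intended.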
>>>>> Tamas Szekeres <szekerest at GMAIL.COM> 06/10/07 1:22 PM >>>
>>>>>
>> Developers,
>>
>> Currently the various query operations involve multiple accesses to the
>> data providers, which may cause a significant performance impact depending
>> on the provider. In the first phase all of the features in the given
>> search area are retrieved and the indexes of the relevant shapes are
>> stored in the result cache. In the second phase the features in the
>> result cache are retrieved from the provider one by one. By retaining the
>> shapes in memory we could eliminate the need for the subsequent
>> accesses to the providers and increase the overall performance of the
>> query. Implementing the cache requires a transformation of the data
>> between the data provider and the client. From this aspect it is
>> desirable to provide a framework to implement this transformation at a
>> higher level of abstraction.
>>
>> I've modified the original concept to eliminate the need for
>> introducing a new category and structure type in the core mapserver.
>> Currently the feature cache is being implemented as one additional data
>> provider which uses another layer as the data source. This concept
>> involves linking the layers to each other.
>>
>> This RFC intends to describe the general concept which can easily be
>> turned into a full implementation with further error handling in time
>> for the 5.0 release if the concept gets the required support to carry
>> on.
>>
>> Any comments or ideas are highly appreciated.
>>
>> Best regards,
>>
>> Tamas
>>
>>
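
The two-phase query described in the RFC summary above, and the one-pass alternative that retains shapes in memory, can be sketched roughly as follows. This is a simplified model and not MapServer code: CountingProvider, which_shapes and get_shape are hypothetical stand-ins for a data provider, and features are reduced to a point plus an attribute string.

```python
class CountingProvider:
    """Hypothetical data provider that counts how often it is accessed."""

    def __init__(self, features):
        self.features = features  # feature id -> (x, y, attributes)
        self.accesses = 0

    def which_shapes(self, rect):
        # One access that returns every feature inside the search rectangle.
        self.accesses += 1
        xmin, ymin, xmax, ymax = rect
        return [
            (fid, f)
            for fid, f in self.features.items()
            if xmin <= f[0] <= xmax and ymin <= f[1] <= ymax
        ]

    def get_shape(self, fid):
        # One access per feature, as in the second phase of the query.
        self.accesses += 1
        return self.features[fid]


def two_phase_query(provider, rect):
    # Phase 1: keep only the ids of the matching shapes (the result cache).
    result_ids = [fid for fid, _ in provider.which_shapes(rect)]
    # Phase 2: go back to the provider once for every id.
    return [provider.get_shape(fid) for fid in result_ids]


def one_pass_query(provider, rect):
    # Retain the shapes from the first access; no second trip is needed.
    retained = dict(provider.which_shapes(rect))
    return list(retained.values())


features = {1: (0, 0, "a"), 2: (5, 5, "b"), 3: (20, 20, "c")}
rect = (0, 0, 10, 10)

p_two = CountingProvider(features)
two_phase = two_phase_query(p_two, rect)

p_one = CountingProvider(features)
one_pass = one_pass_query(p_one, rect)
```

In this toy model the two-phase query touches the provider once plus once per matching feature, while the one-pass query touches it once, at the cost of holding the shapes in memory.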