MS RFC 22a: Feature cache for long running processes and query processing (update)

Thu Jun 28 10:29:45 EDT 2007

I think the easiest way to achieve one-pass queries would be to simply
re-tool the result set structure to store a pile of features (linked list or other) in
memory. The id cache could be maintained for memory-constrained situations.
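
A minimal sketch of that idea, in Python rather than MapServer's C and with entirely hypothetical names (`Feature`, `ResultCache`, `max_features`, `fetch_by_id`). It assumes the result set keeps whole features in memory up to a limit and records ids only beyond that limit, so the second pass goes back to the data source only for the overflow:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Feature:
    """Hypothetical stand-in for a shape plus its attributes."""
    fid: int
    geometry: tuple
    attributes: dict = field(default_factory=dict)


class ResultCache:
    """Query results kept as features in memory, with an id-only fallback."""

    def __init__(self, max_features: Optional[int] = None):
        # None means "no limit"; a number models the memory-constrained case.
        self.max_features = max_features
        self.ids: List[int] = []                 # always kept, in result order
        self.features: Dict[int, Feature] = {}   # kept while under the limit

    def add(self, feature: Feature) -> None:
        """Called during the first (and ideally only) pass over the source."""
        self.ids.append(feature.fid)
        if self.max_features is None or len(self.features) < self.max_features:
            self.features[feature.fid] = feature

    def results(self, fetch_by_id: Callable[[int], Feature]) -> Iterator[Feature]:
        """Yield results in order, re-reading only the features not in memory."""
        for fid in self.ids:
            cached = self.features.get(fid)
            yield cached if cached is not None else fetch_by_id(fid)
```

With `max_features=None` no second read ever happens; with a small limit the behaviour degrades gradually towards the existing id-based double pass.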

That doesn't get us past the other 80% of the issues, but might it simplify the implementation
of the rest and avoid user confusion with having to set up alternative layers and
such?

Regarding the "direct users to MapGuide" comment, remember that all sorts of geo-processing
capabilities are already available through MapScript. You can buffer, intersect and union
features until the cows come home already. You just have to do it in code, so I fear we're
already dangerously close to a GIS. ;-)

Steve

>>> On 6/27/2007 at 11:14 PM, in message <468335B8.7030600 at mapgears.com>, Daniel
Morissette <dmorissette at MAPGEARS.COM> wrote:
> Hi Tamas,
> 
> Tamas Szekeres wrote:
>> 
>> I personally don't favour the "average user" terminology in this area.
>> I consider the
>> users as talented developers who spare no trouble to read some
>> documentation
>> that might help to solve the particular problem they have.
> 
> 
> Life would be so much easier for us if all our users were of the type 
> you describe here. Unfortunately my experience differs from yours: in my 
> experience I have seen all sorts of users, some beginners, some 
> experienced, some who read docs and some who don't. I have also learned 
> that even the talented developers prefer when the software behaves in a 
> natural way, does "the right thing" by default, and when the interface 
> to use a feature is simple.
> 
> That being said, if what RFC-22a proposes is the simplest possible 
> solution for the double-pass query then so be it (at least it's a 
> solution), but you can be assured that this approach will prompt all 
> sorts of questions from all sorts of users, especially with respect to 
> the way it solves the double-pass query issue which is my main concern 
> in this discussion.
> 
> The other filtering and processing features sound nice and you're right 
> that the proposed approach is a very neat solution to those problems, 
> but they are new problems to me and my main focus in all this is still 
> this old double-pass query issue.
> 
> 
>> 
>> Or alternatively we could focus only on the 2-pass query problem by not
>> utilizing the vtable. That would possibly require modifying either all of
>> the MapServer code involved in the query operations or all of the
>> providers suffering from this particular problem. This might require a
>> large amount of changes in the existing code and would solve at most 20%
>> of the problems I've addressed.
>> 
> 
> Well, that 20% (the double-pass query) is the one that keeps coming back 
> every once in a while. The other 80% are bonus features for which there 
> has been very little demand so far.
> 
> Let's keep in mind that "MapServer is not a full-featured GIS system, 
> nor does it aspire to be". Transforming features on the fly is nice, but 
> that kind of processing has never been MapServer's focus. I believe
> other tools such as PostGIS support this kind of operation, and I have
> always had a preference for letting them offer those features and letting
> MapServer concentrate on what it does best: publish maps on the web.
> 
> I think the users who need a Web GIS should be looking more at MapGuide 
> than MapServer.
> 
>> 
>> With the current implementation the lookup will happen among
>> the 1000 shapes since the subsequent query extent will fall inside the
>> previous one. However, with a few lines of code we could create an additional
>> option to reconstruct the cache in every WhichShapes call.
>> 
>> However, because the hashsize has been set to 512, I don't think we
>> have to do many sequential scans. If the shapes are spread evenly
>> across the hashtable array, we'll have to skip 1.5 items on average.
>> That would possibly outperform the necessary disk accesses, spatial
>> index lookups and shape creations.
>> 
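
The chain-length argument above can be checked with a small sketch. The hash function (`sid % hash_size`) and the sequential shape ids are assumptions made for illustration, not the hashing RFC-22a actually uses. The figure you get depends on how ids are distributed and on how a "skip" is counted:

```python
from collections import defaultdict

HASH_SIZE = 512  # bucket count quoted in the thread


def build_buckets(shape_ids, hash_size=HASH_SIZE):
    """Chain shape ids into buckets using a simple modulo hash (an assumption)."""
    buckets = defaultdict(list)
    for sid in shape_ids:
        buckets[sid % hash_size].append(sid)
    return buckets


def average_skips(buckets):
    """Average number of chain entries passed over before a successful hit."""
    total_skips = 0
    total_items = 0
    for chain in buckets.values():
        # The item at position k in a chain is found after skipping k entries.
        total_skips += sum(range(len(chain)))
        total_items += len(chain)
    return total_skips / total_items


buckets = build_buckets(range(1000))
load_factor = 1000 / HASH_SIZE  # a little under two entries per bucket
print(len(buckets), load_factor, average_skips(buckets))
```

Swapping `range(1000)` for the ids of a real layer would show how uneven distributions lengthen the chains.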
> 
> Let's say that the first WhichShapes call loads 1,000 shapes, and then I 
> do a query by point on that layer. Since there is no spatial index in 
> memory, all the shapes in the cache will have to be accessed to identify 
> the ones that are within tolerance of the query location. Sure, looking 
> up the bounds of 1000 shapes is not a huge cost, but it's a cost, on top 
> of all the memory used to cache all those shapes.
> 
> OTOH, if the data provider supports a spatial index it can find the 
> matching shapes (2 or 3 shapes in general) with very little work using 
> its spatial index, removing any benefit of caching and without the cost 
> of all the memory used to cache features.
> 
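
A rough sketch of the cost described above, using hypothetical names and a made-up grid of 1,000 bounding boxes. It assumes the cache holds shape bounds in a plain list with no in-memory spatial index, so a point query has to test every cached entry:

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # minx, miny, maxx, maxy


def query_point_linear(cache: List[BBox], x: float, y: float,
                       tol: float) -> Tuple[List[int], int]:
    """Return (indexes of matching boxes, number of boxes examined)."""
    hits = []
    checked = 0
    for idx, (minx, miny, maxx, maxy) in enumerate(cache):
        checked += 1  # every cached shape is visited, match or not
        if minx - tol <= x <= maxx + tol and miny - tol <= y <= maxy + tol:
            hits.append(idx)
    return hits, checked


# 1000 unit squares laid out on a 40 x 25 grid (illustrative data only).
cache = [(float(c), float(r), c + 1.0, r + 1.0)
         for r in range(25) for c in range(40)]
hits, checked = query_point_linear(cache, 10.5, 7.5, 0.1)
print(len(hits), "match(es) after examining", checked, "cached shapes")
```

A provider with a spatial index would reach the same few matches without visiting the rest, which is the comparison being made here.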
> Of course if I render the same map area 20 times in a persistent process 
> then I will benefit from the cache, but I never wrote any MapServer 
> application that does that. The typical application renders a map once 
> and then moves to a new area or zooms in a separate request which does 
> not benefit from caching, so there is little benefit to caching when 
> rendering a map.
> 
> OTOH there would be real benefits to caching the first pass of a 
> double-pass query since we are assured that we'll read the shapes twice 
> in this case, and there are usually very few shapes to cache. Thinking 
> about it some more I think I'd like to see a mode of operation of the 
> cache that only caches queries.
> 
>> 
>> This is at least an option to use; it is not compulsory.
>> We can possibly offer that or continue to leave the users alone with
>> this problem.
>> 
> 
> True. At least you have made the effort of trying to find a solution to 
> the problem (and I have not).
> 
>>>
>>> - What are the implications of nesting layers on WMS services? I think
>>> users will (naively?) expect that the hierarchy of layers will be
>>> reflected in the WMS GetCapabilities, but I don't think that this is
>>> desirable. This may very well become a FAQ: "Why is the hierarchy of
>>> layers in my mapfile not reflected in WMS GetCapabilities?"
>>>
>> 
>> Only the root layers participate in the renderings (which are added to the
>> layers collection of the map), so there's no need to alter the current
>> approach. The nested layers will only behave as data sources for the outer
>> layer providers (like the shapefiles or spatial data tables etc. for
>> the existing
>> providers)
>> 
> 
> I agree that we should not alter the current approach, it would not make 
> sense to do that, but be prepared to answer questions from users asking 
> why the hierarchy of layers in a mapfile is not used in a WMS 
> GetCapabilities.
> 
> 
>> 
>>> and even if it has a NextItem method
>>> to walk through all objects, the order of objects is not maintained by a
>>> hashtable, so if a user has data sorted (by sortshp) then the sort order
>>> will be lost and rendering order will become pseudo-random if done via a
>>> cache layer (unless I'm missing something?).
>>>
>> 
>> That's true. I'm not aware of the order of the renderings in this case.
>> In my practice I haven't run into a case where it was required.
>> However we could use an additional list to treat this issue if it is
>> significant.
>> 
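
The "additional list" mentioned above could look roughly like the sketch below. All names (`OrderedShapeCache`, `in_bucket_order`, `in_source_order`) and the modulo hash are hypothetical. The point is only that walking the buckets loses the provider's order, while a side list of ids preserves it:

```python
class OrderedShapeCache:
    """Hash buckets for lookup by id, plus a list recording source order."""

    def __init__(self, hash_size=512):
        self.hash_size = hash_size
        self.buckets = [[] for _ in range(hash_size)]
        self.order = []  # ids in the order the provider returned them

    def add(self, sid, shape):
        self.buckets[sid % self.hash_size].append((sid, shape))
        self.order.append(sid)

    def get(self, sid):
        """Look a shape up by id, scanning only its own bucket chain."""
        for key, shape in self.buckets[sid % self.hash_size]:
            if key == sid:
                return shape
        return None

    def in_bucket_order(self):
        """Order seen when walking the hashtable: unrelated to source order."""
        return [sid for chain in self.buckets for sid, _ in chain]

    def in_source_order(self):
        """Order seen when walking the side list: same as the data source."""
        return list(self.order)


cache = OrderedShapeCache(hash_size=4)
for sid in (7, 2, 9, 4):  # e.g. the order produced by a sorted shapefile
    cache.add(sid, "shape-%d" % sid)
print(cache.in_bucket_order(), cache.in_source_order())
```

The extra cost is one id per cached shape, which seems small next to the shapes themselves.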
> 
> This ordering of shapes at render time is a feature of MapServer, hence 
> the command-line program sortshp. I don't use it myself but some users 
> must rely on it otherwise it would not exist. I think it's a sad 
> side-effect to not try to maintain the ordering but I'll let those who 
> need this feature fight for it.
> 
> Daniel