MS RFC 22a: Feature cache for long running processes and query processing (update)

Thu Jun 28 00:14:48 EDT 2007

Hi Tamas,

Tamas Szekeres wrote:
> 
> I personally don't favour the "average user" terminology in this area.
> I consider the users as talented developers who spare no trouble to read
> some documentation that might help to solve that particular problem they
> have.

Life would be so much easier for us if all our users were of the type 
you describe here. Unfortunately my experience differs from yours: I 
have seen all sorts of users, some beginners, some experienced, some who 
read docs and some who don't. I have also learned 
that even the talented developers prefer when the software behaves in a 
natural way, does "the right thing" by default, and when the interface 
to use a feature is simple.

That being said, if what RFC-22a proposes is the simplest possible 
solution for the double-pass query then so be it (at least it's a 
solution), but you can be assured that this approach will prompt all 
sorts of questions from all sorts of users, especially with respect to 
the way it solves the double-pass query issue, which is my main concern 
in this discussion.

The other filtering and processing features sound nice and you're right 
that the proposed approach is a very neat solution to those problems, 
but they are new problems to me and my main focus in all this is still 
this old double-pass query issue.

> 
> Or alternatively we could focus only on the 2 pass query problem by not
> utilizing the vtable. It will possibly require either to modify all of
> the mapserver code involved in the query operations or modify all of the
> providers suffering from this particular problem. This might require a
> large amount of changes in the existing code and would solve at most 20%
> of the problems I've addressed.
> 

Well, that 20% (the double-pass query) is the one that keeps coming back 
every once in a while. The other 80% are bonus features for which there 
has been very little demand so far.

Let's keep in mind that "MapServer is not a full-featured GIS system, 
nor does it aspire to be". Transforming features on the fly is nice, but 
that kind of processing has never been MapServer's focus. I believe 
other tools such as PostGIS support this kind of operation, and I have 
always preferred letting them offer those features and letting 
MapServer concentrate on what it does best: publish maps on the web.

I think the users who need a Web GIS should be looking more at MapGuide 
than MapServer.

> 
> With the current implementation the lookup will happen among
> the 1000 shapes since the subsequent query extent will fall inside the
> previous one. However, with a few lines of code we could create an
> additional option to reconstruct the cache in every WhichShapes call.
> 
> However, because the hash size has been set to 512 I don't think we
> have to do many sequential scans. If the shapes are spread evenly
> across the hashtable's array we'll have to skip 1.5 items on average.
> That would possibly outperform the necessary disk accesses, spatial
> index lookups and shape creations.
> 

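As a rough check of the chain-length estimate quoted above, here is a minimal sketch. It assumes a chained hash table with 512 buckets, keys spread uniformly, and the 1,000 cached shapes mentioned in this thread; the function name is made up for illustration and is not part of MapServer.

```python
def expected_chain_stats(num_shapes, num_buckets):
    """Return (load_factor, expected_items_skipped) for a successful lookup
    in a chained hash table, assuming keys are spread uniformly."""
    load_factor = num_shapes / num_buckets
    # On average, half of the other items that share the bucket sit in
    # front of the item being looked up and have to be skipped.
    expected_skipped = (num_shapes - 1) / (2 * num_buckets)
    return load_factor, expected_skipped


load, skipped = expected_chain_stats(1000, 512)
print("load factor: %.2f, items skipped per lookup: %.2f" % (load, skipped))
```

Under these assumptions a lookup by shape id stays cheap. Note that this only covers lookups by key; it says nothing about spatial searches over the cached shapes.
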
Let's say that the first WhichShapes call loads 1,000 shapes, and then I 
do a query by point on that layer. Since there is no spatial index in 
memory, all the shapes in the cache will have to be accessed to identify 
the ones that are within tolerance of the query location. Sure, looking 
up the bounds of 1,000 shapes is not a huge cost, but it's a cost, on top 
of all the memory used to cache all those shapes.
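
To make the cost concrete, here is a minimal sketch of a query by point against a cache that has no spatial index. The `CachedShape` class, the `query_by_point` function and the grid of test data are all hypothetical and only illustrate the linear scan; they do not reflect the actual cache implementation.

```python
from dataclasses import dataclass


@dataclass
class CachedShape:
    shape_id: int
    minx: float
    miny: float
    maxx: float
    maxy: float


def query_by_point(cache, x, y, tolerance):
    """Linear scan: test the bounds of every cached shape against the
    search box built from the query point and the tolerance."""
    hits = []
    for shape in cache:
        if (shape.minx <= x + tolerance and shape.maxx >= x - tolerance and
                shape.miny <= y + tolerance and shape.maxy >= y - tolerance):
            hits.append(shape.shape_id)
    return hits


# 1,000 unit squares laid out on a 40 x 25 grid (made-up data).
cache = [CachedShape(i,
                     (i % 40) * 2.0, (i // 40) * 2.0,
                     (i % 40) * 2.0 + 1.0, (i // 40) * 2.0 + 1.0)
         for i in range(1000)]

hits = query_by_point(cache, 10.5, 4.5, 0.1)
```

Every one of the 1,000 entries is visited to find the single shape under the query point, which is the work a provider-side spatial index would avoid.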

OTOH, if the data provider supports a spatial index it can find the 
matching shapes (2 or 3 shapes in general) with very little work using 
its spatial index, removing any benefit of caching and without the cost 
of all the memory used to cache features.

Of course if I render the same map area 20 times in a persistent process 
then I will benefit from the cache, but I never wrote any MapServer 
application that does that. The typical application renders a map once 
and then moves to a new area or zooms in with a separate request, so 
there is little benefit to caching when rendering a map.

OTOH there would be real benefits to caching the first pass of a 
double-pass query since we are assured that we'll read the shapes twice 
in this case, and there are usually very few shapes to cache. Thinking 
about it some more, I'd like to see a mode of operation of the cache 
that only caches queries.
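
A query-only mode could look roughly like the sketch below. This is an illustration under stated assumptions, not the RFC's design: the `QueryOnlyCache` class, its `for_query` flag and the `fake_provider` function are hypothetical names, and the provider is reduced to a simple callable.

```python
class QueryOnlyCache:
    """Caches shapes only on the query path (hypothetical sketch)."""

    def __init__(self, read_shape):
        self._read_shape = read_shape  # callable: shape_id -> shape
        self._shapes = {}
        self.provider_reads = 0

    def get_shape(self, shape_id, for_query):
        if not for_query:
            # Rendering path: never cache, always go to the provider.
            self.provider_reads += 1
            return self._read_shape(shape_id)
        if shape_id not in self._shapes:
            # First pass of the query: read once and remember the shape.
            self.provider_reads += 1
            self._shapes[shape_id] = self._read_shape(shape_id)
        # Second pass of the query: served from memory.
        return self._shapes[shape_id]

    def clear(self):
        """Drop the cached shapes once the query results are released."""
        self._shapes.clear()


def fake_provider(shape_id):
    # Stand-in for an expensive read from the data source.
    return {"id": shape_id, "wkt": "POINT(%d %d)" % (shape_id, shape_id)}


cache = QueryOnlyCache(fake_provider)
matches = [7, 42, 99]
first_pass = [cache.get_shape(i, for_query=True) for i in matches]
second_pass = [cache.get_shape(i, for_query=True) for i in matches]
reads_after_query = cache.provider_reads
cache.clear()
```

With three matching shapes the provider is read three times instead of six, and the memory held is limited to the query results rather than everything returned by WhichShapes.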

> 
> This is at least one option to use but not compulsory to use.
> We can possibly offer that or continue to leave the users alone with
> this problem.
> 

True. At least you have made the effort of trying to find a solution to 
the problem (and I have not).

>>
>> - What are the implications of nesting layers on WMS services? I think
>> users will (naively?) expect that the hierarchy of layers will be
>> reflected in the WMS GetCapabilities, but I don't think that this is
>> desirable. This may very well become a FAQ: "Why is the hierarchy of
>> layers in my mapfile not reflected in WMS GetCapabilities?"
>>
> 
> Only the root layers participate in the renderings (which are added to the
> layers collection of the map), so there's no need to alter the current
> approach. The nested layers will only behave as data sources for the outer
> layer providers (like the shapefiles or spatial data tables etc. for
> the existing providers)
> 

I agree that we should not alter the current approach; it would not make 
sense to do that. But be prepared to answer questions from users asking 
why the hierarchy of layers in a mapfile is not reflected in the WMS 
GetCapabilities.

> 
>> and even if it has a NextItem method
>> to walk through all objects, the order of objects is not maintained by a
>> hashtable, so if a user has data sorted (by sortshp) then the sort order
>> will be lost and rendering order will become pseudo-random if done via a
>> cache layer (unless I'm missing something?).
>>
> 
> That's true. I'm not aware of the order of the renderings in this case.
> In my practice I haven't come across a case where it was required.
> However we could use an additional list to treat this issue if it is
> significant.
> 

This ordering of shapes at render time is a feature of MapServer, hence 
the command-line program sortshp. I don't use it myself, but some users 
must rely on it, otherwise it would not exist. I think it would be a sad 
side effect to lose the ordering, but I'll let those who need this 
feature fight for it.
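
For what it's worth, the "additional list" idea mentioned above could be sketched as follows. The `OrderedShapeCache` class is hypothetical, and the explicit list stands in for what a C hashtable would need alongside it, since a plain hashtable does not remember insertion order.

```python
class OrderedShapeCache:
    """Hash lookup by id plus a list that remembers the order in which
    shapes were read from the data source (hypothetical sketch)."""

    def __init__(self):
        self._by_id = {}   # shape_id -> shape, for direct lookups
        self._order = []   # shape ids in the order they were first added

    def add(self, shape_id, shape):
        if shape_id not in self._by_id:
            self._order.append(shape_id)
        self._by_id[shape_id] = shape

    def get(self, shape_id):
        return self._by_id.get(shape_id)

    def __iter__(self):
        # Walk the shapes in their original (e.g. sortshp) order.
        for shape_id in self._order:
            yield self._by_id[shape_id]


cache = OrderedShapeCache()
for shape_id, name in [(30, "river"), (10, "road"), (20, "label")]:
    cache.add(shape_id, name)

render_order = list(cache)
```

The cost is one extra id per cached shape, and rendering through the cache would then see the shapes in the same order as reading the source directly.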

Daniel
-- 
Daniel Morissette
http://www.mapgears.com/