MS RFC 22a: Feature cache for long running processes and query processing (update)

Wed Jun 27 14:11:29 EDT 2007

Tamas, and all,

I finally managed to walk through RFC-22a. I can't say that I understand 
it all but I think I get the idea. The new options sound cool, but my 
first reaction is that it's not easy to understand how this works and 
what the implications are... and if it's hard for a developer to 
understand then imagine what it will be like for the average user. I'm 
starting to miss the simplicity that's always been a focus in MapServer.

More importantly, I started asking myself how this RFC solves the 
initial problem, which is the performance hit related to double-pass 
queries with several providers (did we forget that?).

If I understood the RFC correctly, the proposed solution to this problem 
will be to embed all layers inside a CACHE wrapper layer, correct? So by 
default all postgis/oracle/sde/etc connections will still suffer from 
the double-pass query problem unless we add the CACHE layer around them, 
right? This adds one more thing that users need to be aware of in 
order to tune their app.  But then the RFC is not clear on the other 
implications (if any) on memory usage and performance of using a CACHE 
layer around a postgis/oracle/etc connection for general rendering for 
instance. I was hoping for a better integrated solution that would not 
have increased complexity for users for the most general use case which 
is users wanting to avoid the double-pass query performance issue. At a 
minimum the documentation (or the RFC) will need a chapter that clearly 
documents and addresses the recommended best practices in order to get 
decent query performance without taking a hit on memory usage for 
general rendering, or any other undocumented performance hit.

Actually, I am assuming that using a cache layer implies caching for 
both rendering and querying, is that right? What if I first render a map 
at a given extent (rendering 1,000 shapes from a TAB file for instance) 
and then do a query by point? Are the 1,000 shapes left in the cache? Is 
the query done against the shapes in the cache or against the file on 
disk? Or is there an option to choose between one and the other? If the 
query is done against 1,000 shapes in cache, then the performance of 
doing a sequential scan of a hashtable will be much worse than a query 
by point directly on a TAB file which uses the file's spatial index. The 
same performance issue would likely apply to postgis connections or any 
other data sources that support spatial indexes.
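
To make the concern concrete, here is a rough sketch in C. It is not 
MapServer code: Rect, GridIndex and all function names are made up for 
illustration, and a uniform grid stands in for whatever spatial index 
the data source really uses. It counts how many bounding-box tests a 
point query needs with a sequential scan versus an indexed lookup.

```c
#include <assert.h>

#define GRID 10      /* cells per axis over the square [0, GRID) x [0, GRID) */
#define CELL_CAP 8   /* max shapes per cell in this toy index */

/* Hypothetical stand-in for a cached shape: just its bounding box. */
typedef struct { double minx, miny, maxx, maxy; } Rect;

/* Toy spatial index: for each grid cell, the ids of the shapes in it. */
typedef struct {
    int ids[GRID * GRID][CELL_CAP];
    int count[GRID * GRID];
} GridIndex;

static int rect_contains(const Rect *r, double x, double y) {
    return x >= r->minx && x <= r->maxx && y >= r->miny && y <= r->maxy;
}

/* Map a coordinate inside the square to its cell number. */
static int cell_of(double x, double y) {
    assert(x >= 0 && x < GRID && y >= 0 && y < GRID);
    return (int)y * GRID + (int)x;
}

/* Sequential scan, as over an unindexed cache: test shapes in turn.
 * Returns the id of the first hit or -1; *tests counts box tests made. */
static int query_scan(const Rect *shapes, int n, double x, double y,
                      int *tests) {
    int i;
    *tests = 0;
    for (i = 0; i < n; i++) {
        (*tests)++;
        if (rect_contains(&shapes[i], x, y)) return i;
    }
    return -1;
}

/* Register each shape in the cell holding its lower-left corner.
 * Simplifying assumption: every shape fits inside a single cell. */
static void grid_build(GridIndex *g, const Rect *shapes, int n) {
    int i;
    for (i = 0; i < GRID * GRID; i++) g->count[i] = 0;
    for (i = 0; i < n; i++) {
        int c = cell_of(shapes[i].minx, shapes[i].miny);
        assert(g->count[c] < CELL_CAP);
        g->ids[c][g->count[c]++] = i;
    }
}

/* Indexed query: only shapes registered in the point's cell are tested. */
static int query_grid(const GridIndex *g, const Rect *shapes,
                      double x, double y, int *tests) {
    int c = cell_of(x, y), k;
    *tests = 0;
    for (k = 0; k < g->count[c]; k++) {
        int id = g->ids[c][k];
        (*tests)++;
        if (rect_contains(&shapes[id], x, y)) return id;
    }
    return -1;
}
```

With one small box per cell (100 shapes), a point query that falls in 
the last cell costs 100 box tests with the scan and a single test with 
the grid. The gap grows with the number of cached shapes.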

What if the map extents change, does that automatically reset the cache?

Overall I am worried that the cost of using a CACHE layer for general 
rendering will outweigh the benefits of caching... and as I wrote above 
the RFC doesn't provide a very user-friendly solution to the double-pass 
query issue (a problem for which I don't have an easy solution either). 
So for now I'm +0.

A few more questions/comments on the details of the RFC:

- I'm not sure I understand all the implications of nesting layers. 
First, why do we really need to allow more than one layer inside a layer 
(i.e. use an array of layerObj)?

- What does nesting do if the outer layer is not of connection type 
cache, layerfilter or geomtrans? For instance what happens if I have a 
layer of type point (shapefile) which contains two nested layers of type 
point?

   LAYER
     NAME "layer1"
     TYPE POINT
     DATA "shapefile1"
     ...

     LAYER
       NAME "layer2"
       TYPE POINT
       DATA "shapefile2"
     END

     LAYER
       NAME "layer3"
       TYPE POINT
       DATA "shapefile3"
     END
   END

- And what if I mix POINT and POLYGON layers like this?

   LAYER
     NAME "layer1"
     TYPE POINT
     DATA "shapefile1"

     LAYER
       NAME "layer2"
       TYPE POINT
       DATA "shapefile2"
     END

     LAYER
       NAME "layer3"
       TYPE POLYGON
       DATA "shapefile3"
     END
   END

- What are the implications of nesting layers on WMS services? I think 
users will (naively?) expect that the hierarchy of layers will be 
reflected in the WMS GetCapabilities, but I don't think that this is 
desirable. This may very well become a FAQ: "Why is the hierarchy of 
layers in my mapfile not reflected in WMS GetCapabilities?"

- The RFC proposes the use of hash tables to store the cache data. Why a 
hashtable and not an array or a list? The hashtable implementation seems 
more costly to me (or am I wrong?), and even if it has a NextItem method 
to walk through all objects, the order of objects is not maintained by a 
hashtable, so if a user has data sorted (by sortshp) then the sort order 
will be lost and rendering order will become pseudo-random if done via a 
cache layer (unless I'm missing something?).

- With respect to STYLEITEM AUTO the plan is to cache the class objects 
for each shape? Won't that be expensive for large layers?
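
To illustrate the ordering concern from the hashtable question above, 
here is a minimal sketch in C. It is not the RFC's implementation or 
MapServer's hashTableObj: ShapeCache and the cache_* functions are 
hypothetical, and the bucket is simply the shape id modulo the bucket 
count. It shows that walking a chained hash table bucket by bucket 
returns shapes in an order unrelated to the order they were inserted in.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8
#define MAXSHAPES 16

/* Hypothetical cached shape: only an id, standing in for a real shape. */
typedef struct CachedShape {
    long id;
    struct CachedShape *next;   /* next entry in the same bucket */
} CachedShape;

typedef struct {
    CachedShape *buckets[NBUCKETS];
    CachedShape pool[MAXSHAPES];    /* fixed storage, no malloc needed */
    size_t used;
} ShapeCache;

static void cache_init(ShapeCache *c) {
    size_t b;
    for (b = 0; b < NBUCKETS; b++) c->buckets[b] = NULL;
    c->used = 0;
}

/* Insert at the head of the bucket chosen by id modulo NBUCKETS. */
static void cache_insert(ShapeCache *c, long id) {
    CachedShape *s;
    size_t b;
    assert(c->used < MAXSHAPES);
    assert(id >= 0);
    s = &c->pool[c->used++];
    b = (size_t)id % NBUCKETS;
    s->id = id;
    s->next = c->buckets[b];
    c->buckets[b] = s;
}

/* Walk all entries bucket by bucket, the way a NextItem-style iterator
 * would. Writes up to max ids into out and returns how many it wrote. */
static size_t cache_walk(const ShapeCache *c, long *out, size_t max) {
    size_t n = 0, b;
    for (b = 0; b < NBUCKETS; b++) {
        const CachedShape *s;
        for (s = c->buckets[b]; s != NULL && n < max; s = s->next)
            out[n++] = s->id;
    }
    return n;
}
```

Inserting ids 13, 5, 10, 2 in that order (say, the order produced by 
sortshp) and then walking the table yields 2, 10, 5, 13. An array or 
list filled in read order would keep the original sequence.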

I feel bad for raising all those issues since you seem to have put quite 
a bit of hard work into this, and would need to put even more work to 
implement it. It also seems that everybody else likes the proposal, so 
hopefully it's just me who's not "getting it"... and in this case please 
show me the light.  :)

Daniel

Tamas Szekeres wrote:
> Hi All,
> 
> In the meantime I've created the core implementation of the proposed
> changes. The MS-RFC-22a have been updated according to Steve's
> comments and questions.
> 
> http://mapserver.gis.umn.edu/development/rfc/ms-rfc-22a
> 
> I've also constructed a sample scenario along with this RFC to
> visualize the power behind the concept.
> For the better readability I've also taken out most of the
> implementation code into separate patches attached to the
> corresponding ticket:
> 
> http://trac.osgeo.org/mapserver/ticket/2128
> 
> This RFC is still in discussion phase and I'm waiting for further
> comments and proposals. I also let you folks the decision whether this
> change is "simple" enough to paricipate in the upcoming release or
> not. Needless to say I'm pretty interested in the changes in my
> further projects but I don't intend to force anything against the
> majority of the community.
> 
> Best regards,
> 
> Tamas

-- 
Daniel Morissette
http://www.mapgears.com/