MS RFC 22a: Feature cache for long running processes and
query processing (update)
Daniel Morissette
dmorissette at MAPGEARS.COM
Wed Jun 27 14:11:29 EDT 2007
Tamas, and all,
I finally managed to walk through RFC-22a. I can't say that I understand
it all but I think I get the idea. The new options sound cool, but my
first reaction is that it's not easy to understand how this works and
what the implications are... and if it's hard for a developer to
understand then imagine what it will be like for the average user. I'm
starting to miss the simplicity that's always been a focus in MapServer.
More importantly, I started asking myself how this RFC solved the
initial problem which is the performance hit related to double-pass
queries with several providers (did we forget that?).
If I understood the RFC correctly, the proposed solution to this problem
will be to embed all layers inside a CACHE wrapper layer, correct? So by
default all postgis/oracle/sde/etc connections will still suffer from
the double-pass query problem unless we add the CACHE layer around them,
right? This adds one more thing that users need to be aware of in
order to tune their app. But then the RFC is not clear on the other
implications (if any) on memory usage and performance of using a CACHE
layer around a postgis/oracle/etc connection for general rendering for
instance. I was hoping for a better integrated solution that would not
have increased complexity for users for the most general use case which
is users wanting to avoid the double-pass query performance issue. At a
minimum the documentation (or the RFC) will need a chapter that clearly
documents and addresses the recommended best practices in order to get
decent query performance without taking a hit on memory usage for
general rendering, or any other undocumented performance hit.
Actually, I am assuming that using a cache layer implies caching for
both rendering and querying, is that right? What if I first render a map
at a given extents (rendering 1,000 shapes from a TAB file for instance)
and then do a query by point? Are the 1,000 shapes left in the cache? Is
the query done against the shapes in the cache or against the file on
disk? Or is there an option to choose between one and the other? If the
query is done against 1,000 shapes in cache, then the performance of
doing a sequential scan of a hashtable will be much worse than a query
by point directly on a TAB file which uses the file's spatial index. The
same performance issue would likely apply to postgis connections or any
other data sources that support spatial indexes.
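
To make the concern about scanning the cache concrete, here is a minimal Python sketch. It is not MapServer code: the `GridIndex` class, the 1,000 random points, and the tolerance are all assumptions made up for illustration, and a toy uniform grid stands in for a real spatial index such as the one in a TAB file. It only counts how many shapes each approach has to examine for one query by point.

```python
import random
from collections import defaultdict

def linear_scan(shapes, x, y, tol):
    """Check every cached shape: cost grows with the cache size."""
    hits = [sid for sid, (sx, sy) in shapes.items()
            if abs(sx - x) <= tol and abs(sy - y) <= tol]
    return sorted(hits), len(shapes)

class GridIndex:
    """Toy uniform grid standing in for a real spatial index (hypothetical)."""
    def __init__(self, shapes, cell):
        self.shapes = shapes
        self.cell = cell
        self.cells = defaultdict(list)
        for sid, (sx, sy) in shapes.items():
            self.cells[self._key(sx, sy)].append(sid)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def query(self, x, y, tol):
        # Only visit the grid cells that overlap the search box.
        x0, y0 = self._key(x - tol, y - tol)
        x1, y1 = self._key(x + tol, y + tol)
        examined = 0
        hits = []
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for sid in self.cells.get((cx, cy), ()):
                    examined += 1
                    sx, sy = self.shapes[sid]
                    if abs(sx - x) <= tol and abs(sy - y) <= tol:
                        hits.append(sid)
        return sorted(hits), examined

rng = random.Random(42)  # fixed seed so the run is repeatable
shapes = {i: (rng.uniform(0, 100), rng.uniform(0, 100)) for i in range(1000)}
index = GridIndex(shapes, cell=5.0)

scan_hits, scan_cost = linear_scan(shapes, 50.0, 50.0, 2.0)
grid_hits, grid_cost = index.query(50.0, 50.0, 2.0)
print("same result:", scan_hits == grid_hits)
print("shapes examined, scan vs. index:", scan_cost, grid_cost)
```

Both approaches return the same shapes, but the scan touches every cached shape while the index only touches the few cells around the query point. Whether this matters in practice depends on how the cache would actually be searched, which the RFC does not spell out.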
What if the map extents change, does that automatically reset the cache?
Overall I am worried that the cost of using a CACHE layer for general
rendering will outweigh the benefits of caching... and as I wrote above
the RFC doesn't provide a very user-friendly solution to the double-pass
query issue (a problem for which I don't have an easy solution either).
So for now I'm +0.
A few more questions/comments on the details of the RFC:
- I'm not sure I understand all the implications of nesting layers.
First, why do we really need to allow more than one layer inside a layer
(i.e. use an array of layerObj)?
- What does nesting do if the outer layer is not of connection type
cache, layerfilter or geomtrans? For instance what happens if I have a
layer of type point (shapefile) which contains two nested layers of type
point?
  LAYER
    NAME "layer1"
    TYPE POINT
    DATA "shapefile1"
    ...
    LAYER
      NAME "layer2"
      TYPE POINT
      DATA "shapefile2"
    END
    LAYER
      NAME "layer3"
      TYPE POINT
      DATA "shapefile3"
    END
  END
- And what if I mix POINT and POLYGON layers like this?
  LAYER
    NAME "layer1"
    TYPE POINT
    DATA "shapefile1"
    LAYER
      NAME "layer2"
      TYPE POINT
      DATA "shapefile2"
    END
    LAYER
      NAME "layer3"
      TYPE POLYGON
      DATA "shapefile3"
    END
  END
- What are the implications of nesting layers on WMS services? I think
users will (naively?) expect that the hierarchy of layers will be
reflected in the WMS GetCapabilities, but I don't think that this is
desirable. This may very well become a FAQ: "Why is the hierarchy of
layers in my mapfile not reflected in WMS GetCapabilities?"
- The RFC proposes the use of hash tables to store the cache data. Why a
hashtable and not an array or a list? The hashtable implementation seems
more costly to me (or am I wrong?), and even if it has a NextItem method
to walk through all objects, the order of objects is not maintained by a
hashtable, so if a user has data sorted (by sortshp) then the sort order
will be lost and rendering order will become pseudo-random if done via a
cache layer (unless I'm missing something?).
- With respect to STYLEITEM AUTO the plan is to cache the class objects
for each shape? Won't that be expensive for large layers?
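
On the hashtable ordering question above, here is a minimal Python sketch of the effect I have in mind. It is not MapServer's hashtable implementation: `ChainedHashTable`, the integer shape ids, and the bucket count are assumptions chosen only to show how a bucket-by-bucket walk can differ from insertion order, while a plain array keeps it.

```python
class ChainedHashTable:
    """Minimal chained hash table keyed by integer shape id (illustrative only)."""
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def insert(self, shape_id, shape):
        # The bucket is chosen from the key, not from the insertion position.
        self.buckets[shape_id % len(self.buckets)].append((shape_id, shape))

    def items(self):
        # Walk bucket by bucket, as a NextItem-style iterator might do.
        for bucket in self.buckets:
            for item in bucket:
                yield item

# Hypothetical shape ids in the order a sorted data file would deliver them.
sorted_ids = [17, 3, 42, 8, 25]

table = ChainedHashTable()
array = []
for sid in sorted_ids:
    table.insert(sid, f"shape-{sid}")
    array.append((sid, f"shape-{sid}"))

hash_order = [sid for sid, _ in table.items()]
array_order = [sid for sid, _ in array]
print("hash table walk:", hash_order)
print("array walk:     ", array_order)
```

The array walk reproduces the sorted order, while the hash table walk returns the same shapes in bucket order. If the cache is iterated this way for rendering, the drawing order the user set up with sortshp would not survive.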
I feel bad for raising all those issues since you seem to have put quite
a bit of hard work into this, and would need to put in even more work to
implement it. It also seems that everybody else likes the proposal, so
hopefully it's just me who's not "getting it"... and in this case please
show me the light. :)
Daniel
Tamas Szekeres wrote:
> Hi All,
>
> In the meantime I've created the core implementation of the proposed
> changes. The MS-RFC-22a has been updated according to Steve's
> comments and questions.
>
> http://mapserver.gis.umn.edu/development/rfc/ms-rfc-22a
>
> I've also constructed a sample scenario along with this RFC to
> visualize the power behind the concept.
> For better readability I've also taken out most of the
> implementation code into separate patches attached to the
> corresponding ticket:
>
> http://trac.osgeo.org/mapserver/ticket/2128
>
> This RFC is still in discussion phase and I'm waiting for further
> comments and proposals. I also leave it to you folks to decide whether this
> change is "simple" enough to participate in the upcoming release or
> not. Needless to say I'm pretty interested in the changes in my
> further projects but I don't intend to force anything against the
> majority of the community.
>
> Best regards,
>
> Tamas
--
Daniel Morissette
http://www.mapgears.com/