No subject


Fri Feb 8 14:55:41 EST 2008


layers inside the cache layers and the second one would imply decorating
the existing layers with additional parameters related to the caching solution.
I guess the first one would take about 60 seconds of extra work per layer
when done by hand.

> But then the RFC is not clear on the other
> implications (if any) on memory usage and performance of using a CACHE
> layer around a postgis/oracle/etc connection for general rendering for
> instance. I was hoping for a better integrated solution that would not
> have increased complexity for users for the most general use case which
> is users wanting to avoid the double-pass query performance issue. At a
> minimum the documentation (or the RFC) will need a chapter that clearly
> documents and addresses the recommended best practices in order to get
> decent query performance without taking a hit on memory usage for
> general rendering, or any other undocumented performance hit.
>

With the current implementation and the default option, mostly just the shapes
involved in the first phase are retained in the cache. I don't expect
significant memory issues from that, and most of the providers fetch
those shapes in a single access to the data source.
In the future, however, I'll consider retaining the shapes of multiple extents,
and that implementation will have to deal with some kind of cache
saturation management and item expiration.

> Actually, I am assuming that using a cache layer implies caching for
> both rendering and querying, is that right? What if I first render a map
> at a given extents (rendering 1,000 shapes from a TAB file for instance)
> and then do a query by point? Are the 1,000 shapes left in the cache? Is
> the query done against the shapes in the cache or against the file in
> disk? Or is there an option to choose between one and the other? If the
> query is done against 1,000 shapes in cache, then the performance of
> doing a sequential scan of a hashtable will be much worse than a query
> by point directly on a TAB file which uses the file's spatial index. The
> same performance issue would likely apply to postgis connections or any
> other data sources that support spatial indexes.

With the current implementation the lookup will happen among
the 1000 shapes, since the subsequent query extent falls inside the
previous one. However, with a few lines of code we could add an
option to rebuild the cache on every WhichShapes call.

However, because the hash size has been set to 512, I don't think we
have to do many sequential scans. If the shapes are spread evenly
across the hashtable's array, we'll have to skip about 1.5 items
on average. That will possibly outperform the disk accesses,
spatial index lookups and shape creations that would otherwise be needed.
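
To put a rough number on that estimate, here is a minimal sketch of a chained
hashtable with 512 buckets that counts how many entries are examined per
lookup. The types, the function names and the modulo hash function are
assumptions made for illustration; they are not the maphash.c API, and the
sketch counts entries examined rather than entries skipped.

```c
#include <stdlib.h>

#define HASH_SIZE 512  /* bucket count mentioned in the message */

/* Hypothetical cache entry: one node per cached shape, chained per bucket. */
typedef struct cache_entry {
    long shape_id;
    struct cache_entry *next;
} cache_entry;

typedef struct {
    cache_entry *buckets[HASH_SIZE];
} shape_cache;

/* Assumed hash function: shape id modulo the bucket count. */
static unsigned bucket_of(long shape_id) {
    return (unsigned)((unsigned long)shape_id % HASH_SIZE);
}

/* Adds a shape id at the head of its bucket chain; returns -1 on allocation failure. */
static int cache_insert(shape_cache *c, long shape_id) {
    cache_entry *e = malloc(sizeof *e);
    if (!e) return -1;
    unsigned b = bucket_of(shape_id);
    e->shape_id = shape_id;
    e->next = c->buckets[b];
    c->buckets[b] = e;
    return 0;
}

/* Returns the number of entries examined until the id is found, or 0 if absent. */
static int cache_probes(const shape_cache *c, long shape_id) {
    int probes = 0;
    for (const cache_entry *e = c->buckets[bucket_of(shape_id)]; e; e = e->next) {
        probes++;
        if (e->shape_id == shape_id) return probes;
    }
    return 0;
}

/* Releases every entry and clears the buckets. */
static void cache_free(shape_cache *c) {
    for (int b = 0; b < HASH_SIZE; b++) {
        cache_entry *e = c->buckets[b];
        while (e) {
            cache_entry *next = e->next;
            free(e);
            e = next;
        }
        c->buckets[b] = NULL;
    }
}

/* Caches ids 0..n_shapes-1, looks each one up once and returns the total
 * number of entries examined (-1 on allocation failure). */
static long total_probes(int n_shapes) {
    shape_cache c = {{NULL}};
    long total = 0;
    for (long id = 0; id < n_shapes; id++) {
        if (cache_insert(&c, id) != 0) {
            cache_free(&c);
            return -1;
        }
    }
    for (long id = 0; id < n_shapes; id++)
        total += cache_probes(&c, id);
    cache_free(&c);
    return total;
}
```

With sequential ids, total_probes(1000) returns 1488, which is about 1.49
entries examined per lookup and close to the figure quoted above. Real shape
ids and a different hash function may distribute less evenly.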

>
> What if the map extents change, does that automatically reset the cache?
>

If a portion of the new extent falls outside the previous one, the
cache is repopulated.
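
The rule can be sketched as a rectangle containment test. The extent and
extent_cache types and the cache_prepare function below are hypothetical names
used only for illustration, and the actual refill from the data source is
reduced to a counter.

```c
#include <stdbool.h>

/* Hypothetical rectangle type with the usual min/max layout. */
typedef struct {
    double minx, miny, maxx, maxy;
} extent;

/* Hypothetical cache state: the extent the cached shapes were fetched for. */
typedef struct {
    extent cached;
    bool populated;
    int refills;  /* how many times the cache was (re)populated */
} extent_cache;

/* True when inner lies completely inside outer (shared edges allowed). */
static bool extent_contains(const extent *outer, const extent *inner) {
    return inner->minx >= outer->minx && inner->miny >= outer->miny &&
           inner->maxx <= outer->maxx && inner->maxy <= outer->maxy;
}

/* Keeps the cache when the requested extent falls inside the cached one;
 * otherwise records the new extent and counts a refill.
 * Returns true when the cache had to be repopulated. */
static bool cache_prepare(extent_cache *c, const extent *requested) {
    if (c->populated && extent_contains(&c->cached, requested))
        return false;
    c->cached = *requested;  /* a real provider would re-read the shapes here */
    c->populated = true;
    c->refills++;
    return true;
}
```

A query by point inside the rendered extent reuses the cache under this rule,
while panning so that any part of the new extent leaves the old one triggers a
refill.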

> Overall I am worried that the cost of using a CACHE layer for general
> rendering will outweigh the benefits of caching... and as I wrote above
> the RFC doesn't provide a very user-friendly solution to the double-pass
> query issue (a problem for which I don't have an easy solution either).
> So for now I'm +0.
>

This is just one option that can be used; it is not compulsory.
We can either offer it or continue to leave the users alone with
this problem.

> A few more questions/comments on the details of the RFC:
>
> - I'm not sure I understand all the implications of nesting layers.
> First, why do we really need to allow more than one layer inside a layer
> (i.e. use an array of layerObj)?
>

Currently the filter layer will be linked to 2 layers. One of them
serves the source features and the other serves the features for the
spatial filtering. Each of these layers might be nested and not
participate in the rendering directly. It means that I should support
at least 2 nested layers for now.
In the future I'm planning to make it possible to obtain the selection
shapes from an arbitrary number of layers.

> - What does nesting do if the outer layer is not of connection type
> cache, layerfilter or geomtrans? For instance what happens if I have a
> layer of type point (shapefile) which contains two nested layers of type
> point?
>
> - And what if I mix POINT and POLYGON layers like this?
>

Nothing will happen, since the outer layer is not aware of the nested layers.
Those layers merely exist and will be destroyed automatically upon the
destruction of the outer layer. Only the proposed layer types might use the
nested layers, if a reference to those layers is set (in the CONNECTION
parameter, for example).

>
> - What are the implications of nesting layers on WMS services? I think
> users will (naively?) expect that the hierarchy of layers will be
> reflected in the WMS GetCapabilities, but I don't think that this is
> desirable. This may very well become a FAQ: "Why is the hierarchy of
> layers in my mapfile not reflected in WMS GetCapabilities?"
>

Only the root layers (those added to the layers collection of the map)
participate in the rendering, so there's no need to alter the current
approach. The nested layers will only behave as data sources for the outer
layer providers (like the shapefiles, spatial data tables etc. do for
the existing providers).

> - The RFC proposes the use of hash tables to store the cache data. Why a
> hashtable and not an array or a list? The hashtable implementation seems
> more costly to me (or am I wrong?),

The hashtable will speed up the random access to the shapes during the
getShape call. For nextShape, however, we should not do a hashtable
lookup on every shape but access the subsequent shape directly. Since
the current implementation (in maphash.c) does not support that, I had to
introduce these additional functions.
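
A minimal sketch of the idea, assuming a chained hashtable keyed by shape id:
a cursor remembers the current bucket and chain position, so moving to the
next shape needs no key lookup. All names here (shape_table, shape_cursor and
the functions) are hypothetical and are not the functions added to maphash.c.

```c
#include <stdlib.h>

#define HASH_SIZE 512

typedef struct shape_node {
    long shape_id;
    struct shape_node *next;
} shape_node;

typedef struct {
    shape_node *buckets[HASH_SIZE];
} shape_table;

/* Cursor state for sequential access: current bucket and chain position. */
typedef struct {
    const shape_table *table;
    int bucket;
    const shape_node *current;
} shape_cursor;

static unsigned table_bucket(long shape_id) {
    return (unsigned)((unsigned long)shape_id % HASH_SIZE);
}

/* Inserts at the head of the bucket chain; returns -1 on allocation failure. */
static int table_put(shape_table *t, long shape_id) {
    shape_node *n = malloc(sizeof *n);
    if (!n) return -1;
    unsigned b = table_bucket(shape_id);
    n->shape_id = shape_id;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    return 0;
}

/* Random access (the getShape case): hash the id, then walk one chain. */
static const shape_node *table_get(const shape_table *t, long shape_id) {
    const shape_node *n = t->buckets[table_bucket(shape_id)];
    while (n && n->shape_id != shape_id)
        n = n->next;
    return n;
}

static void cursor_init(shape_cursor *c, const shape_table *t) {
    c->table = t;
    c->bucket = -1;
    c->current = NULL;
}

/* Sequential access (the nextShape case): follow the chain, then move on to
 * the next non-empty bucket. Returns NULL once every entry was visited. */
static const shape_node *cursor_next(shape_cursor *c) {
    if (c->current)
        c->current = c->current->next;
    while (!c->current && c->bucket + 1 < HASH_SIZE)
        c->current = c->table->buckets[++c->bucket];
    return c->current;
}

static void table_free(shape_table *t) {
    for (int b = 0; b < HASH_SIZE; b++) {
        shape_node *n = t->buckets[b];
        while (n) {
            shape_node *next = n->next;
            free(n);
            n = next;
        }
        t->buckets[b] = NULL;
    }
}
```

Note that the cursor visits entries in bucket order, not in the order they
were inserted.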

> and even if it has a NextItem method
> to walk through all objects, the order of objects is not maintained by a
> hashtable, so if a user has data sorted (by sortshp) then the sort order
> will be lost and rendering order will become pseudo-random if done via a
> cache layer (unless I'm missing something?).
>

That's true. I hadn't considered the rendering order in this case.
In my practice I haven't run into a case where it was required.
However, we could use an additional list to handle this issue if it is
significant.
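
One way such an additional list could look, as a sketch only: each entry sits
in its hash bucket for random access and is also appended to a singly linked
list that records insertion order, so sequential access returns the shapes in
the order the data source delivered them. The names below are hypothetical.

```c
#include <stdlib.h>

#define HASH_SIZE 512

typedef struct ordered_entry {
    long shape_id;
    struct ordered_entry *bucket_next;  /* chain within the hash bucket */
    struct ordered_entry *order_next;   /* next entry in insertion order */
} ordered_entry;

typedef struct {
    ordered_entry *buckets[HASH_SIZE];
    ordered_entry *first;  /* oldest entry */
    ordered_entry *last;   /* newest entry */
} ordered_cache;

/* Adds the entry to its bucket and appends it to the insertion-order list.
 * Returns -1 on allocation failure. */
static int ordered_insert(ordered_cache *c, long shape_id) {
    ordered_entry *e = malloc(sizeof *e);
    if (!e) return -1;
    unsigned b = (unsigned)((unsigned long)shape_id % HASH_SIZE);
    e->shape_id = shape_id;
    e->bucket_next = c->buckets[b];
    c->buckets[b] = e;
    e->order_next = NULL;
    if (c->last)
        c->last->order_next = e;
    else
        c->first = e;
    c->last = e;
    return 0;
}

/* Random access by id through the hash bucket. */
static const ordered_entry *ordered_get(const ordered_cache *c, long shape_id) {
    unsigned b = (unsigned)((unsigned long)shape_id % HASH_SIZE);
    const ordered_entry *e = c->buckets[b];
    while (e && e->shape_id != shape_id)
        e = e->bucket_next;
    return e;
}

/* Frees all entries by walking the insertion-order list once. */
static void ordered_free(ordered_cache *c) {
    ordered_entry *e = c->first;
    while (e) {
        ordered_entry *next = e->order_next;
        free(e);
        e = next;
    }
    for (int b = 0; b < HASH_SIZE; b++)
        c->buckets[b] = NULL;
    c->first = c->last = NULL;
}
```

Walking from first through order_next yields the shapes in insertion order, at
the cost of one extra pointer per cached shape.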

> - With respect to STYLEITEM AUTO the plan is to cache the class objects
> for each shape? Won't that be expensive for large layers?
>

It would imply additional memory usage. However, I've considered not
storing classes that are equal to any of the previous ones, but I haven't
implemented that solution yet.

>
> I feel bad for raising all those issues since you seem to have put quite
> a bit of hard work on this, and would need to put even more work to
> implement it. It also seems that everybody else likes the proposal, so
> hopefully it's just me who's not "getting it"... and in this case please
> show me the light.  :)
>

I don't think this RFC is as simple as the others, but the number of
new possibilities might make it worthwhile. The only challenge for me is
to demonstrate the feasibility of the concept and the issues it can solve.
That's why I had to spend 5 days implementing a working solution.

I'm also an "old-style programmer" and aspire to solve problems as simply
as I can. But since the addressed problems are complex by nature, I think
it is reasonable for me to accept some extra work in the implementation.

Finally, I don't think the current implementation will stay unchanged in
future releases. I consider this an initial approach that will be refined
based on experience and further needs. The concept would remain the same,
while the actual implementation inside the providers might improve.

I won't be disappointed much if this proposal is not accepted because
the features addressed by this RFC turn out to be totally useless
and will not be an issue for anyone in the future. I can accept that
and then I'll withdraw it entirely.

However, I would be more disappointed if these issues do matter
but I didn't have enough power to convince the others of that, and the
option disappears undeservedly even though others might have
found it useful.


Best regards,

Tamas


