[Fwd: MS RFC xx: Performance improvements for labelcache and MINDISTANCE processing]

Sat Oct 13 23:08:37 EDT 2007

Hi all,

Since the threaded Rendering discussion got me thinking about this 
again, I thought I would dig up the draft RFC that I put together for 
Steve back in January and post it to the dev list in the hopes of 
resurrecting this idea again. You can search the dev list for Subject: 
"Label Cache Performance" to reread the thread on this.

Why should we care?

Well, label intensive maps can spend >50% of their time processing the 
label cache. The current processing appears to be an O(N*N) process, 
this proposal would change it to be an O(N) process which would 
basically eliminate it as a major cost item when rendering maps. Simply 
put this is low hanging fruit that would improve rendering performance 
when labels are involved.

I am happy to support any dev that wants to tackle the coding on this 
with algorithm support, defending the RFC, building data for test cases, 
etc.

Best regards,
   -Steve

-------- Original Message --------
Subject: MS RFC xx: Performance improvements for labelcache and 
MINDISTANCE processing
Date: Mon, 08 Jan 2007 21:40:02 -0500
From: Stephen Woodbridge <woodbri at swoodbridge.com>
To: Steve Lime <Steve.Lime at DNR.STATE.MN.US>

Hi Steve,

Here is an inital draft RFC. Please read it over and send me your
comments and concerns. I'm traveling this week, but I will try to
respond in the evening. If you think this is sufficient and will stand
on it own, then go ahead an put it up, but I think it need some
additional content from you or a potential implementor of this
enhancement that would describe data structures, etc.

MS RFC xx: Performance improvements for labelcache and MINDISTANCE
processing

date: 2007/01/08
author: Stephen Woodbridge
contact: woodbri at swoodbridge.com
author: Stephen Lime
contact: sdlime at comcast.com
status: proposal

Overview
--------

A large part of the the performance of a map draw can often be
attributed to the label cache processing. Label cache processing is
currently on order of N-squared for the number of items in the cache.
The current implement of the label cache is also used to support the
MINDISTANCE option of labels on a LAYER/CLASS basis.

This proposal describes a new algorithm to handle processing of the
label cache and MINDISTANCE that is order N, not N-squared. It also
decouples MINDISTANCE from the label cache which should have additional
performance benefits and decrease the number of list items for any give
label 1-2 orders of magnitude and for labels that have not yet been
rendered to a search length of zero.

Since the MINDISTANCE processing is done at every LAYER that request it,
this has the potential to improve LAYER processing speed for these layers.

Technical Details
-----------------

The proposed solution has few primary goals:

o significantly decrease the cost of processing the label cache
o decouple the label cache and the MINDISTANCE handling
o be totally transparent to MAPFILE, CGI, and other public APIs

New Label Cache Processing
--------------------------

The proposal for the new label cache handling is to convert it to a
raster solution from its current list search implementation. This
implementation would be effected by:

1) Create and maintain a label cache raster (or bit image) equivalent of
the map image being rendered.
2) When label collision needs to be performed, a polygon representing
the label is drawn into a small temporary raster, the size of the
rectangular bbox of the label, then that is positioned over and AND'd
with the label cache raster, the first pixel AND that is true indicates
a collision.
3) When a label is rendered, the polygon representing the label is OR's
into the label cache raster.

This will work with all existing mapserver labels, curved labels and
buffers about the labels, etc. Any existing label only needs the
bounding polygon that is currently put in the label cache to be rendered
into the small temporary image that then gets AND'd or OR'd to the label
cache raster.

New MINDISTANCE Processing
--------------------------

The proposal for the new MINDISTANCE handling it to convert it to an new
process that is LAYER centric and to use a hash lookup instead of a
linear search. An issue has been raised on the list that MINDISTANCE may
be class specific and this proposal will present a trivial way of
handling that case if it is need.

1) On Layer initialization, mindistance_hash is created. Some thought
will need to go into how many entries the hash will need to handle if it
can not grow dynamically.
2) For each label that needs to be rendered
        check to see if it is in the hash (the label text is the key)
        if the key exist then the value will be a pointer to a list
            check the distance from the current label to the list items
        if the label is ok, render it, and add/create a new list item
3) At the end of layer processing, free the list items and the hash

If we need to separate the labels by CLASS, then the hash key should be
constructed like:  sprintf(key, "%d:%s", class_number, label_text);

Using this scenario, and labeling and image 600x400 with a MINDISTANCE
200 we can expect a street crossing the image to have 1 to 3 possible
label points. This means that once you get the list from the hash you
would have a list of 1 to 3 items to search. If you were labeling a
streets layer if is not inconceivable that you might not have hundreds
of labels on this size image so for every potential label you want to
place you might have to search 1-2 orders of magnitude less labels using
a hash and sub-list structure described above.

There is no reason to keep the MINDISTANCE tied to the label cache
processing. It was implemented this way because it was a convenient
structure that existed.

Support for Non-GD Renderers
----------------------------

(Copied from RFC 11)
"Presently all MapServer output renders use the contents of the label
cache, which is basically render agnostic. This will not be the case any
more. The placement computations necessary to support curved labels do
leverage font metrics derived from the GD/Freetype interface. It may
well be possible for the SWF, PDF and SVG renders to leverage even the
GD-based curved labels, however it is probably best to consider this a
raster-only output feature in this implementation. If font metrics
support for other renderers is developed in the future then this feature
can be easily extended to support them."

Because RFC 11 broke this for Non-GD Renders, this should not be a
problem for this RFC. I do not see anything that would prevent other
renders from using these facilities also to perform the same functions.
This has not been contemplated in this RFC and no issue was raised on
the list during the initial discussion of this performance improvement.

Mapfile Implication
-------------------

o None

Public API Implication
----------------------

o None, I don't think mapscript gives you this low-level access to the
label cache.

Bug Tracking
------------

o TBD

Voting History
--------------

O TBD