Label Cache Performance

Tue Feb 27 10:59:50 EST 2007

This still is on my radar, it's just a bit down the list at the moment.
I would propose testing a simple
implementation (worry about MINFEATUREDISTANCE later) to see if the
performance gains are
real. 

Steve

>>> Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> 2/22/2007 11:59:08 AM
>>>
Steve,

Can I get you to pop this back on your stack. I really think this would

represent a significant performance improvement to Mapserver and would

hate to see it die here.

-Steve W

Steve Lime wrote:
> Sorry, will try to get back to this thread over the weekend. I was
> hoping someone
> more knowledgable than I would comment on the hash issue.
> 
> Steve
> 
>>>> Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> 1/17/2007 3:32:58 PM
>>>>
> Steve,
> 
> Hope you had a good vacation. As you wade through your email, any 
> thoughts on this? I didn't see any follow up post from your on it.
> 
> -Steve W.
> 
> Stephen Woodbridge wrote:
>> Steve Lime wrote:
>>> Too bad our hash implmentation doesn't allow an arbitrary value
> (e.g.
>>> void *) per key. Looks like
>>> it's char *'s.
>> Bummer! So is this a show stopper?
>> Can we cast another pointer into a char *?
>> Seems like this is an opportunity to improve our hash
implementation
> so 
>> it can be used by other pieces of the code. Or we could just use
the
> GNU 
>> system has routines.
>>
>> And label_drawn_1 below probably needs to be a struct or array of
the
> 
>> pixel x,y where the label was drawn so it can be used for the
> distance 
>> calculation.
>>
>> -Steve W
>>
>>> Steve
>>>
>>>>>> Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> 1/8/2007 2:09:26
PM
>>>>>>
>>> Steve,
>>>
>>> The hash key for say
>>>     "Main St" => Array(label_drawn_1, label_drawn_2, ..]
>>> We only need to remember the the items that were actually drawn on
> the
>>> map since these are the only ones that need to be compared against
> when
>>> considering a potentially new label.
>>>
>>> Since the MINDISTANCE test is localized to a LAYER and possibly a
> CLASS
>>> modifying the key to include just the CLASS will suffice, like 
>>> "<class_num>:<label_text>". The hash would be created for each
> layer
>>> and then disposed and recreated for the next layer. This is ok
> because 
>>> MINDISTANCE does not span LAYERs unless I am mistaken.
>>>
>>> So here is pseudo-code for how this might work. Obviously there
are
> 
>>> details omitted and this would be implemented to be efficient. My
> goal
>>> here is to express the concept:
>>>
>>> for each layer {
>>>      if MINDISTANCE {
>>>          hash = create_hash()
>>>      } else {
>>>          hash = NULL;
>>>      }
>>>      for each label {
>>>          if (mindistance_ok(this.label, hash) &&
>>>              label_fits(this.label, bitmap)) {
>>>              # assumes it will update hash
>>>              draw_label(this.label, hash, bitmap)
>>>          }
>>>      }
>>> }
>>>
>>> mindistance_ok() returns true if hash == NULL
>>> label_fits() compares the label polygon to the bitmap and returns
> true
>>> if it fits
>>> draw_label() renders the label and updates the hash if it is not
> NULL 
>>> and updates the bitmap.
>>>
>>> I can expand the pseudo-code for mindistance_ok() and label_fits()
> is 
>>> needed, but it seems that we should agree in principal that this
> will 
>>> work for all existing cases. I have not heard any case that this
> can
>>> not be easily adapted to support.
>>>
>>> Steve - Does this cover the case you were concerned about? If not
> can 
>>> you explain the scenario a little better, because I'm missing it.
>>>
>>> -Steve
>>>
>>> Steve Lime wrote:
>>>> Was thinking about this more and don't think a hash for
> MINDISTANCE
>>> will
>>>> work since
>>>> the hash keys need to be unique and label/class/layer certainly
>>> isn't.
>>>> You might have
>>>> 50 occurances of the same label for a series of road segments.
I'm
>>> sure
>>>> there's a workaround but it would undoubtedly be complex...
>>>>
>>>> Steve
>>>>
>>>>>>> Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> 1/3/2007 9:35:26
> PM
>>>> Steve Lime wrote:
>>>>> We use the loops for MINDISTANCE because we obviously are
already
>>>> doing
>>>>> so for collisions.
>>>>>
>>>>> I thought about a hash too. Trouble is it's based not solely on
> the
>>>>> label text, but
>>>>> also class and layer indexes. For example you wouldn't want to
>>> ommit
>>>> a
>>>>> road
>>>>> label 'Washington' because of it's proximity to 'Washington'
> lake.
>>>>> Those types of
>>>>> situations are not all that rare...
>>>> Sure, but if you do the bitmap collision detection and use the
> hash
>>> for
>>>> MINDISTANCE, the hash would only exist while you are processing a
>>>> single layer and then get dumped and a new hash created for the
> next 
>>>> layer.
>>> If
>>>> you need to keep hash entries unique by CLASS also, then you
could
> 
>>>> either create an array of hashes and index by classnum or have a
>>> single
>>>> hash and prefix ot postfix the keys with the class like
>>>> "classnum:key".
>>>>
>>>> This should be very straight forward and I think very, very fast 
>>>> compared to the way it works now. I'm sure there are other little

>>>> adjustments that would need to be made, but I can't think of any
> show
>>>> stoppers.
>>>>
>>>> -Steve
>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>>>>> Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> 1/3/2007
12:14:06
>>> PM
>>>>> If I were dealing with the MINDISTANCE issue in Perl I would
> create
>>> a
>>>>> hash where the key was the label text, because that is what
>>>> MINDISTANCE
>>>>> is restricting, and make the hash value an array of locations.
> This
>>>>> would much faster than any search algorithm and you would narrow
>>> the
>>>>> search to the array of placed text items of the same name.
>>>>>
>>>>> There is no reason this approach could not be used for the
>>>> MINDISTANCE
>>>>> case in mapserver. Using this and the bitmap collision detection
>>>> should
>>>>> remove most of the overhead of the label cache processing and
> make
>>> it
>>>>> very scalable with respect to number of items to be processed.
>>>>>
>>>>> -Steve
>>>>>
>>>>> Steve Lime wrote:
>>>>>> I did think of one other gotcha. The cache is not just used for
>>>>> physical
>>>>>> overlap, there
>>>>>> are other tests applied. For example, MINDISTANCE, which
filters
>>>> out
>>>>>> like labels based
>>>>>> on proximity. That requires direct comparison of the bounding
>>>>> polygons.
>>>>>> One might be
>>>>>> able to reduce that to a test on label points (in fact that
> might
>>>> be
>>>>>> what's there now,
>>>>>> I'd have to check) but that still requires traversing placed
>>> labels
>>>>>> (albeit only those
>>>>>> within a particular layer)...
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>>>> Stephen Woodbridge <woodbri at swoodbridge.com> 1/2/2007
1:27:47
>>> PM
>>>>>> Yes, it works for all cases. You have to think about the
problem
>>> as
>>>> a
>>>>>> raster problem not a vector or character problem.
>>>>>>
>>>>>> So you do this by creating an image the size of rectangular
bbox
>>> of
>>>>> the
>>>>>> label. Then you render the the character or label polygons
boxes
>>>> into
>>>>>> the image. The image now becomes a mask representing the pixels
> of
>>>>>> where the label will fall if it was rendered.
>>>>>>
>>>>>> the test: Logical AND this mask image with the map label image
>>>> given
>>>>>> its position in that image. You have a collision on the first
> AND
>>>> result
>>>>>> the it true.
>>>>>>
>>>>>> to update the map label image, just OR this label mask with the
>>> map
>>>>>> label image given its position in that image.
>>>>>>
>>>>>> if you want to test another location, ie: you are searching for
> an
>>>>>> location nearby that the label will fit, then change the
> position
>>>> by
>>>>> a
>>>>>> pixel and repeat the test. This is more costly than just a
test,
>>>> but
>>>>> is
>>>>>> still pretty efficient. I don't think you would want to apply
> this
>>>> to
>>>>>> every label, but having it as an option for a layer might be 
>>>>>> interesting. You could constrain this to say search up/down or 
>>>>>> right/left 10 pixels and then something like lot labels could
>>>> easily
>>>>> be
>>>>>> staggered. This is just an idea and it might not farm out to be
>>>>> useful,
>>>>>> but once the machinery is in, it falls out as a simple
variation
>>> on
>>>>> the
>>>>>> theme.
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>> Paul Spencer wrote:
>>>>>>> This would need to work for 'curved' labels too.  Would it
work
>>> to
>>>>> do
>>>>>>> things per-character?
>>>>>>>
>>>>>>> Paul
>>>>>>>
>>>>>>> On 2-Jan-07, at 12:40 PM, Steve Lime wrote:
>>>>>>>
>>>>>>>> Remember also that you're not dealing strictly with bounding
>>>> boxes
>>>>>> (e.g.
>>>>>>>> rects) but
>>>>>>>> actual polygons whenever rotation is applied. This is so you
> get
>>>>>>>> relatively tight fits
>>>>>>>> around labels and can cram labels on a map.
>>>>>>>>
>>>>>>>>>>> Stephen Woodbridge <woodbri at swoodbridge.com> 1/2/2007
>>> 10:25:49
>>>>>> AM
>>>>>>>> Steve Lime wrote:
>>>>>>>>> Interesting idea. No, it's not been looked at before, not by
> me
>>>>>>>>> anyway. Frank might have some thoughts on performance. You'd
>>>>> still
>>>>>>>>> have to loop through the affected area to find collisions
but
> I
>>>>>>>> think
>>>>>>>>> your correct in that the search time would basically be
>>> constant
>>>>>>>>> (affected slightly by the size of the label).
>>>>>>>>>
>>>>>>>>> Wouldn't be that hard to implement. I wonder if GD holding
> the
>>>>>> cache
>>>>>>>>> image could be useful or if a custom 2-D bit array would be
>>>>>> faster-
>>>>>>>>> probably the latter...
>>>>>>>> I thin a 2-D bit array would be much faster and require
>>>>>> significantly
>>>>>>>> less memory, but would require writing a little additional
> code.
>>>>>> The
>>>>>>>> upside to this is that the code could be optimized for this
>>>>>> specific
>>>>>>>> task, unless there are other potential uses for this.
>>>>>>>>
>>>>>>>> Another idea I had related to this scheme would be that you
>>> could
>>>>>>>> implement a function could search for a placement position
>>>> withing
>>>>>> some
>>>>>>>> number of pixels. Here the idea would be that you use the
>>> polygon
>>>>>> to
>>>>>>>> create a mask and see if the mask collides, if not you are
> done,
>>>>>>>> otherwise you do a spiral search outwards from the anchor
> point
>>>>> for
>>>>>>>> some
>>>>>>>> number of pixels looking for a place the label will fit. This
> is
>>>>>> easy
>>>>>>>> to
>>>>>>>> do in some nested loops and very fast.
>>>>>>>>
>>>>>>>> This idea of a moving mask is fundamental to a lot of the
> raster
>>>>>>>> manipulation algorithm and is easy to implement and very fast
> to
>>>>>>>> execute.
>>>>>>>>
>>>>>>>> -Steve W
>>>>>>>>
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>>>>> Stephen Woodbridge <woodbri at SWOODBRIDGE.COM> 1/2/2007
>>> 9:46:46
>>>>>>>>>>>> AM
>>>>>>>>>>>>
>>>>>>>>> HNY All,
>>>>>>>>>
>>>>>>>>> Last year I spent a fair amount of time optimizing a bunch
of
>>>>>>>>> mapfiles
>>>>>>>>>
>>>>>>>>> for performance. One of the areas that seems to always be a
>>>>>>>>> performance
>>>>>>>>>
>>>>>>>>> hog if processing the Label Cache. I have been successful
are
>>>>>>>>> thinning
>>>>>>>>>
>>>>>>>>> down the number of labels going into the cache, but I
> wondering
>>>>> if
>>>>>>>>> there might be a more efficient way of dealing with it.
Since
> I
>>>>>> have
>>>>>>>>> been playing with rasters over the holidays I was wondering
> if
>>>> it
>>>>>>>>> might be possible to implement the label cache as a bitmap
> mask
>>>>>> and
>>>>>>>>> us a raster
>>>>>>>>>
>>>>>>>>> OP to see if there is a collision?
>>>>>>>>>
>>>>>>>>> OK, I'm sure it is possible, I guess my real question(s)
are:
>>>>>>>>>
>>>>>>>>> Has this approach been looked at before? Any thoughts on
>>>>>>>> performance?
>>>>>>>>> My idea is:
>>>>>>>>>
>>>>>>>>> 1) create an label cache image 2) to check a label, get its
>>> bbox
>>>>>>>>> polyon check if it overlaps and set pixels 3) polygon fill
> the
>>>>>> bbox
>>>>>>>>> of a label into the label cache image when a label is
> rendered
>>>>>>>>> we might be able to use a 2:1 or 4:1 reduced size label
cache
>>>>>> image
>>>>>>>>> with some thought about how to work around the issues.
>>>>>>>>>
>>>>>>>>> I would think that this would be much faster than searching
a
>>>>> list
>>>>>>>> of
>>>>>>>>>  drawn labels and checking for intersections. And it should
> be
>>>>>>>> linear
>>>>>>>>>  with the number of labels in the cache.
>>>>>>>>>
>>>>>>>>> -Steve
> +-----------------------------------------------------------------+
>>>>>>> |Paul Spencer                          pspencer at dmsolutions.ca 

> 
>>>> |
>>>>
> +-----------------------------------------------------------------+
>>>>>>> |Chief Technology Officer                                      

> 
>>>> |
>>>>>>> |DM Solutions Group Inc               
> http://www.dmsolutions.ca/ 
>>>> |
>>>>
> +-----------------------------------------------------------------+