[mapserver-dev] RFC98 status update

Jeff McKenna jmckenna at gatewaygeomatics.com
Tue Jul 9 07:17:02 PDT 2013


This is wonderful work by Thomas.

My only comment is to make sure that each new library we add to
MapServer, in this case Harfbuzz, is well supported for compiling on all
popular platforms (in my case my concern is Windows).  I just don't want
to hit the issue we had (actually currently have) with libcairo-svg.  I
would suggest that each RFC that contains a new library must include a
"Supported Platforms" section (that of course must be researched before
moving forward).  More work upfront, but I think this will make
everyone's lives easier in the long run.

-jeff



On 2013-07-09 10:58 AM, thomas bonfort wrote:
> Devs,
> 
> Here's a status update about the rather major changes that are
> happening for and around RFC98. As a reminder, RFC98 is about
> refactoring the handling of text inside mapserver, to move the layout
> of individual glyphs out from the renderers (agg, cairo), and into a
> common framework. The user-visible changes will be the correct
> handling of complex scripts (arabic, hindi variants, etc...) and
> usually a speedup of the labelcache phase.
> All this will be incorporated in the RFC, but here it is anyway as it
> is probably easier to read than a diff. Most of you can skip to the
> summary at the end.
> 
> Dependencies:
> ===========
> - Harfbuzz is the library used for shaping and is an added dependency
> - Fribidi remains, but is only used for determining bidi runs and
> not for shaping anymore. Hopefully the thread safety issues in fribidi
> were in the shaping part, and we may be able to remove the thread
> locks around fribidi calls. ICU could have been a replacement for
> bidi, but I preferred to keep using a current dependency rather than
> replace it with a new one.
> - UTHash (http://troydhanson.github.io/uthash/) is a header only
> hashtable implementation that works with arbitrary keys and values,
> and is used for accessing cached fonts and glyphs. After some
> performance testing, we might want to replace our own hashtable
> implementation with this one some day.
> - RFC99 (dropping GD support) is inside the rfc98 branch, so GD is no
> longer a dependency.
> 
> Font and Glyph cache:
> =================
> (fontcache.c)
> I'm still undecided about where to store the font and glyph cache.
> Making it global would ensure that cached glyphs are reusable across
> multiple requests for the fastcgi case, but in turn requires some
> thread-level protection and probably some pruning in order for it to
> remain of reasonable size. For now its lifetime is tied to the
> lifetime of the mapObj. Some APIs have changed in order to have the
> fontcache accessible.
> The fontcache contains caches for:
> - font faces (i.e. the representation of a truetype file)
> - glyphs (i.e. the metrics of an individual glyph at a given size)
> - glyph bitmaps (the gray level rasterization of a glyph at a given
> size, with no rotation)
> Rotated glyph bitmaps are not cached, as rotated text usually results
> from data-driven parameters (i.e. ANGLE FOLLOW or ANGLE AUTO labels),
> and is thus not a good candidate for caching.
> 
> 
> Text/Glyph representation:
> ====================
> Text is represented by a "textPathObj" which is basically a list of
> positioned glyphs (e.g. the word "Label" at size 10 for an arial font
> is represented by arial font's glyph "L" at position (x,y) = (0,0),
> glyph "a" at position (10,0), "b" at (18,0), etc.). Multiline text
> is handled transparently by having glyphs positioned at different y
> values. A textPath can be either "absolute" (i.e. the glyph positions
> are in absolute image coordinates, used to position glyphs for angle
> follow labels), or "relative", in which case they must be offset by
> their labeling point.
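To make the "absolute" versus "relative" distinction concrete, here is a simplified, hypothetical version of such a structure and of the offsetting step for relative paths. The struct and function names are illustrative assumptions, not the actual textPathObj definition. Note that after shaping, glyphs are identified by their index inside the font face rather than by unicode codepoint.

```c
#include <stdint.h>

/* Hypothetical, simplified structures modelled on the description above. */
typedef struct { double x, y; } point2d;

typedef struct {
  uint32_t glyph_index;  /* glyph id inside the font face, not a codepoint */
  int face_id;           /* which font face the glyph comes from */
  point2d pnt;           /* position of the glyph origin */
  double rot;            /* per-glyph rotation, e.g. for ANGLE FOLLOW */
} positioned_glyph;

typedef struct {
  int absolute;          /* 1: pnt is in image coordinates; 0: relative to the label point */
  int numglyphs;
  positioned_glyph *glyphs;
} text_path;

/* Offsets every glyph of a relative path by the labeling point, turning it
   into an absolute path. Absolute paths are left untouched. */
void text_path_make_absolute(text_path *tp, point2d labelpnt) {
  if (tp->absolute) return;
  for (int i = 0; i < tp->numglyphs; i++) {
    tp->glyphs[i].pnt.x += labelpnt.x;
    tp->glyphs[i].pnt.y += labelpnt.y;
  }
  tp->absolute = 1;
}
```

With the "Label" example, a relative path holding glyphs at (0,0), (10,0) and (18,0) placed at label point (100,50) ends up with glyphs at (100,50), (110,50) and (118,50); a second line of text would simply carry larger y values.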
> 
> Renderer implications:
> =================
> - Functions to render a string of text (ex msDrawText), to render a
> positioned string of text (ex msDrawTextLine), to render a truetype
> marker symbol, and to compute the extents of a string of text are
> removed.
> - A function to render a textPathObj is added. A renderer may take
> advantage of cached glyph bitmaps if needed.
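The narrower renderer contract can be pictured as a single entry point that receives already positioned glyphs, plus an optional way to fetch cached bitmaps. The sketch below is a hypothetical illustration (the vtable layout, `bitmap_lookup_fn` and the toy renderer are assumptions, not MapServer's actual renderer vtable); the toy renderer only counts work instead of drawing.

```c
#include <stddef.h>

/* Minimal hypothetical text path: positions are already computed upstream. */
typedef struct { double x, y; } point2d;
typedef struct { point2d pnt; unsigned glyph_index; } positioned_glyph;
typedef struct { int numglyphs; positioned_glyph *glyphs; } text_path;

/* Hypothetical cached gray-level bitmap of one glyph at one size. */
typedef struct { int width, rows; const unsigned char *buffer; } glyph_bitmap;

/* Returns a cached bitmap, or NULL if none is available. */
typedef const glyph_bitmap *(*bitmap_lookup_fn)(void *cache, unsigned glyph_index);

/* The only text-related entry point a renderer needs in this sketch:
   layout, shaping and bidi have all happened before this call. */
typedef struct {
  const char *name;
  int (*render_glyphs)(void *img, const text_path *tp, void *cache,
                       bitmap_lookup_fn lookup);
} renderer_vtable;

/* Toy "image" that records what a raster renderer would have done. */
typedef struct { long pixels_blitted; int rasterized; } toy_image;

static int toy_render_glyphs(void *img, const text_path *tp, void *cache,
                             bitmap_lookup_fn lookup) {
  toy_image *im = img;
  for (int i = 0; i < tp->numglyphs; i++) {
    const glyph_bitmap *bm = lookup ? lookup(cache, tp->glyphs[i].glyph_index) : NULL;
    if (bm)
      im->pixels_blitted += (long)bm->width * bm->rows; /* reuse the cached raster */
    else
      im->rasterized++; /* a real renderer would rasterize the glyph outline here */
  }
  return 0;
}

static const renderer_vtable toy_renderer = { "toy", toy_render_glyphs };
```

A vector renderer could ignore `lookup` entirely and draw outlines, which is why using the cached bitmaps is left optional.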
> 
> LabelCache Implications:
> ===================
> Work has been done to trim down the labelcache computations as much as possible:
> 
> When inserting features into the labelcache:
> - We'll insert a reference to the original labelObj instead of a copy
> if the labelObj and its child styleObjs don't contain any bindings.
> This cuts down on memory usage when attribute bindings aren't in use.
> - We don't insert features that will never get rendered (e.g. out of
> scale, or too large for their feature (MINFEATURESIZE keyword))
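The insertion rules above can be sketched as follows, under simplifying assumptions: a trimmed-down, hypothetical `label_obj` that only records binding counts, a plain numeric minimum feature size, and a shallow copy standing in for the per-feature copy that binding resolution would need. None of these names are MapServer's actual API.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down label: only what this sketch needs. */
typedef struct {
  int numbindings;          /* attribute bindings on the label itself */
  int numstyles;
  int *style_numbindings;   /* bindings on each child style */
  double size;
} label_obj;

typedef struct {
  const label_obj *label;   /* the layer's label, or `owned` */
  label_obj *owned;         /* non-NULL only when a private copy was made */
} cached_label;

static int label_has_bindings(const label_obj *l) {
  if (l->numbindings > 0) return 1;
  for (int i = 0; i < l->numstyles; i++)
    if (l->style_numbindings[i] > 0) return 1;
  return 0;
}

/* Returns 1 if the feature was cached, 0 if it can never be drawn. */
int labelcache_insert(cached_label *slot, const label_obj *layer_label,
                      double scaledenom, double minscaledenom, double maxscaledenom,
                      double feature_size, double min_feature_size) {
  if (scaledenom < minscaledenom || scaledenom > maxscaledenom)
    return 0;                               /* out of scale */
  if (feature_size < min_feature_size)
    return 0;                               /* feature too small to be labeled */
  if (label_has_bindings(layer_label)) {
    label_obj *copy = malloc(sizeof *copy);
    if (!copy) return 0;
    *copy = *layer_label;  /* shallow copy; per-feature binding values would be applied here */
    slot->owned = copy;
    slot->label = copy;
  } else {
    slot->owned = NULL;
    slot->label = layer_label;              /* shared: no per-feature allocation */
  }
  return 1;
}

void cached_label_free(cached_label *slot) {
  free(slot->owned);
  slot->owned = NULL;
  slot->label = NULL;
}
```

The saving comes from the `else` branch: on layers without bindings every cached feature points at the same label instead of owning a copy.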
> 
> At the msDrawLabelCache phase:
> * We delay computation of the label text bounding box until after we have
> checked conditions that would cause it not to be rendered, i.e.
>   - if the label has a MINDISTANCE set and a neighbouring label with
> identical text has already been rendered
>   - for labels without markers, we first check that the labelpoint
> doesn't collide with an existing label.
> * Collision detection has been optimized:
>  - We keep a list of rendered labels and loop through those instead of
> checking the status of all the labels in the labelcache for each
> member
>  - The bounding metrics for a label have been cut down from a full
> shapeObj to a struct containing a bounding rect and an optional
> lineObj. For non-rotated labels, no information beyond the bounding
> box is needed, which makes intersection detection between two such
> labels much easier (i.e. the overlap of the two labels is the same as
> the overlap of their bounding boxes, with no need for further
> geometric intersection primitives).
> These changes give very large speedups on cluttered maps, cf.
> https://plus.google.com/u/0/118271009221580171800/posts/PrwhFYSkhea
> (e.g. rendering time goes from 800 to 1 second for 500,000 labels)
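The cheap checks and the bounding-box collision test described above might look like the sketch below. It assumes hypothetical `rect_obj`/`label_bounds`/`rendered_label` types, compares squared distances for MINDISTANCE, and only tests against the list of already rendered labels. Rotated labels, which would need a further polygon intersection test, are conservatively treated as colliding whenever their boxes overlap.

```c
#include <string.h>

typedef struct { double minx, miny, maxx, maxy; } rect_obj;

/* Hypothetical label bounds: a box, plus an optional rotated outline. */
typedef struct {
  rect_obj bbox;
  const void *poly;   /* NULL for non-rotated labels: the box is exact */
} label_bounds;

typedef struct {
  const char *text;
  double x, y;        /* label point */
  label_bounds bounds;
} rendered_label;

static int rects_overlap(const rect_obj *a, const rect_obj *b) {
  return a->minx <= b->maxx && b->minx <= a->maxx &&
         a->miny <= b->maxy && b->miny <= a->maxy;
}

/* Cheap test, done before computing any text bounds: is an already rendered
   label with identical text closer than mindistance? */
static int too_close_to_duplicate(const rendered_label *done, int n, const char *text,
                                  double x, double y, double mindistance) {
  if (mindistance <= 0) return 0;
  for (int i = 0; i < n; i++) {
    double dx = done[i].x - x, dy = done[i].y - y;
    if (strcmp(done[i].text, text) == 0 && dx * dx + dy * dy < mindistance * mindistance)
      return 1;
  }
  return 0;
}

/* Cheap test for labels without markers: does the label point fall inside
   an already rendered label? */
static int point_hits_label(const rendered_label *done, int n, double x, double y) {
  for (int i = 0; i < n; i++) {
    const rect_obj *r = &done[i].bounds.bbox;
    if (x >= r->minx && x <= r->maxx && y >= r->miny && y <= r->maxy)
      return 1;
  }
  return 0;
}

/* Full test, only against rendered labels rather than the whole labelcache. */
static int collides(const rendered_label *done, int n, const label_bounds *b) {
  for (int i = 0; i < n; i++) {
    if (!rects_overlap(&done[i].bounds.bbox, &b->bbox))
      continue;  /* disjoint boxes: no collision possible */
    /* For two non-rotated labels (poly == NULL) box overlap is the exact answer;
       rotated outlines would need a polygon test, omitted here (conservative hit). */
    return 1;
  }
  return 0;
}
```

Only when both cheap tests pass would the text bounds be computed and `collides` called, which is where the deferred bounding-box computation pays off.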
> 
> Miscellaneous libmapserver changes
> ===========================
> - The code in mapprimitive to compute positions for angle auto and
> angle follow has been sanitized and now directly uses computed glyph
> metrics
> - A number of functions have had their signature changed to accommodate
> the change of architecture
> 
> Text shaping
> ==========
> None of the previous changes affected text shaping, which is a
> major component of RFC98. All the shaping happens in textlayout.c,
> whose principal role is to take a string of text as input, and return
> a list of positioned glyphs as output. The input string goes through
> multiple steps, and is split into multiple runs. Each run has a
> single line number, bidi direction, and script "language".
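As an illustration of what a run might carry, and of the first step of the pipeline (line wrapping), here is a hypothetical `text_run` struct and a `wrap_runs` function working on an array of unicode codepoints. The layout, the names and the `max_line_chars` parameter are assumptions made for this sketch, not the actual textlayout.c code; direction and script are left for the later bidi and script detection steps.

```c
#include <stdint.h>

typedef enum { DIR_LTR, DIR_RTL } text_direction;

/* Hypothetical run: a slice of the codepoint array sharing one line number,
   one bidi direction and one script. */
typedef struct {
  int offset, length;   /* slice of the codepoint array */
  int line;
  text_direction dir;   /* filled in by the bidi step */
  uint32_t script;      /* filled in by script detection (e.g. an ISO 15924 tag) */
} text_run;

/* Splits text into one run per line. A line ends at the `wrap` character
   (which is consumed) or, when a line reaches `max_line_chars` characters
   (0 disables this), at the last space seen (also consumed).
   Returns the number of runs written, at most `maxruns`. */
int wrap_runs(const uint32_t *text, int len, uint32_t wrap, int max_line_chars,
              text_run *runs, int maxruns) {
  int n = 0, start = 0, line = 0;
  while (start < len && n < maxruns) {
    int end = start, last_space = -1;
    while (end < len && text[end] != wrap) {
      if (text[end] == ' ') last_space = end;
      if (max_line_chars > 0 && end - start >= max_line_chars && last_space > start) {
        end = last_space;   /* break the long line at its last space */
        break;
      }
      end++;
    }
    runs[n].offset = start;
    runs[n].length = end - start;
    runs[n].line = line++;
    runs[n].dir = DIR_LTR;  /* placeholder until bidi levels are computed */
    runs[n].script = 0;     /* placeholder until script detection */
    n++;
    start = end + 1;        /* skip the wrap character or breaking space */
  }
  return n;
}
```

With `|` as the wrap character, the example string "this is some text in english,|ARABIC and JAPANESE" yields a 29-character run on line 0 and a 19-character run on line 1, matching the line wrapping step of the walkthrough.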
> 
> As an example, we'll be working with the input unicode string "this is
> some text in english, ARABIC and JAPANESE". Capital letters are used
> to denote non latin glyphs, also note that ARABIC is stored in logical
> (=reading) order, whereas it would be written as CIBARA.
> 
> - encoding conversion with iconv, to convert the string to unicode
>  run1 = "this is some text in english, ARABIC and JAPANESE", line=0
> 
> - line wrapping: break on wrap character, break long lines on spaces
> run1 = "this is some text in english,", line=0
> run2 = "ARABIC and JAPANESE", line=1
> 
> - bidi levels
>  => each run has a single bidi direction (i.e. left-to-right or right-to-left)
> run1 = "this is some text in english,", line=0, direction=LTR
> run2 = "ARABIC" line=1, direction=RTL
> run3 = " and JAPANESE", line=1, direction = LTR
> 
> - script detection is applied to enable language-dependent shaping,
> and also to refine which fonts will be used (more on that later)
> run1 = "this is some text in english,", line=0, direction=LTR, script=latin
> run2 = "ARABIC" line=1, direction=RTL, script=arabic
> run3 = " and " line=1, direction=LTR, script=latin
> run4 = "JAPANESE" line=1, direction=LTR, script=hiragana
> 
> - for each run, we select which font should be used, in order to use
> the same font inside a given run. A previous RFC allowed multiple
> fonts to be specified for a LABEL; this has been extended to allow
> fine-tuning which fonts are preferably used for a given script:
> 
> FONT "arialuni,arial,cjk,arabic"
> 
> can now be written with fonts prefixed by a script identifier, i.e.
> 
> FONT "arialuni,en:arial,ja:cjk,ar:arabic"
> 
> This is needed because there is (and will be) overlap between the
> glyph coverages of fonts, and it should be possible to prioritize
> which font is used for which language.
> 
> - Each run is then fed into harfbuzz, which returns a list of
> positioned glyphs. The number of returned glyphs is not necessarily
> identical to the number of characters in the unicode string, and the
> glyphs are ordered from left to right.
> 
> - The glyphs of each run are reassembled to account for line numbers
> and run positions (e.g. run 3 is offset down by one line, and placed
> to the right of run 2)
> 
> - Each line is horizontally offset to account for ALIGN. LABEL ALIGN
> no longer defaults to LEFT, so right-to-left text will be right-aligned
> instead of left-aligned as it is now.
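The script-prefixed font list from the font selection step can be turned into a per-run candidate order. The sketch below assumes one plausible priority policy, which the description above does not spell out: fonts prefixed with the run's tag come first, then unprefixed fonts in list order, while fonts prefixed with other tags are skipped. The function name and limits are hypothetical, and a real implementation would also check each font's glyph coverage before settling on one.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FONTS 16
#define MAX_NAME 64

/* Writes into `out` the candidate fonts for a run tagged `lang`, in
   (assumed) priority order. Returns the number of candidates written. */
int font_candidates(const char *fontlist, const char *lang,
                    char out[][MAX_NAME], int maxout) {
  char tags[MAX_FONTS][MAX_NAME], names[MAX_FONTS][MAX_NAME];
  int n = 0;
  const char *p = fontlist;
  /* Parse "font,tag:font,..." into parallel tag/name arrays. */
  while (*p && n < MAX_FONTS) {
    size_t len = strcspn(p, ",");
    char item[MAX_NAME];
    snprintf(item, sizeof item, "%.*s", (int)len, p);
    char *colon = strchr(item, ':');
    if (colon) {
      *colon = '\0';
      snprintf(tags[n], MAX_NAME, "%s", item);
      snprintf(names[n], MAX_NAME, "%s", colon + 1);
    } else {
      tags[n][0] = '\0';                 /* unprefixed: usable for any script */
      snprintf(names[n], MAX_NAME, "%s", item);
    }
    n++;
    p += len;
    if (*p == ',') p++;
  }
  /* Pass 0: fonts tagged for this run; pass 1: untagged fallbacks. */
  int k = 0;
  for (int pass = 0; pass < 2; pass++)
    for (int i = 0; i < n && k < maxout; i++) {
      int match = pass == 0 ? (tags[i][0] && strcmp(tags[i], lang) == 0)
                            : !tags[i][0];
      if (match) snprintf(out[k++], MAX_NAME, "%s", names[i]);
    }
  return k;
}
```

Under this policy, a run tagged "ja" with the list "arialuni,en:arial,ja:cjk,ar:arabic" tries cjk first and then arialuni, while a run with an unlisted tag only gets arialuni.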
> 
> ========
> Summary
> ========
> 
> What we've gained
> ==============
> - A consistent framework for laying out text. Future enhancements
> (e.g. double-spaced text) should be simpler to implement.
> - Correct (I hope) renderings of non latin scripts (multiline arabic
> text, complex shaping)
> - Fine-grained control over which fonts to use. I also have some code
> ready to use fontconfig for font selection; it is not used for now as
> it has performance issues.
> - Correct ALIGN support for RTL languages.
> - Labelcache speedups and decreased memory consumption
> - Correct handling of glyph placements with respect to their baseline
> 
> 
> Backwards incompatible changes
> =========================
> - GD support has been dropped
> - ANNOTATION layers are dropped (they've been deprecated since 6.0)
> - HTML-encoded entities inside attributes are not supported (e.g.
> &eacute;, &#3456;). They are still supported for specifying which
> character to use in truetype symbols.
> - API changes (not sure how and if this affects mapscripts yet)
> - Arabic shaping will depend on harfbuzz. A build without harfbuzz
> will not fall back to fribidi shaping.
> - All text-related autotests fail, as the output is slightly different
> (hinting is disabled, and metrics calculations may have changed slightly)
> 
> Open Issues
> =========
> - Where to store font and glyph cache
> - For the KML renderer: is the labelcache phase necessary? If so,
> label bounds must be computed; however, we should be feeding the
> original text, not individual glyphs, to the KML renderer.
> 
> regards,
> Thomas