[mapserver-dev] RFC98 status update

thomas bonfort thomas.bonfort at gmail.com
Tue Jul 9 09:38:33 PDT 2013


Jeff,
Please state your recommended plan of action should Harfbuzz not be
supported on Windows (as it happens, it is, but the question still
stands).

Regards,
Thomas

On 9 July 2013 16:17, Jeff McKenna <jmckenna at gatewaygeomatics.com> wrote:
> This is wonderful work by Thomas.
>
> My only comment is to make sure that each new library we add to
> MapServer, in this case Harfbuzz, is well supported for compiling on all
> popular platforms (in my case my concern is Windows).  I just don't want
> to hit the issue we had (actually currently have) with libcairo-svg.  I
> would suggest that each RFC that contains a new library must include a
> "Supported Platforms" section (that of course must be researched before
> moving forward).  More work upfront, but I think this will make
> everyone's lives easier in the long run.
>
> -jeff
>
>
>
> On 2013-07-09 10:58 AM, thomas bonfort wrote:
>> Devs,
>>
>> Here's a status update about the rather major changes that are
>> happening for and around RFC98. As a reminder, RFC98 is about
>> refactoring the handling of text inside mapserver, to move the layout
>> of individual glyphs out from the renderers (agg, cairo), and into a
>> common framework. The user-visible changes will be the correct
>> handling of complex scripts (arabic, hindi variants, etc...) and
>> usually a speedup of the labelcache phase.
>> All this will be incorporated in the RFC, but here it is anyway as it
>> is probably easier to read than a diff. Most of you can skip to the
>> summary at the end.
>>
>> Dependencies:
>> ===========
>> - Harfbuzz is the library used for shaping and is an added dependency.
>> - Fribidi stays, but is only used for determining bidi runs and
>> no longer for shaping. Hopefully the thread safety issues in fribidi
>> were in the shaping part, in which case we may be able to remove the
>> thread locks around fribidi calls. ICU could have replaced fribidi for
>> bidi, but I preferred to keep using a current dependency rather than
>> replace it with a new one.
>> - UTHash (http://troydhanson.github.io/uthash/) is a header-only
>> hash table implementation that works with arbitrary keys and values,
>> and is used for accessing cached fonts and glyphs. Pending some
>> performance testing, we might want to replace our own hash table
>> implementation with this one some day.
>> - RFC99 (dropping GD support) is inside the rfc98 branch, so GD is no
>> longer a dependency.
>>
>> Font and Glyph cache:
>> =================
>> (fontcache.c)
>> I'm still pondering where to store the font and glyph cache.
>> Making it global would ensure that cached glyphs are reusable across
>> multiple requests in the fastcgi case, but in turn requires some
>> thread-level protection and probably some pruning in order for it to
>> remain of reasonable size. For now its lifetime is tied to the
>> lifetime of the mapObj. Some APIs have changed in order to make the
>> fontcache accessible.
>> The fontcache contains caches for:
>> - font faces (i.e. the representation of a truetype file)
>> - glyphs (i.e. the metrics of an individual glyph at a given size)
>> - glyph bitmaps (the gray level rasterization of a glyph at a given
>> size, with no rotation)
>> Rotated glyph bitmaps do not get cached, as rotated text usually
>> comes from data driven parameters (i.e. ANGLE FOLLOW or ANGLE AUTO
>> labels), and is thus not a good candidate for caching.
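As an illustration of the glyph level of such a cache, here is a minimal C sketch, assuming a fixed-size chained hash table keyed on (font face, size, glyph index). All names (glyph_cache, glyph_cache_get, compute_metrics, bitmap_is_cacheable) are hypothetical rather than the fontcache.c API, compute_metrics stands in for a real FreeType lookup, and the hand-rolled table replaces UTHash only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical glyph metrics; not the actual fontcache.c types. */
typedef struct {
  double advance;                  /* horizontal advance in pixels */
  double minx, miny, maxx, maxy;   /* glyph bounding box */
} glyph_metrics;

typedef struct glyph_entry {
  const void *face;                /* font face the glyph belongs to */
  double size;                     /* rendering size */
  uint32_t glyph;                  /* glyph index inside the face */
  glyph_metrics metrics;
  struct glyph_entry *next;        /* chaining for hash collisions */
} glyph_entry;

#define GLYPH_BUCKETS 256

typedef struct {
  glyph_entry *buckets[GLYPH_BUCKETS];
  unsigned misses;                 /* how often metrics had to be computed */
} glyph_cache;

/* Stand-in for an expensive FreeType metrics lookup (assumption). */
static glyph_metrics compute_metrics(const void *face, double size, uint32_t glyph) {
  glyph_metrics m = {0};
  (void)face;
  (void)glyph;
  m.advance = size * 0.6;
  m.maxx = size * 0.5;
  m.maxy = size;
  return m;
}

static unsigned bucket_of(const void *face, double size, uint32_t glyph) {
  uintptr_t h = (uintptr_t)face;
  h ^= (uintptr_t)(size * 64.0) * (uintptr_t)2654435761u;
  h ^= (uintptr_t)glyph * (uintptr_t)40503u;
  return (unsigned)(h % GLYPH_BUCKETS);
}

/* Return cached metrics, computing and inserting them on a miss. */
static const glyph_metrics *glyph_cache_get(glyph_cache *c, const void *face,
                                            double size, uint32_t glyph) {
  assert(c != NULL);
  unsigned b = bucket_of(face, size, glyph);
  for (glyph_entry *e = c->buckets[b]; e != NULL; e = e->next)
    if (e->face == face && e->size == size && e->glyph == glyph)
      return &e->metrics;
  glyph_entry *e = malloc(sizeof *e);
  if (e == NULL) return NULL;
  e->face = face;
  e->size = size;
  e->glyph = glyph;
  e->metrics = compute_metrics(face, size, glyph);
  e->next = c->buckets[b];
  c->buckets[b] = e;
  c->misses++;
  return &e->metrics;
}

/* Only unrotated bitmaps are cached; rotated text is usually data driven. */
static int bitmap_is_cacheable(double angle) { return angle == 0.0; }

static void glyph_cache_free(glyph_cache *c) {
  for (int i = 0; i < GLYPH_BUCKETS; i++) {
    glyph_entry *e = c->buckets[i];
    while (e != NULL) {
      glyph_entry *next = e->next;
      free(e);
      e = next;
    }
    c->buckets[i] = NULL;
  }
}
```

Keying on the face pointer means entries are only valid while the faces stay loaded, which fits tying the cache lifetime to the mapObj; a global cache shared across fastcgi requests would additionally need locking and a pruning policy.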
>>
>>
>> Text/Glyph representation:
>> ====================
>> Text is represented by a "textPathObj" which is basically a list of
>> positioned glyphs (e.g. the word "Label" at size 10 for an arial font
>> is represented by the arial font's glyph "L" at position (x,y) = (0,0),
>> glyph "a" at position (10,0), "b" at (18,0), etc.). Multiline text
>> is handled transparently by having glyphs positioned at different y
>> values. A textPath can be either "absolute" (i.e. the glyph positions
>> are in absolute image coordinates, used to position glyphs for angle
>> follow labels), or "relative", in which case they must be offset by
>> their labeling point.
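A rough C sketch of the absolute/relative distinction, assuming a simplified stand-in for textPathObj (the positioned_glyph and text_path types and glyph_image_position are hypothetical, not the RFC98 definitions). Relative paths are offset by the labeling point, absolute ones are used as is; a multiline path would simply contain glyphs with different y values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the actual textPathObj differs. */
typedef struct {
  unsigned glyph_index;  /* glyph id inside its font face */
  double x, y;           /* position of the glyph */
} positioned_glyph;

typedef struct {
  positioned_glyph *glyphs;
  size_t numglyphs;
  int absolute;          /* 1: positions are already image coordinates */
} text_path;

/* Image-space position of glyph i: relative paths are offset by the
 * labeling point, absolute paths (e.g. ANGLE FOLLOW) are used as is. */
static void glyph_image_position(const text_path *tp, size_t i,
                                 double label_x, double label_y,
                                 double *out_x, double *out_y) {
  assert(tp != NULL && i < tp->numglyphs);
  *out_x = tp->glyphs[i].x;
  *out_y = tp->glyphs[i].y;
  if (!tp->absolute) {
    *out_x += label_x;
    *out_y += label_y;
  }
}
```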
>>
>> Renderer implications:
>> =================
>> - Functions to render a string of text (e.g. msDrawText), to render a
>> positioned string of text (e.g. msDrawTextLine), to render a truetype
>> marker symbol, and to compute the extents of a string of text are
>> removed.
>> - A function to render a textPathObj is added. A renderer may take
>> advantage of cached glyph bitmaps if needed.
>>
>> LabelCache Implications:
>> ===================
>> Work has been done to trim down the labelcache computations as much as possible:
>>
>> When inserting features into the labelcache:
>> - We insert a reference to the original labelObj instead of a copy
>> if the labelObj and its child styleObjs don't contain any bindings.
>> This cuts down on memory usage when attribute bindings aren't in use.
>> - We don't insert features that will never get rendered (e.g. out of
>> scale, or too large for their feature (MINFEATURESIZE keyword)).
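The reference-or-copy decision at insertion time could look roughly like the following C sketch. The types and the cache_member_init/cache_member_free helpers are hypothetical simplifications of labelObj and styleObj, and binding detection is reduced to a per-object counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for styleObj and labelObj. */
typedef struct { int numbindings; } style_obj;

typedef struct {
  int numbindings;       /* attribute bindings on the label itself */
  style_obj *styles;
  int numstyles;
} label_obj;

typedef struct {
  label_obj *label;      /* shared definition or per-feature copy */
  int owns_label;        /* 1 if label must be freed with the cache member */
} cache_member;

static int label_has_bindings(const label_obj *l) {
  if (l->numbindings > 0) return 1;
  for (int i = 0; i < l->numstyles; i++)
    if (l->styles[i].numbindings > 0) return 1;
  return 0;
}

/* Share the label when nothing is data driven; otherwise copy it so that
 * binding values resolved for this feature can be stored in the copy. */
static int cache_member_init(cache_member *m, label_obj *l) {
  assert(m != NULL && l != NULL);
  if (!label_has_bindings(l)) {
    m->label = l;
    m->owns_label = 0;
    return 0;
  }
  m->label = malloc(sizeof *m->label);
  if (m->label == NULL) return -1;
  memcpy(m->label, l, sizeof *l);  /* shallow: a real copy would also copy styles */
  m->owns_label = 1;
  return 0;
}

static void cache_member_free(cache_member *m) {
  if (m->owns_label) free(m->label);
  m->label = NULL;
  m->owns_label = 0;
}
```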
>>
>> At the msDrawLabelCache phase:
>> * We delay computing the label text bounding box until after we have
>> checked conditions that would cause the label not to be rendered, i.e.
>>   - if it has a MINDISTANCE set and a neighbouring label with
>> identical text has already been rendered
>>   - for labels without markers, we first check that the labelpoint
>> doesn't collide with an existing label.
>> * Collision detection has been optimized:
>>  - We keep a list of rendered labels and loop through those instead of
>> checking the status of all the labels in the labelcache for each
>> member.
>>  - The bounding metrics for a label have been cut down from a full
>> shapeObj to a struct containing a bounding rect and an optional
>> lineObj. Non-rotated labels need no information beyond their
>> bounding box, which makes intersection detection much easier
>> between two such labels (i.e. the overlap of these two labels is
>> the same as the overlap of their bounding boxes, with no need for
>> further geometric intersection primitives).
>> The speedups from these changes are very significant for cluttered
>> maps, cf. https://plus.google.com/u/0/118271009221580171800/posts/PrwhFYSkhea
>> (e.g. rendering time drops from 800 seconds to 1 second for 500,000 labels)
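A minimal C sketch of the collision test described above, assuming hypothetical rect_obj/label_bounds types: the candidate is only compared against labels that were actually rendered, and when neither label is rotated, bounding-box overlap is the final answer. polys_overlap is a placeholder (it treats any box overlap as a collision) for whatever exact polygon test is used for rotated labels.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double minx, miny, maxx, maxy; } rect_obj;

/* Hypothetical label bounds: a rect, plus an optional outline for rotated labels. */
typedef struct {
  rect_obj bbox;
  const void *poly;  /* NULL for non-rotated labels */
} label_bounds;

static int rects_overlap(const rect_obj *a, const rect_obj *b) {
  return a->minx <= b->maxx && b->minx <= a->maxx &&
         a->miny <= b->maxy && b->miny <= a->maxy;
}

/* Placeholder for an exact polygon intersection test (assumption):
 * conservatively reports a collision whenever the boxes overlap. */
static int polys_overlap(const label_bounds *a, const label_bounds *b) {
  (void)a;
  (void)b;
  return 1;
}

/* Test a candidate only against the labels that were actually rendered. */
static int collides(const label_bounds *cand, const label_bounds *rendered, size_t n) {
  assert(cand != NULL);
  for (size_t i = 0; i < n; i++) {
    if (!rects_overlap(&cand->bbox, &rendered[i].bbox))
      continue;              /* disjoint boxes: no collision possible */
    if (cand->poly == NULL && rendered[i].poly == NULL)
      return 1;              /* both axis-aligned: box overlap is exact */
    if (polys_overlap(cand, &rendered[i]))
      return 1;
  }
  return 0;
}
```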
>>
>> Miscellaneous libmapserver changes
>> ===========================
>> - The code in mapprimitive to compute positions for angle auto and
>> angle follow has been sanitized and now directly uses computed glyph
>> metrics
>> - A number of functions have had their signature changed to accommodate
>> the change of architecture
>>
>> Text shaping
>> ==========
>> None of the previous changes affected text shaping, which is a
>> major component of RFC98. All the shaping happens in textlayout.c,
>> whose principal role is to take a string of text as input and return
>> a list of positioned glyphs as output. The input string goes through
>> multiple steps and is split into multiple runs. Each run has a
>> single line number, bidi direction, and script ("language").
>>
>> As an example, we'll be working with the input unicode string "this is
>> some text in english, ARABIC and JAPANESE". Capital letters are used
>> to denote non-Latin glyphs; also note that ARABIC is stored in logical
>> (= reading) order, whereas it would be displayed as CIBARA.
>>
>> - encoding conversion: iconv converts the string to unicode
>>  run1 = "this is some text in english, ARABIC and JAPANESE", line=0
>>
>> - line wrapping: break on wrap character, break long lines on spaces
>> run1 = "this is some text in english,", line=0
>> run2 = "ARABIC and JAPANESE", line=1
>>
>> - bidi levels
>>  => each run has a single bidi direction (i.e. left-to-right or right-to-left)
>> run1 = "this is some text in english,", line=0, direction=LTR
>> run2 = "ARABIC", line=1, direction=RTL
>> run3 = " and JAPANESE", line=1, direction=LTR
>>
>> - script detection is applied to enable language-dependent shaping,
>> and also to refine which fonts will be used (more on that later)
>> run1 = "this is some text in english,", line=0, direction=LTR, script=latin
>> run2 = "ARABIC", line=1, direction=RTL, script=arabic
>> run3 = " and ", line=1, direction=LTR, script=latin
>> run4 = "JAPANESE", line=1, direction=LTR, script=hiragana
>>
>> - for each run, we select which font should be used, so that the same
>> font is used throughout a given run. A previous RFC allowed specifying
>> multiple fonts for a LABEL; this has been extended to allow fine
>> tuning which fonts are preferably used for a given script:
>>
>> FONT "arialuni,arial,cjk,arabic"
>>
>> can now be written with script identifier prefixes, i.e.
>>
>> FONT "arialuni,en:arial,ja:cjk,ar:arabic"
>>
>> This is needed because there is, and will be, overlap between the
>> glyph coverages of fonts, and it should be possible to prioritize
>> which font is used for which language.
>>
>> - Each run is then fed into harfbuzz, which returns a list of
>> positioned glyphs. The number of returned glyphs is not necessarily
>> the same as the number of characters in the unicode string, and the
>> glyphs are ordered from left to right.
>>
>> - The glyphs of each run are reassembled to account for line numbers
>> and run positions (e.g. run 3 is offset down by one line, and placed
>> to the right of run 2)
>>
>> - Each line is horizontally offset to account for ALIGN. LABEL ALIGN
>> no longer defaults to LEFT, so right-to-left runs will be right
>> aligned instead of left aligned as they are now.
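The per-line ALIGN offset can be sketched in C as below. This assumes that, when ALIGN is not set, a line follows its base direction (left for LTR, right for RTL) as reported by the bidi step; the align_t values and the resolve_align/line_offset/align_lines helpers are hypothetical, not MapServer's own constants or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alignment values; MapServer's own constants differ. */
typedef enum { ALIGN_DEFAULT, ALIGN_LEFT, ALIGN_CENTER, ALIGN_RIGHT } align_t;

/* An explicit ALIGN wins; otherwise follow the line's base direction (assumption). */
static align_t resolve_align(align_t requested, int line_is_rtl) {
  if (requested != ALIGN_DEFAULT) return requested;
  return line_is_rtl ? ALIGN_RIGHT : ALIGN_LEFT;
}

/* Offset added to every glyph of a line of width line_w inside a block
 * whose widest line is block_w. */
static double line_offset(align_t a, double line_w, double block_w) {
  assert(line_w <= block_w);
  switch (a) {
    case ALIGN_CENTER: return (block_w - line_w) / 2.0;
    case ALIGN_RIGHT:  return block_w - line_w;
    default:           return 0.0;
  }
}

/* Compute one horizontal offset per line of already-shaped text. */
static void align_lines(const double *widths, const int *rtl, size_t nlines,
                        align_t requested, double *offsets) {
  double block_w = 0.0;
  for (size_t i = 0; i < nlines; i++)
    if (widths[i] > block_w) block_w = widths[i];
  for (size_t i = 0; i < nlines; i++)
    offsets[i] = line_offset(resolve_align(requested, rtl[i]), widths[i], block_w);
}
```

With this rule an RTL line is right aligned unless ALIGN is given explicitly, which is the behaviour change described in the step above.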
>>
>> ========
>> Summary
>> ========
>>
>> What we've gained
>> ==============
>> - A consistent framework for laying out text. Future enhancements
>> (e.g. double-spacing text) should be simpler to implement.
>> - Correct (I hope) rendering of non-Latin scripts (multiline arabic
>> text, complex shaping)
>> - Fine-grained control over which fonts to use. I also have some code
>> ready to use fontconfig for font selection, not used for now as it has
>> performance issues.
>> - Correct ALIGN support for RTL languages.
>> - Labelcache speedups and decreased memory consumption
>> - Correct handling of glyph placements with respect to their baseline
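One possible reading of the script-prefixed font list ("arialuni,en:arial,ja:cjk,ar:arabic") is sketched below in C: fonts tagged with the run's script are tried first, then every other listed font in its original order as a fallback. That fallback order, the parse_font_list/font_candidates helpers, and the assumption that the run's script has already been mapped to the same short tag (e.g. ja) are all hypothetical; the RFC text only states that tagged fonts are to be preferably used for their script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FONTS 16
#define MAX_NAME 64

typedef struct {
  char tag[8];          /* "" when the entry has no script prefix */
  char name[MAX_NAME];  /* font alias as listed */
} font_entry;

/* Split "arialuni,en:arial,ja:cjk" into entries; returns the count. */
static size_t parse_font_list(const char *spec, font_entry *out, size_t max) {
  size_t n = 0;
  assert(spec != NULL && out != NULL);
  while (*spec && n < max) {
    size_t len = strcspn(spec, ",");          /* length of this entry */
    const char *colon = memchr(spec, ':', len);
    const char *name = spec;
    size_t name_len = len;
    font_entry *e = &out[n++];
    e->tag[0] = '\0';
    if (colon != NULL) {                      /* "tag:name" form */
      size_t tag_len = (size_t)(colon - spec);
      name = colon + 1;
      name_len = len - tag_len - 1;
      if (tag_len >= sizeof e->tag) tag_len = sizeof e->tag - 1;
      memcpy(e->tag, spec, tag_len);
      e->tag[tag_len] = '\0';
    }
    if (name_len >= sizeof e->name) name_len = sizeof e->name - 1;
    memcpy(e->name, name, name_len);
    e->name[name_len] = '\0';
    spec += len;
    if (*spec == ',') spec++;
  }
  return n;
}

static int tag_matches(const font_entry *f, const char *script) {
  return f->tag[0] != '\0' && strcmp(f->tag, script) == 0;
}

/* Candidate order for a run: fonts tagged for its script first, then the
 * remaining fonts in listed order (assumed fallback rule). */
static size_t font_candidates(const font_entry *f, size_t n, const char *script,
                              size_t *order) {
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    if (tag_matches(&f[i], script)) order[k++] = i;
  for (size_t i = 0; i < n; i++)
    if (!tag_matches(&f[i], script)) order[k++] = i;
  return k;
}
```

For a run tagged ja this yields cjk, arialuni, arial, arabic; keeping the other fonts as fallbacks means a glyph missing from the preferred font can still be found elsewhere in the list.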
>>
>>
>> Backwards incompatible changes
>> =========================
>> - GD support has been dropped
>> - ANNOTATION layers are dropped (they've been deprecated since 6.0)
>> - HTML encoded entities inside attributes are not supported (e.g.
>> &eacute;, &#3456;). They are still supported for specifying which
>> character to use in truetype symbols.
>> - API changes (not yet sure whether and how this affects MapScript)
>> - Arabic shaping will depend on harfbuzz. A build without harfbuzz
>> will not fall back to fribidi shaping.
>> - All text-related autotests fail, as the output is slightly different
>> (hinting is disabled, and metrics calculations may have varied slightly)
>>
>> Open Issues
>> =========
>> - Where to store the font and glyph cache
>> - For the KML renderer, is the labelcache phase necessary? If so,
>> label bounds must be computed; however, we should be feeding the
>> original text, not individual glyphs, to the KML renderer.
>>
>> regards,
>> Thomas
> _______________________________________________
> mapserver-dev mailing list
> mapserver-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapserver-dev

