[GRASS-dev] man pages in UTF-8

Glynn Clements glynn at gclements.plus.com
Wed Mar 7 17:01:52 EST 2012


Markus Neteler wrote:

> for Fedora and other distros UTF-8 encoding of manual pages is required.
> How about changing all HTML files to UTF-8 (I can do that)?
> Any side effects to be expected?

It would be better to change the HTML source files to use entities
rather than any particular encoding. Currently, they use a mix of
ISO-8859-1 and UTF-8 (those which use UTF-8 won't show correctly,
because the files are treated as being in ISO-8859-1). In 7.0, the
only <module>.html files containing non-ASCII characters are:

	i.evapo.pt
	i.landsat.acca
	r.external.out
	r.sun
	r.walk
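
A list like the one above can be regenerated with a few lines of
Python. This is only a sketch: the function name find_non_ascii and the
idea of walking a source tree are my assumptions, not an existing GRASS
script. It flags any .html file containing a byte >= 0x80, without
trying to guess which encoding that byte belongs to.

```python
import os


def find_non_ascii(root):
    """Return sorted paths of .html files under root that contain
    at least one non-ASCII byte (any byte value >= 0x80)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            # bytearray yields integers on both Python 2 and Python 3
            if any(b >= 0x80 for b in bytearray(data)):
                hits.append(path)
    return sorted(hits)
```

Calling find_non_ascii(".") from the top of a source tree would list
the candidate files; the result still needs a manual check to see
whether each file is ISO-8859-1 or UTF-8.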

No modules appear to use non-ASCII characters in their
--html-description output (at least, not for the "C" locale).

mkhtml.py just copies the bytes verbatim, but it adds a "meta" tag to
the output indicating that the data is in ISO-8859-1:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
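
To illustrate why the UTF-8 files don't show correctly under that
declaration, here is a small sketch. The helper name as_declared is
hypothetical; it simply decodes bytes the way a reader honouring the
"meta" tag would.

```python
def as_declared(data, declared="iso-8859-1"):
    """Decode raw bytes using the charset named in the "meta" tag,
    regardless of how the bytes were actually encoded."""
    return data.decode(declared)


e_acute = u"\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Stored as ISO-8859-1 (one byte, 0xE9): round-trips correctly.
good = as_declared(e_acute.encode("iso-8859-1"))

# Stored as UTF-8 (two bytes, 0xC3 0xA9): each byte is read as a
# separate Latin-1 character, giving U+00C3 followed by U+00A9.
bad = as_declared(e_acute.encode("utf-8"))
```

Here "good" is the original character, while "bad" is the two-character
sequence that appears in place of it.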

g.html2man will need an option to select the output encoding (UTF-8 is
a GNU groff extension), and will need to convert the output to that
encoding; for UTF-8, it needs to add a byte order mark to the file so
that preconv recognises it as UTF-8.
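
The encoding step might look roughly like this. It is a sketch, not
the actual g.html2man code: the function name encode_man_page and its
"encoding" parameter are assumptions standing in for whatever option
gets added.

```python
import codecs


def encode_man_page(text, encoding="utf-8"):
    """Encode man page text for output. For UTF-8, prepend a byte
    order mark so that the encoding can be detected from the file."""
    if codecs.lookup(encoding).name == "utf-8":
        return codecs.BOM_UTF8 + text.encode("utf-8")
    # Raises UnicodeEncodeError if the text contains characters which
    # the target encoding cannot represent.
    return text.encode(encoding)
```

Using codecs.lookup() to normalise the name means that "UTF-8", "utf8"
and "utf_8" are all treated the same way.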

The changes will be simpler if the input is in ASCII or ISO-8859-1.
They will be more complex if HTML files are allowed to use characters
outside of the Latin-1 repertoire (currently, this only affects
i.atcorr, whose "&lambda;" ends up as "&#955;" in the manual page).
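
For the entity approach suggested at the top, a conversion could be
sketched as follows (Python 3). Both function names are hypothetical,
and the "try UTF-8, else assume ISO-8859-1" rule is a heuristic of
mine, not something the existing tools do; it relies on ISO-8859-1
text with high bytes rarely being valid UTF-8.

```python
import html


def to_entities(data):
    """Convert raw HTML bytes to pure ASCII, replacing each non-ASCII
    character with a numeric character reference."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic fallback: every byte sequence is valid ISO-8859-1
        text = data.decode("iso-8859-1")
    return text.encode("ascii", "xmlcharrefreplace")


def outside_latin1(html_text):
    """Return the sorted characters above U+00FF in html_text, after
    resolving entities such as "&lambda;" or "&#955;"."""
    return sorted({c for c in html.unescape(html_text) if ord(c) > 0xFF})
```

Markup and existing entities are already ASCII, so to_entities()
leaves them untouched. outside_latin1() gives a quick way to find the
pages which would need the more complex handling.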

-- 
Glynn Clements <glynn at gclements.plus.com>

