[GRASS-dev] Character encoding of module i.atcorr files

Glynn Clements glynn at gclements.plus.com
Thu Feb 27 22:45:58 PST 2014


Maris Nartiss wrote:

> The offending line is a reference in the comment section:
> http://trac.osgeo.org/grass/browser/grass/trunk/imagery/i.atcorr/computations.cpp#L1365
> 
> I browsed SUBMITTING file and didn't find any rules about source
> encoding. As a supporter of Unicode everywhere, I would suggest to add
> a requirement for source files to be in UTF-8. Upside - most of files
> already are in UTF-8. Thus only files with symbols outside of latin1
> would be affected.

Most files are ASCII. Those which aren't are almost evenly split
between ISO-8859-1 and UTF-8:

Files using ISO-8859-1:

raster/r.sunmask/g_solposition.c	U+00B0	DEGREE SIGN
imagery/i.topo.corr/main.c		U+00F1	LATIN SMALL LETTER N WITH TILDE
imagery/i.landsat.toar/landsat.h	U+00B5	MICRO SIGN
imagery/i.evapo.pm/functions.c		U+00B0	DEGREE SIGN
imagery/i.atcorr/computations.cpp	U+00E9	LATIN SMALL LETTER E WITH ACUTE
lib/raster/color_look.c			U+00AD	SOFT HYPHEN
lib/raster/color_set.c			U+00AD	SOFT HYPHEN

Files using UTF-8:

raster/r.sunmask/main.c			U+00B0	DEGREE SIGN
raster/r.watershed/ram/do_flatarea.c	U+2013	EN DASH
vector/v.net.salesman/main.c		U+2013	EN DASH
gui/wxpython/lmgr/frame.py		U+00F6	LATIN SMALL LETTER O WITH DIAERESIS
					U+2019	RIGHT SINGLE QUOTATION MARK
lib/python/pygrass/functions.py		U+00B0	DEGREE SIGN
lib/arraystats/class.c			U+00E9	LATIN SMALL LETTER E WITH ACUTE
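
A listing like the one above can be reproduced with a short script. What
follows is only a minimal sketch, not the command actually used: it assumes
that any file which is not valid UTF-8 is ISO-8859-1, and the function names
classify() and describe() are made up for illustration.

```python
import unicodedata


def classify(data):
    """Return (encoding, non-ASCII characters) for the raw bytes of a file."""
    try:
        data.decode("ascii")
        return "ascii", []
    except UnicodeDecodeError:
        pass
    try:
        text = data.decode("utf-8")
        encoding = "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
        # Every byte sequence decodes as ISO-8859-1, so this cannot fail.
        text = data.decode("iso-8859-1")
        encoding = "iso-8859-1"
    chars = sorted({c for c in text if ord(c) > 0x7F})
    return encoding, chars


def describe(ch):
    """Format one character as in the listing: code point, tab, Unicode name."""
    return "U+%04X\t%s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
```

To scan a tree, read each source file in binary mode, pass the bytes to
classify(), and print describe() for each character returned.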

Many of these are gratuitous, e.g. use of soft hyphen or en-dash
when an ASCII "-" (U+002D HYPHEN-MINUS) would suffice.

Some are due to comments written in languages other than English
(i.topo.corr = Spanish, lib/arraystats = French); these should be
translated.

All but one are in comments: the pygrass one is a string literal,
which should really use escape notation (assuming that the
is_clean_name() function is actually correct, and not a half-baked
attempt at re-implementing G_legal_filename()).
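
As an illustration of escape notation, the sketch below keeps the source file
ASCII-only while still producing a degree sign at run time. It is not taken
from pygrass; DEGREE_SIGN and format_celsius() are hypothetical names, and
Python 3 string semantics are assumed.

```python
# Hypothetical constant, not from pygrass: U+00B0 DEGREE SIGN written
# with an escape, so the source file itself contains only ASCII bytes.
DEGREE_SIGN = "\u00b0"


def format_celsius(value):
    """Format a temperature with a degree sign, e.g. for a label."""
    return "%.1f %sC" % (value, DEGREE_SIGN)
```

The resulting string is identical to one written with the literal character;
only the encoding of the source file changes.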

So, if those are fixed, it boils down to whether we actually want to
have to deal with source-code encoding issues for the sake of comments
which include:

a) °C for degrees Celsius,
b) µm for micrometres (microns), and
c) proper names using the Latin script with accents (names using any
other script will invariably be romanised).

Personally, I would prefer it if source code was 7-bit clean.
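
If a 7-bit-clean rule were adopted, checking it would be cheap. A minimal
sketch of such a check follows; the function name non_ascii_lines() is made
up for illustration, and nothing like it is assumed to exist in the GRASS
tree.

```python
def non_ascii_lines(data):
    """Yield (line_number, line) for each line with a byte above 0x7F.

    `data` is the raw content of a source file, read in binary mode.
    """
    for number, line in enumerate(data.splitlines(), start=1):
        # Iterating over bytes yields integers in Python 3.
        if any(byte > 0x7F for byte in line):
            yield number, line
```

A file is 7-bit clean exactly when this yields nothing, so the check could
run over the sources before a commit and report the offending lines.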

-- 
Glynn Clements <glynn at gclements.plus.com>

