[GRASS-dev] Character encoding of module i.atcorr files
Glynn Clements
glynn at gclements.plus.com
Thu Feb 27 22:45:58 PST 2014
Maris Nartiss wrote:
> The offending line is a reference in the comment section:
> http://trac.osgeo.org/grass/browser/grass/trunk/imagery/i.atcorr/computations.cpp#L1365
>
> I browsed SUBMITTING file and didn't find any rules about source
> encoding. As a supporter of Unicode everywhere, I would suggest to add
> a requirement for source files to be in UTF-8. Upside - most of files
> already are in UTF-8. Thus only files with symbols outside of latin1
> would be affected.
Most files are ASCII. Those which aren't are almost evenly split
between ISO-8859-1 and UTF-8:
Files using ISO-8859-1:
  raster/r.sunmask/g_solposition.c      U+00B0 DEGREE SIGN
  imagery/i.topo.corr/main.c            U+00F1 LATIN SMALL LETTER N WITH TILDE
  imagery/i.landsat.toar/landsat.h      U+00B5 MICRO SIGN
  imagery/i.evapo.pm/functions.c        U+00B0 DEGREE SIGN
  imagery/i.atcorr/computations.cpp     U+00E9 LATIN SMALL LETTER E WITH ACUTE
  lib/raster/color_look.c               U+00AD SOFT HYPHEN
  lib/raster/color_set.c                U+00AD SOFT HYPHEN
Files using UTF-8:
  raster/r.sunmask/main.c               U+00B0 DEGREE SIGN
  raster/r.watershed/ram/do_flatarea.c  U+2013 EN DASH
  vector/v.net.salesman/main.c          U+2013 EN DASH
  gui/wxpython/lmgr/frame.py            U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
                                        U+2019 RIGHT SINGLE QUOTATION MARK
  lib/python/pygrass/functions.py       U+00B0 DEGREE SIGN
  lib/arraystats/class.c                U+00E9 LATIN SMALL LETTER E WITH ACUTE
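A survey like the one above could be reproduced with something along these lines. This is only a sketch, not the script used to produce the lists: the function names classify() and non_ascii_chars() are made up for illustration, and anything that is neither ASCII nor valid UTF-8 is simply assumed to be ISO-8859-1, which cannot be verified from the bytes alone.

```python
import unicodedata


def classify(data):
    """Guess the encoding of raw file contents (bytes)."""
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as ISO-8859-1, so this is an
        # assumption rather than a detection.
        return "ISO-8859-1 (assumed)"


def non_ascii_chars(data):
    """List the distinct non-ASCII characters as 'U+XXXX NAME' strings."""
    codec = "utf-8" if classify(data) != "ISO-8859-1 (assumed)" else "iso-8859-1"
    text = data.decode(codec)
    seen = sorted({ch for ch in text if ord(ch) > 0x7F})
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in seen]
```

For a single file, read it in binary mode and pass the bytes in, e.g. classify(open(path, "rb").read()).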
Many of these are gratuitous, e.g. the use of a soft hyphen or
en dash where an ASCII "-" (U+002D HYPHEN-MINUS) would suffice.
Some are due to comments written in languages other than English
(i.topo.corr = Spanish, lib/arraystats = French); these should be
translated.
All but one are in comments: the pygrass one is a string literal,
which should really use escape notation (assuming that the
is_clean_name() function is actually correct, and not a half-baked
attempt at re-implementing G_legal_filename()).
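To illustrate what escape notation means for a Python string literal: the source file stays pure ASCII, while the string still contains the degree sign at run time. The names below (DEGREE_SIGN, format_temperature) are hypothetical and are not taken from pygrass.

```python
# The literal is written with a \u escape, so the bytes in the source
# file are all ASCII; the resulting string holds U+00B0 DEGREE SIGN.
DEGREE_SIGN = "\u00b0"


def format_temperature(value):
    """Hypothetical helper: format a temperature in degrees Celsius."""
    return "%.1f %sC" % (value, DEGREE_SIGN)
```

The named form "\N{DEGREE SIGN}" gives the same string and is arguably easier to read in a diff.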
So, if those are fixed, it boils down to whether we actually want to
have to deal with source-code encoding issues for the sake of comments
which include:
a) °C for degrees Celsius,
b) µm for micrometres (microns), and
c) proper names using the Latin script with accents (names using any
other script will invariably be romanised).
Personally, I would prefer it if source code was 7-bit clean.
--
Glynn Clements <glynn at gclements.plus.com>