[GRASS-dev] Character encoding of module i.atcorr files
Hamish
hamish_b at yahoo.com
Sun Mar 2 19:59:25 PST 2014
Maris wrote:
>> The offending line is a reference in the comment section:
>> http://trac.osgeo.org/grass/browser/grass/trunk/imagery/i.atcorr/computations.cpp#L1365
>>
>> I browsed the SUBMITTING file and didn't find any rules about source
>> encoding.
...
Glynn wrote:
> Most files are ASCII. Those which aren't are almost evenly split
> between ISO-8859-1 and UTF-8:
>
> Files using ISO-8859-1:
>
> raster/r.sunmask/g_solposition.c U+00B0 DEGREE SIGN
> imagery/i.topo.corr/main.c U+00F1 LATIN SMALL LETTER N WITH TILDE
> imagery/i.landsat.toar/landsat.h U+00B5 MICRO SIGN
> imagery/i.evapo.pm/functions.c U+00B0 DEGREE SIGN
> imagery/i.atcorr/computations.cpp U+00E9 LATIN SMALL LETTER E WITH ACUTE
> lib/raster/color_look.c U+00AD SOFT HYPHEN
> lib/raster/color_set.c U+00AD SOFT HYPHEN
>
> Files using UTF-8:
>
> raster/r.sunmask/main.c U+00B0 DEGREE SIGN
> raster/r.watershed/ram/do_flatarea.c U+2013 EN DASH
> vector/v.net.salesman/main.c U+2013 EN DASH
> gui/wxpython/lmgr/frame.py U+00F6 LATIN SMALL LETTER O WITH DIAERESIS,
>   U+2019 RIGHT SINGLE QUOTATION MARK
> lib/python/pygrass/functions.py U+00B0 DEGREE SIGN
> lib/arraystats/class.c U+00E9 LATIN SMALL LETTER E WITH ACUTE
>
> Many of these are gratuitous, e.g. use of a soft hyphen or
> en dash when an ASCII "-" (U+002D HYPHEN-MINUS) would suffice.
>
> Some are due to comments written in languages other than English
> (i.topo.corr = Spanish, lib/arraystats = French); these should be
> translated.
>
> All but one are in comments: the pygrass one is a string literal,
> which should really use escape notation (assuming that the
> is_clean_name() function is actually correct, and not a half-baked
> attempt at re-implementing G_legal_filename()).
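To illustrate the escape-notation point, here is a minimal Python sketch. The names are hypothetical and are not taken from pygrass; it only shows that a degree sign in a string literal can be spelled with an escape, so the source file itself stays pure ASCII.

```python
# Hypothetical example, not the pygrass code: the degree sign is written
# with escapes, so this source file contains only ASCII bytes.
DEGREE_SIGN = u"\u00b0"                 # U+00B0 DEGREE SIGN, numeric escape
DEGREE_SIGN_NAMED = u"\N{DEGREE SIGN}"  # same character, named escape


def contains_degree_sign(text):
    """Return True if the text contains U+00B0 DEGREE SIGN."""
    return DEGREE_SIGN in text
```

Both spellings produce the same one-character string, so the choice between them is only a matter of readability.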
>
> So, if those are fixed, it boils down to whether we actually want to
> have to deal with source-code encoding issues for the sake of comments
> which include:
>
> a) °C for degrees Celsius,
> b) µm for micrometres (microns), and
> c) proper names using the Latin script with accents (names using any
> other script will invariably be romanised).
I've now removed most of these in trunk with r59172.
remaining:
imagery/i.atcorr/computations.cpp (someone's name)
gui/wxpython/lmgr/frame.py (an example of something using UTF-8)
and lib/python/pygrass/functions.py ...
As for functions.py, hooking into G_legal_filename() would
be best, but failing that, a white-list of allowed chars would
seem much more robust than a small black-list of disallowed
chars.
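A rough sketch of the white-list idea in Python follows. The allowed set is an assumption for illustration only and is not the rule that G_legal_filename() applies; both function names and the black-listed characters are made up for the comparison.

```python
import string

# Assumed white-list for illustration only: ASCII letters, digits and "_".
# The real rules live in G_legal_filename() and may differ.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def is_clean_name_whitelist(name):
    """Return True only if the name is non-empty and every char is allowed."""
    return len(name) > 0 and all(ch in ALLOWED_CHARS for ch in name)


def is_clean_name_blacklist(name, disallowed=" @#"):
    """Black-list variant for comparison; the disallowed set is made up."""
    return len(name) > 0 and not any(ch in disallowed for ch in name)
```

The black-list accepts anything nobody thought to forbid, for instance a name containing U+00B0, while the white-list rejects it by default.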
> Personally, I would prefer it if source code was 7-bit clean.
Me too. Not sure how to deal with non-ASCII chars in people's names though.
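For anyone wanting to check a tree for files that are not 7-bit clean, something along these lines might do. It is only a sketch: the function names and the extension list are assumptions, and "other" merely means the bytes are not valid UTF-8. That is usually ISO-8859-1 in practice, but it cannot be proven from the bytes alone.

```python
import os


def classify_bytes(data):
    """Classify raw bytes as 'ascii', 'utf-8' or 'other' (not valid UTF-8)."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def find_non_ascii(top, extensions=(".c", ".h", ".cpp", ".py")):
    """Yield (path, classification) for source files that are not pure ASCII."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for fname in sorted(filenames):
            if fname.endswith(extensions):
                path = os.path.join(dirpath, fname)
                with open(path, "rb") as f:
                    kind = classify_bytes(f.read())
                if kind != "ascii":
                    yield path, kind
```

Looping over find_non_ascii(".") from the top of a checkout and printing each pair would give a list similar in spirit to the one quoted above.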
regards,
Hamish