[GRASS-dev] Character encoding of module i.atcorr files

Hamish hamish_b at yahoo.com
Sun Mar 2 19:59:25 PST 2014


Maris wrote:

>>  The offending line is a reference in the comment section:
>> http://trac.osgeo.org/grass/browser/grass/trunk/imagery/i.atcorr/computations.cpp#L1365
>> 
>>  I browsed SUBMITTING file and didn't find any rules about source
>>  encoding.
...

Glynn wrote:
> Most files are ASCII. Those which aren't are almost evenly split
> between ISO-8859-1 and UTF-8:
> 
> Files using ISO-8859-1:
> 
> raster/r.sunmask/g_solposition.c    U+00B0    DEGREE SIGN
> imagery/i.topo.corr/main.c        U+00F1    LATIN SMALL LETTER N WITH TILDE
> imagery/i.landsat.toar/landsat.h    U+00B5    MICRO SIGN
> imagery/i.evapo.pm/functions.c        U+00B0    DEGREE SIGN
> imagery/i.atcorr/computations.cpp    U+00E9    LATIN SMALL LETTER E WITH ACUTE
> lib/raster/color_look.c            U+00AD    SOFT HYPHEN
> lib/raster/color_set.c            U+00AD    SOFT HYPHEN
> 
> Files using UTF-8:
> 
> raster/r.sunmask/main.c            U+00B0    DEGREE SIGN
> raster/r.watershed/ram/do_flatarea.c    U+2013    EN DASH
> vector/v.net.salesman/main.c        U+2013    EN DASH
> gui/wxpython/lmgr/frame.py        U+00F6    LATIN SMALL LETTER O WITH DIAERESIS
>                     U+2019    RIGHT SINGLE QUOTATION MARK
> lib/python/pygrass/functions.py        U+00B0    DEGREE SIGN
> lib/arraystats/class.c            U+00E9    LATIN SMALL LETTER E WITH ACUTE
> 
> Many of these are gratuitous, e.g. use of a soft hyphen or an
> en dash where an ASCII "-" (U+002D HYPHEN-MINUS) would suffice.
> 
> Some are due to comments written in languages other than English
> (i.topo.corr = Spanish, lib/arraystats = French); these should be
> translated.
> 
> All but one are in comments: the pygrass one is a string literal,
> which should really use escape notation (assuming that the
> is_clean_name() function is actually correct, and not a half-baked
> attempt at re-implementing G_legal_filename()).
> 
> So, if those are fixed, it boils down to whether we actually want to
> have to deal with source-code encoding issues for the sake of comments
> which include:
> 
> a) °C for degrees Celsius,
> b) µm for micrometres (microns), and
> c) proper names using the Latin script with accents (names using any
> other script will invariably be romanised).
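Glynn's point about the pygrass string literal using escape notation can be illustrated with a short sketch. It assumes Python 3, where "\uXXXX" inside an ordinary str literal names a Unicode code point, so the file itself stays 7-bit clean. FORBIDDEN_CHARS and format_temperature() are hypothetical names for illustration only; they are not the contents of pygrass's is_clean_name().

```python
# Python 3 assumed: "\uXXXX" escapes in ordinary str literals name Unicode
# code points, while the bytes of this source file remain pure ASCII.
DEGREE_SIGN = "\u00b0"  # U+00B0 DEGREE SIGN
MICRO_SIGN = "\u00b5"   # U+00B5 MICRO SIGN

# Hypothetical set of characters a name check might reject; the point is
# only how the degree sign is spelled, not what the real list contains.
FORBIDDEN_CHARS = " @#/\\" + DEGREE_SIGN


def format_temperature(celsius):
    """Return e.g. '25.0' + DEGREE_SIGN + 'C' without a literal degree sign."""
    return "%.1f%sC" % (celsius, DEGREE_SIGN)
```

The resulting strings carry the intended characters, but no editor, compiler or interpreter ever has to guess the file's encoding.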

I've now removed most of these in trunk with r59172.

remaining:
imagery/i.atcorr/computations.cpp (someone's name)
gui/wxpython/lmgr/frame.py (an example of something using UTF-8)

and lib/python/pygrass/functions.py ...

as for functions.py, hooking into G_legal_filename() would
be best, but failing that, a white-list of allowed chars would
seem much more robust than a small black-list of disallowed
chars.
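A white-list check along those lines might look like the sketch below. The allowed set (ASCII letters, digits, "_", "." and "-") and the rule rejecting empty or dot-prefixed names are assumptions for illustration. They are not the rules that G_legal_filename() actually applies, and is_clean_name_whitelist() is a hypothetical name.

```python
import string

# Hypothetical allowed set; an assumption for illustration, not the rule
# implemented by G_legal_filename() in the GRASS C library.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + "_.-")


def is_clean_name_whitelist(name):
    """Return True only if every character of name is explicitly allowed.

    Anything not in ALLOWED_CHARS (spaces, '@', '/', accented letters,
    the degree sign, ...) is rejected, so a newly discovered problem
    character needs no extra black-list entry.
    """
    if not name or name.startswith("."):
        # Assumed extra rule: empty or hidden-file-style names are rejected.
        return False
    return all(ch in ALLOWED_CHARS for ch in name)
```

The design choice is that the default answer is "no": a black-list has to anticipate every bad character, while a white-list only has to enumerate the good ones.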

 
> Personally, I would prefer it if source code was 7-bit clean.

Me too. Not sure how to deal with non-ASCII chars in people's names though.
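Checking whether the tree is 7-bit clean, and producing a list like Glynn's, can be sketched as below. The classification is a heuristic and an assumption: pure 7-bit content is ASCII, content that decodes strictly as UTF-8 is taken to be UTF-8, and anything else is taken to be ISO-8859-1, which can decode any byte sequence. A short ISO-8859-1 run can occasionally also be valid UTF-8, so results should be checked by eye. The suffix list and function names are hypothetical.

```python
import os
import sys


def classify_encoding(data):
    """Classify raw bytes as 'ascii', 'utf-8' or 'iso-8859-1' (heuristic)."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"   # fallback: every byte is valid ISO-8859-1


def non_ascii_lines(data):
    """Yield (line_number, line_bytes) for each line with a byte >= 0x80."""
    for number, line in enumerate(data.splitlines(), start=1):
        if any(b >= 0x80 for b in line):
            yield number, line


def scan_tree(root, suffixes=(".c", ".h", ".cpp", ".py")):
    """Yield (path, encoding) for every source file that is not 7-bit clean."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(suffixes):
                path = os.path.join(dirpath, filename)
                with open(path, "rb") as handle:
                    kind = classify_encoding(handle.read())
                if kind != "ascii":
                    yield path, kind


if __name__ == "__main__":
    for path, kind in scan_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("%-60s %s" % (path, kind))
```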


regards,
Hamish


