<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Mar 2, 2014 at 10:59 PM, Hamish <span dir="ltr"><<a href="mailto:hamish_b@yahoo.com" target="_blank">hamish_b@yahoo.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="">Maris wrote:<br>
<br>
>> The offending line is a reference in the comment section:<br>
>> <a href="http://trac.osgeo.org/grass/browser/grass/trunk/imagery/i.atcorr/computations.cpp#L1365" target="_blank">http://trac.osgeo.org/grass/browser/grass/trunk/imagery/i.atcorr/computations.cpp#L1365</a><br>
>><br>
>> I browsed the SUBMITTING file and didn't find any rules about source<br>
>> encoding.<br>
</div>...<br>
<br>
Glynn wrote:<br>
<div><div class="h5">> Most files are ASCII. Those which aren't are almost evenly split<br>
> between ISO-8859-1 and UTF-8:<br>
><br>
> Files using ISO-8859-1:<br>
><br>
> raster/r.sunmask/g_solposition.c U+00B0 DEGREE SIGN<br>
> imagery/i.topo.corr/main.c U+00F1 LATIN SMALL LETTER N WITH TILDE<br>
> imagery/i.landsat.toar/landsat.h U+00B5 MICRO SIGN<br>
> imagery/i.evapo.pm/functions.c U+00B0 DEGREE SIGN<br>
> imagery/i.atcorr/computations.cpp U+00E9 LATIN SMALL LETTER E WITH ACUTE<br>
> lib/raster/color_look.c U+00AD SOFT HYPHEN<br>
> lib/raster/color_set.c U+00AD SOFT HYPHEN<br>
><br>
> Files using UTF-8:<br>
><br>
> raster/r.sunmask/main.c U+00B0 DEGREE SIGN<br>
> raster/r.watershed/ram/do_flatarea.c U+2013 EN DASH<br>
> vector/v.net.salesman/main.c U+2013 EN DASH<br>
> gui/wxpython/lmgr/frame.py U+00F6 LATIN SMALL LETTER O WITH DIAERESIS<br>
> U+2019 RIGHT SINGLE QUOTATION MARK<br>
> lib/python/pygrass/functions.py U+00B0 DEGREE SIGN<br>
> lib/arraystats/class.c U+00E9 LATIN SMALL LETTER E WITH ACUTE<br>
><br>
> Many of these are gratuitous, e.g. use of a soft hyphen or<br>
> en-dash when an ASCII "-" (U+002D HYPHEN-MINUS) would suffice.<br>
><br>
> Some are due to comments written in languages other than English<br>
> (i.topo.corr = Spanish, lib/arraystats = French); these should be<br>
> translated.<br>
><br>
> All but one are in comments: the pygrass one is a string literal,<br>
> which should really use escape notation (assuming that the<br>
> is_clean_name() function is actually correct, and not a half-baked<br>
> attempt at re-implementing G_legal_filename()).<br>
><br>
> So, if those are fixed, it boils down to whether we actually want to<br>
> have to deal with source-code encoding issue for the sake of comments<br>
> which include:<br>
><br>
> a) °C for degrees Celsius,<br>
> b) µm for micrometres (microns), and<br>
> c) proper names using the Latin script with accents (names using any<br>
> other script will invariably be romanised).<br>
<br>
</div></div>I've now removed most of these in trunk with r59172.<br>
<br>
remaining:<br>
imagery/i.atcorr/computations.cpp (someone's name)<br></blockquote><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
gui/wxpython/lmgr/frame.py (an example of something using UTF-8)<br>
<br></blockquote><div><a href="https://trac.osgeo.org/grass/browser/grass/trunk/gui/wxpython/lmgr/frame.py#L978">https://trac.osgeo.org/grass/browser/grass/trunk/gui/wxpython/lmgr/frame.py#L978</a><br></div><div><br></div>
<div>I wanted this line to be written without UTF-8 characters, but since UTF-8 characters are exactly what makes it problematic, I agree with MarkusN that it is better to be explicit.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
and lib/python/pygrass/functions.py ...<br>
<br>
as for functions.py, hooking into G_legal_filename() would<br>
be best, but failing that, a white-list of allowed chars would<br>
seem much more robust than a small black-list of disallowed<br>
chars.<br>
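<div><br></div><div>A minimal sketch of the white-list idea in Python: a name is accepted only if every character belongs to an explicitly allowed set, so characters nobody thought of (accented letters, the degree sign, spaces) are rejected by default. The allowed set (ASCII letters, digits, "_", "-", ".", and no leading dot) and the helper name is_legal_name() are assumptions for illustration; they are not the exact rules of G_legal_filename() or of pygrass's is_clean_name().</div>
<pre>
```python
import string

# Hypothetical white-list: ASCII letters, digits and a few punctuation
# characters. This is an illustrative assumption, not the exact rule
# enforced by G_legal_filename().
ALLOWED = frozenset(string.ascii_letters + string.digits + "_-.")


def is_legal_name(name):
    """Return True if name is non-empty, has no leading dot, and every
    character is in the white-list.

    Anything not explicitly allowed (spaces, '@', '/', accented letters,
    the degree sign, ...) is rejected, instead of relying on a short list
    of known-bad characters.
    """
    if not name or name.startswith("."):
        return False
    return all(ch in ALLOWED for ch in name)


print(is_legal_name("elevation_10m"))  # True
print(is_legal_name("temp°C"))         # False
print(is_legal_name("my map"))         # False
print(is_legal_name("map@PERMANENT"))  # False
```
</pre>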
<div class=""><br>
<br>
> Personally, I would prefer it if source code was 7-bit clean.<br>
<br>
</div>Me too. Not sure how to deal with non-ASCII chars in people's names though.<br>
<br></blockquote><div>The problem is that each language deals with this differently. To avoid non-ASCII characters, in Czech you write Petras instead of Petráš, while in German you write Soeren instead of Sören. For languages with a non-Latin alphabet, it is even more complicated. Moreover, the contexts in which this is appropriate or tolerated may differ.</div>
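<div><br></div><div>To illustrate that there is no single rule, here is a small Python sketch: decomposing characters (Unicode NFKD) and dropping whatever is still non-ASCII gives the Czech-style "Petras", but turns "Sören" into "Soren" rather than the German "Soeren", so a per-language table is needed on top. The GERMAN table and the to_ascii() helper are hypothetical, for illustration only.</div>
<pre>
```python
import unicodedata

# Hypothetical language-specific replacements applied before the generic
# step (German umlauts and sharp s, as an example).
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue",
          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def to_ascii(text, table=None):
    """Approximate text in ASCII.

    First apply an optional per-language replacement table, then decompose
    the remaining characters (NFKD) and drop everything that is still not
    ASCII, such as combining accents.
    """
    if table:
        text = "".join(table.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Petráš"))         # Petras (fine for Czech)
print(to_ascii("Sören"))          # Soren (not what a German writer uses)
print(to_ascii("Sören", GERMAN))  # Soeren
```
</pre>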
<div><br></div><div>However, it seems that languages usually have some way to write names in ASCII or in an English transcription, so we can use that in the source code. The original names in UTF-8 can go into contributors.csv and into the (HTML) documentation of modules, which may contain some UTF-8 characters anyway for various reasons.</div>
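<div><br></div><div>A minimal Python sketch for checking whether source files are 7-bit clean and, when they are not, reporting the apparent encoding and the non-ASCII code points, in the same format as Glynn's list quoted above. It is a heuristic: it assumes that bytes which decode as strict UTF-8 are UTF-8 and treats everything else as ISO-8859-1 (which accepts any byte sequence), so it cannot tell ISO-8859-1 from other 8-bit encodings such as Latin2. The helper name classify_source() is hypothetical.</div>
<pre>
```python
import unicodedata


def classify_source(data):
    """Classify raw file bytes as ASCII, UTF-8 or ISO-8859-1.

    Heuristic: bytes that decode as strict UTF-8 are assumed to be UTF-8;
    anything else is decoded as ISO-8859-1, which maps every byte to a
    character and therefore never fails.
    Returns (encoding_name, sorted list of "U+XXXX NAME" strings).
    """
    if data.isascii():
        return "ASCII", []
    try:
        text = data.decode("utf-8")
        encoding = "UTF-8"
    except UnicodeDecodeError:
        text = data.decode("iso-8859-1")
        encoding = "ISO-8859-1"
    found = sorted({ch for ch in text if not ch.isascii()})
    report = ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "?")) for ch in found]
    return encoding, report


if __name__ == "__main__":
    import sys

    # Print one line per file that is not plain ASCII.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            encoding, report = classify_source(f.read())
        if encoding != "ASCII":
            print(path, encoding, ", ".join(report))
```
</pre>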
<div><br></div><div>But anyway, UTF-8 is now everywhere, and from time to time it is necessary and much easier than various workarounds such as HTML entities, Unicode escape sequences, or rewriting the readable and standard °C as degC. So I don't see a 7-bit (or any other) simplification as advantageous, because the problem is complex and it just cannot be fitted into 7 bits (1).</div>
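<div><br></div><div>For comparison, a Python sketch showing that the literal and the escaped spellings of the degree sign produce the same string at run time; the only difference is whether the source file itself contains a non-ASCII byte (this is the escape notation Glynn suggested for the pygrass literal). The variable names are illustrative only.</div>
<pre>
```python
# The same degree sign written three ways in Python source.
literal = "°C"              # file must be saved as UTF-8 (Python 3's default)
escaped = "\u00b0C"         # 7-bit clean source, same string at run time
named = "\N{DEGREE SIGN}C"  # also 7-bit clean, and self-describing

print(literal == escaped == named)   # True
print(escaped.encode("utf-8"))       # b'\xc2\xb0C'
print(escaped.encode("iso-8859-1"))  # b'\xb0C'
```
</pre>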
<div><br></div><div>Are there any disadvantages of using UTF-8?</div><div><br></div><div>Vaclav (Václav Petráš)</div><div><br></div><div><br></div><div>(1) This reminded me of a comment somewhere where the question "How do I use this with a Latin2-encoded language?" was answered with "Use Latin1.", which is of course absurd, since Latin1 contains different characters from Latin2 (that's why both exist). My point is that encoding text in anything other than Unicode/UTF-8 is usually a huge simplification, which may destroy the original text.</div>
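<div><br></div><div>The footnote's point can be checked with Python's standard codecs: "Petráš" can be encoded in Latin2 (ISO-8859-2) and in UTF-8 but not in Latin1 (ISO-8859-1), and Latin2 bytes misread as Latin1 silently turn "š" into a different character. A minimal illustration:</div>
<pre>
```python
name = "Petráš"

latin2 = name.encode("iso-8859-2")  # b'Petr\xe1\xb9'
utf8 = name.encode("utf-8")         # round-trips losslessly

try:
    name.encode("iso-8859-1")
except UnicodeEncodeError as err:
    # U+0161 (s with caron) has no Latin1 code point
    print("Latin1 cannot encode", repr(err.object[err.start]))

# Misreading Latin2 bytes as Latin1: 0xB9 is SUPERSCRIPT ONE in Latin1.
print(latin2.decode("iso-8859-1"))   # Petrá¹
print(utf8.decode("utf-8") == name)  # True
```
</pre>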
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
regards,<br>
Hamish<br>
<div class=""><div class="h5"><br>
_______________________________________________<br>
grass-dev mailing list<br>
<a href="mailto:grass-dev@lists.osgeo.org">grass-dev@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/mailman/listinfo/grass-dev" target="_blank">http://lists.osgeo.org/mailman/listinfo/grass-dev</a><br>
</div></div></blockquote></div><br></div></div>