<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 10, 2016 at 9:42 AM, Moritz Lennert <span dir="ltr"><<a href="mailto:mlennert@club.worldonline.be" target="_blank">mlennert@club.worldonline.be</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Maris,<span class=""><br>
<br>
On 07/02/16 11:56, Maris Nartiss wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello devs,<br>
as you might already have noticed, there is a constant stream of<br>
issues containing the keywords "encoding" or, more often,<br>
"UnicodeDecodeError". The main reason behind this is that Python 2.x has<br>
two types of text strings: byte sequences (what you get with str()) and<br>
Unicode (unicode()). Python 3.x has only one, Unicode (a byte<br>
sequence is no longer a string), which removes this frustrating source<br>
of errors.<br>
Moving GRASS Python code to use Unicode internally will bring it closer<br>
to being Python 3 ready and solve most of the errors caused by implicit<br>
conversion from encoded text strings to Unicode text strings.<br>
</blockquote>
<br></span>
I would be very happy if we could find a structural solution to this which would avoid having to deal with so many individual errors all the time. </blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
The proposal is to make GRASS GIS Python code compliant with Unicode<br>
best practice [1], following the principle "decode early, encode late".<br>
Things to change:<br>
1) Any text string entering the Python part of the code should be decoded at<br>
its entry point and encoded back to a byte sequence at its exit point.<br>
This also applies to all calls to GRASS modules that pass text around;<br>
2) Replace all text strings with Unicode literals (u'text'). No<br>
exceptions. Note that this applies to text strings only; byte sequences<br>
should not be touched;<br>
3) Ensure text file reading / writing is done via codecs.open;<br>
4) Pass only Unicode to Python file handling calls (this is important<br>
for running on MS-Windows);<br>
5) Use Unicode in tests to ensure correctness of code;<br>
6) Introduce information on Unicode usage into Python submitting<br>
guidelines [2],[3].<br>
<br>
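A minimal Python sketch of the "decode early, encode late" idea from items 1) and 3) above. The helper names (decode_in, encode_out, make_title) are hypothetical and not part of the GRASS API, UTF-8 is assumed as the external encoding, and io.open is used here as a stand-in for the codecs.open call named in item 3):<br>
<pre>
```python
import io
import os
import tempfile


def decode_in(raw, encoding="utf-8"):
    """Entry point: turn bytes arriving from outside into Unicode text."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def encode_out(text, encoding="utf-8"):
    """Exit point: turn Unicode text back into a byte sequence."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


def make_title(raw_name):
    # Hypothetical internal function: decodes once, then works on Unicode only.
    name = decode_in(raw_name)
    return u"Map: " + name.upper()


raw_input_bytes = u"r\u012bga".encode("utf-8")  # bytes arriving from outside
title = make_title(raw_input_bytes)             # Unicode inside Python
raw_output = encode_out(title)                  # bytes leaving Python

# Text files: the file object decodes and encodes at the file boundary.
path = os.path.join(tempfile.mkdtemp(), u"title.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(title)
with io.open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
```
</pre>
Only the two boundary helpers and the file objects ever see bytes; everything in between handles Unicode.<br>
<br>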
Things to change outside of Python code:<br>
1) Store attribute table encoding information along with connection parameters;<br>
2) Ensure storage of correct encoding information on data import and<br>
correct use on export (especially painful for ESRI Shapefiles);<br>
3) Ensure correct encoding information in headers of all PO and XML files.<br>
<br>
Expected problems:<br>
1) When moving to Python 3, all explicit Unicode literal definitions<br>
will need to be removed (u'text' -> 'text');<br>
2) Introduction of the "decode early" principle will break all of the<br>
band-aids currently in place, so a major breakage of code for a short<br>
time is expected;<br>
3) Guessing the correct encoding can be a problem. One solution could<br>
be to check early that the system is configured correctly and to refuse<br>
to operate on improperly configured systems. A fatal error is better<br>
than silent data corruption (which currently happens in<br>
certain scenarios).<br>
<br>
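One way the early check from item 3) could look, as a rough Python sketch. The function names are hypothetical, and treating "cannot encode a sample of non-ASCII text" as the test for an improperly configured system is an assumption made for illustration, not something decided in this thread:<br>
<pre>
```python
import codecs
import locale


def ensure_encodable(text, encoding):
    """Raise RuntimeError if the encoding is unknown or cannot represent text."""
    try:
        codecs.lookup(encoding)   # LookupError for an unknown codec name
        text.encode(encoding)     # UnicodeEncodeError if the text does not fit
    except (LookupError, UnicodeEncodeError) as err:
        raise RuntimeError("Unusable encoding %r: %s" % (encoding, err))
    return encoding


def system_encoding():
    # Encoding requested by the current locale; False means setlocale() is not called.
    return locale.getpreferredencoding(False)


# Hypothetical startup check with a sample of non-ASCII text:
# ensure_encodable(u"r\u012bga", system_encoding())
```
</pre>
Failing at startup this way turns a silent corruption into a visible error, at the cost of refusing to run under ASCII-only locales.<br>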
</blockquote>
<br></div></div>
I am no expert on this question, and thus do not have a clear opinion on your proposal, except for the fact that I'm very happy that it exists, but here are my intuitive ideas & questions on your topics:</blockquote><div><br></div><div>I don't have a clear opinion either but I hoped Glynn could state his opinion here, because I understood he has a different view on some of these things. AFAIR, one of the problems is possibly different needs of Python scripting library vs. GUI.</div><div><br></div><div>Anna</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Topic to discuss:<br>
1) Implementation plan:<br>
a) should it be done before 7.1?<br>
</blockquote>
<br></span>
I think the sooner, the better, so 7.1 should be our latest milestone (7.0.x should be in 'bugfix only' mode).<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
b) should separate bugs be opened for parts of migration?<br>
</blockquote>
<br></span>
To what extent can the work be split into more or less autonomous issues?<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
c) how big / long breakage is acceptable?<br>
</blockquote>
<br></span>
How complete would breakage be: for all encodings, or would LANG=C always work ?<br>
<br>
Is this something that could mostly be done in a concentrated effort during a code sprint (e.g. FOSS4G 2016)?<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2) Moving all text in a GRASS location to UTF-8 encoding (GRASS 8), thus<br>
pushing the encode/decode "boundary" further out. Upside: most<br>
existing data is already UTF-8 ready (the parts that support only ASCII) [4].<br>
</blockquote>
<br></span>
What do you mean by "text in GRASS location"? What about files on the filesystem that some users might want to access with other tools? Shouldn't they be in the system-wide encoding?<br>
<br>
Thank you very much for bringing up this discussion in such a structured manner. I hope that others will show some interest in the matter...<span class="HOEnZb"><font color="#888888"><br>
<br>
Moritz</font></span><div class="HOEnZb"><div class="h5"><br>
_______________________________________________<br>
grass-dev mailing list<br>
<a href="mailto:grass-dev@lists.osgeo.org" target="_blank">grass-dev@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/mailman/listinfo/grass-dev" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/grass-dev</a></div></div></blockquote></div><br></div></div>