[GRASS-dev] [GRASS GIS] #2617: wxgui Raster query redirect to console UnicodeDecodeError
GRASS GIS
trac at osgeo.org
Tue Mar 10 07:14:34 PDT 2015
#2617: wxgui Raster query redirect to console UnicodeDecodeError
-----------------------------+----------------------------------------------
Reporter: marisn | Owner: grass-dev@…
Type: defect | Status: new
Priority: normal | Milestone: 7.0.1
Component: wxGUI | Version: svn-trunk
Keywords: query, encoding | Platform: MSWindows Vista
Cpu: Unspecified |
-----------------------------+----------------------------------------------
Comment(by glynn):
Replying to [comment:2 marisn]:
> The source of problem is r47310 where instead of installing unicode
> version of gettext a bytestring version is installed. This should work
> fine, but now in every place where a _() call is made, it returns str for
> unicode translations. Reverting r47310 fixes this bug (and probably others
> too!) without any problems, still I would like to hear Glynn's rationale
> why it was necessary in the first place (preferably with patches that
> solve _() issue if r47310 is to stay).

The scripting library only uses byte strings, never unicode. Values
returned from _() are typically written to streams (stdout/stderr or
files) or used as command-line arguments. These contexts invariably
require byte strings, so if _() returned a unicode value it would just get
converted to a byte string using the default encoding (not the locale's
encoding or filesystem encoding etc.), which is usually ASCII. So prior to
r47310, any attempt by a script to use a translated string while in a
non-English locale was likely to result in the familiar "codec can't encode
character ..." error.
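
A minimal sketch of that failure mode, for illustration only. The string
and the helper name are hypothetical, and Python 2's implicit
unicode-to-bytes conversion is modelled here by an explicit
.encode("ascii"), which is an assumption about the default encoding:

```python
# -*- coding: utf-8 -*-
# Hypothetical translated message containing a non-ASCII character.
message = u"Izv\u0113lne"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec can't represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # This is the "codec can't encode character ..." error.
        print("encode with %s failed: %s" % (encoding, exc.reason))
        return None


# Stand-in for the implicit conversion with the default (ASCII) encoding.
ascii_result = try_encode(message, "ascii")

# Explicit conversion with the locale's codepage (cp1257 in this ticket).
locale_result = try_encode(message, "cp1257")
```

The ASCII attempt fails, while encoding with the locale's codepage
succeeds, which is why handing out ready-made byte strings avoids the
error.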

If there's a bug here, it's wxGUI expecting the grass.script library to
cater to it. grass.script doesn't exist for the benefit of wxGUI. If
grass.script isn't suitable for wxGUI (e.g. because of wxPython's use of
unicode), wxGUI should provide its own alternatives, not break
grass.script.

But the real question is: where is that UTF-8 coming from? On Windows,
nothing should ever see UTF-8, as Windows doesn't support UTF-8 as an
actual codepage (cp65001 is a pseudo-codepage which exists to allow
certain functions to use UTF-8; but you can't have a locale which uses
cp65001 as its codepage).

Byte strings which end up in wxGUI should be interpreted as using the
locale's codepage (cp1257 in this case), as should anything converted from
unicode to a byte string by wxGUI. Anything coming from wxPython (e.g. the
contents of a text field) should be unicode values (UTF-16-LE internally).
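
As a rough sketch of why the stray UTF-8 matters (the sample string is
hypothetical, and this is not wxGUI code): bytes in the locale's codepage
decode cleanly with that codepage, whereas UTF-8 bytes either turn into
the wrong characters under cp1257 or raise UnicodeDecodeError under ASCII:

```python
# -*- coding: utf-8 -*-
# Hypothetical text as wxPython would hand it over (a unicode value).
text = u"Izv\u0113lne"

# What a byte string looks like in the locale's codepage vs. in UTF-8.
locale_bytes = text.encode("cp1257")
utf8_bytes = text.encode("utf-8")

# Interpreting locale bytes with the locale's codepage round-trips.
decoded_ok = locale_bytes.decode("cp1257")

# Interpreting UTF-8 bytes with the locale's codepage gives wrong characters.
mojibake = utf8_bytes.decode("cp1257", errors="replace")

# Interpreting UTF-8 bytes with the ASCII default fails outright.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Only the first decode recovers the original text, so the useful thing to
track down is whichever component produced UTF-8 in the first place.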

> Not using unicode version of gettext is really strange, as Slovenian is
> the only language NOT using UTF-8 in their PO files and it has seen the
> last update in 2005, thus GRASS PO files ARE unicode-ready.

The encoding used in PO files doesn't matter on systems which use GNU
gettext, which will automatically convert from the encoding used in the PO
file to the locale's encoding (so a single PO file can be used for both
e.g. en_GB.utf8 and en_GB.iso88591). In fact, the encoding used in PO
files shouldn't even be visible to applications (unless they're trying to
read the PO file directly rather than using gettext, which would be dumb).
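
The conversion amounts to something like the following sketch. The
function name and sample message are hypothetical, and this only models
the effect; it is not how GNU gettext is implemented:

```python
def recode(raw, catalog_charset, locale_charset):
    """Convert catalog bytes to the locale's encoding via a unicode value."""
    return raw.decode(catalog_charset).encode(locale_charset)


# Hypothetical message stored in an ISO-8859-1 catalog ("caf" + e-acute).
catalog_bytes = b"caf\xe9"

# A UTF-8 locale gets the message re-encoded as UTF-8.
for_utf8_locale = recode(catalog_bytes, "iso-8859-1", "utf-8")

# An ISO-8859-1 locale gets the same bytes back unchanged.
for_latin1_locale = recode(catalog_bytes, "iso-8859-1", "iso-8859-1")
```

The application only ever sees the result in its own locale's encoding,
whichever encoding the catalog was written in.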

Ideally, PO files should use the locale's legacy encoding (e.g. ISO-8859-1
for most of Western Europe). Newer systems will translate that to UTF-8 if
that's what the locale uses; older systems will just copy the data
verbatim, so it needs to use the locale's encoding (which, on older
systems, won't be UTF-8). This has the added advantage of restricting what
goes into those files to characters which can actually be displayed.
--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2617#comment:3>
GRASS GIS <http://grass.osgeo.org>