[GRASS-dev] [GRASS GIS] #2617: wxgui Raster query redirect to console UnicodeDecodeError

GRASS GIS trac at osgeo.org
Tue Mar 10 07:14:34 PDT 2015


#2617: wxgui Raster query redirect to console UnicodeDecodeError
-----------------------------+----------------------------------------------
 Reporter:  marisn           |       Owner:  grass-dev@…              
     Type:  defect           |      Status:  new                      
 Priority:  normal           |   Milestone:  7.0.1                    
Component:  wxGUI            |     Version:  svn-trunk                
 Keywords:  query, encoding  |    Platform:  MSWindows Vista          
      Cpu:  Unspecified      |  
-----------------------------+----------------------------------------------

Comment(by glynn):

 Replying to [comment:2 marisn]:

 > The source of problem is r47310 where instead of installing unicode
 version of gettext a bytestring version is installed. This should work
 fine, but now in every place where a _() call is made, it returns str for
 unicode translations. Reverting r47310 fixes this bug (and probably others
 too!) without any problems, still I would like to hear Glynn's rationale
 why it was necessary in the first place (preferably with patches that
 solve _() issue if r47310 is to stay).

 The scripting library only uses byte strings, never unicode. Values
 returned from _() are typically written to streams (stdout/stderr or
 files) or used as command-line arguments. These contexts invariably
 require byte strings, so if _() returned a unicode value it would just get
 converted to a byte string using the default encoding (not the locale's
 encoding or filesystem encoding etc), which is usually ASCII. So prior to
 r47310, any attempt by a script to use a translated string while in a non-
 English locale was likely to result in the familiar "codec can't encode
 character ..." error.
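
 A minimal sketch of that failure, written with explicit encode() calls so
 that it behaves the same regardless of the interpreter's default encoding.
 The message text and the to_bytes() helper are hypothetical and are not
 taken from grass.script or the GRASS message catalogs.

```python
# Hypothetical translated message containing a non-ASCII character.
translated = u"Vaic\u0101jums"


def to_bytes(text, encoding):
    """Encode text explicitly; return None if the codec cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # This is the "codec can't encode character ..." case.
        return None


# An ASCII default encoding cannot represent the translated message.
ascii_result = to_bytes(translated, "ascii")

# The locale's codepage (cp1257 in this ticket) can.
locale_result = to_bytes(translated, "cp1257")

print(ascii_result)
print(locale_result)
```

 With an ASCII default encoding the conversion fails, whereas encoding to the
 locale's codepage yields a byte string that can be written to a stream.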

 If there's a bug here, it's wxGUI expecting the grass.script library to
 cater to it. grass.script doesn't exist for the benefit of wxGUI. If
 grass.script isn't suitable for wxGUI (e.g. because of wxPython's use of
 unicode), wxGUI should provide its own alternatives, not break
 grass.script.

 But the real question is: where is that UTF-8 coming from? On Windows,
 nothing should ever see UTF-8, as Windows doesn't support UTF-8 as an
 actual codepage (cp65001 is a pseudo-codepage which exists to allow
 certain functions to use UTF-8; but you can't have a locale which uses
 cp65001 as its codepage).

 Byte strings which end up in wxGUI should be interpreted as using the
 locale's codepage (cp1257 in this case), as should anything converted from
 unicode to a byte string by wxGUI. Anything coming from wxPython (e.g. the
 contents of a text field) should be unicode values (UTF-16-LE internally).
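
 As an illustration of the rule above, the following sketch decodes the same
 byte string once with the locale's codepage and once as UTF-8. The sample
 text and the decode_or_none() helper are hypothetical; cp1257 is assumed
 only because it is the codepage mentioned for this ticket.

```python
# Hypothetical sample text, as wxPython might hand it over (a unicode value).
text = u"R\u0101d\u012bt"

# The byte string a program running in a cp1257 locale would produce.
raw = text.encode("cp1257")


def decode_or_none(data, encoding):
    """Decode bytes explicitly; return None if they are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Interpreting the bytes with the locale's codepage recovers the text.
as_locale = decode_or_none(raw, "cp1257")

# Interpreting the same bytes as UTF-8 fails with a UnicodeDecodeError.
as_utf8 = decode_or_none(raw, "utf-8")

print(as_locale == text)
print(as_utf8)
```

 Decoding with the wrong encoding is what produces a UnicodeDecodeError of
 the kind named in the ticket title.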

 > Not using unicode version of gettext is really strange, as Slovenian is
 the only language NOT using UTF-8 in their PO files and it has seen the
 last update in 2005, thus GRASS PO files ARE unicode-ready.

 The encoding used in PO files doesn't matter on systems which use GNU
 gettext, which will automatically convert from the encoding used in the PO
 file to the locale's encoding (so a single PO file can be used for both
 e.g. en_GB.utf8 and en_GB.iso88591). In fact, the encoding used in PO
 files shouldn't even be visible to applications (unless they're trying to
 read the PO file directly rather than using gettext, which would be dumb).
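
 The effect of that conversion can be sketched as follows. This is only an
 illustration of the idea, not how GNU gettext is implemented; the recode()
 helper and the sample message are hypothetical.

```python
def recode(catalog_bytes, po_charset, locale_charset):
    """Convert a message from the PO file's charset to the locale's charset."""
    return catalog_bytes.decode(po_charset).encode(locale_charset)


# Hypothetical message, stored as if the PO file declared ISO-8859-1.
message = u"Price in \u00a3"
stored = message.encode("iso-8859-1")

# The same catalog entry served to two locales with different encodings.
for_utf8_locale = recode(stored, "iso-8859-1", "utf-8")
for_latin1_locale = recode(stored, "iso-8859-1", "iso-8859-1")

print(for_utf8_locale)
print(for_latin1_locale)
```

 The application sees only the bytes in its own locale's encoding, so the
 charset declared in the PO file never reaches it.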

 Ideally, PO files should use the locale's legacy encoding (e.g. ISO-8859-1
 for most of Western Europe). Newer systems will translate that to UTF-8 if
 that's what the locale uses; older systems will just copy the data
 verbatim, so it needs to use the locale's encoding (which, on older
 systems, won't be UTF-8). This has the added advantage of restricting what
 goes into those files to characters which can actually be displayed.

-- 
Ticket URL: <http://trac.osgeo.org/grass/ticket/2617#comment:3>
GRASS GIS <http://grass.osgeo.org>


