[GRASS-dev] Re: [GRASS GIS] #1193: Python Menu: Japanese (double byte character) in menu may cause parser error.

GRASS GIS trac at osgeo.org
Sun Oct 10 23:30:23 EDT 2010


#1193: Python Menu: Japanese (double byte character) in menu may cause parser
error.
-------------------------+--------------------------------------------------
 Reporter:  naokiueda    |       Owner:  grass-dev@…              
     Type:  defect       |      Status:  new                      
 Priority:  major        |   Milestone:  6.4.1                    
Component:  Python       |     Version:  6.4.0                    
 Keywords:               |    Platform:  MSWindows 7              
      Cpu:  Unspecified  |  
-------------------------+--------------------------------------------------

Comment(by glynn):

 Replying to [comment:1 neteler]:

 > I have tried on Linux and I could launch r.reclass in Japanese without
 > problems. Perhaps it is a Windows-only problem.

 AFAICT, it's a problem with Shift-JIS (cp932), which isn't fully
 compatible with ASCII. Unix systems typically use EUC-JP (or UTF-8), where
 every byte of a multi-byte character has the top bit set, so they don't
 have this problem.

 Shift-JIS is a multi-byte encoding. Double-byte characters have a first
 byte with the top bit set, but the second byte can be any value >= 64
 (0x40) other than 0x7f. While this excludes the digits and most of the
 punctuation characters, it includes the ASCII letters and the punctuation
 characters `[\]^_{|}~`.
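
 As a rough illustration of the byte ranges involved, here is a minimal
 sketch using Python's standard `cp932` and `euc_jp` codecs. The sample
 characters and the helper name `ascii_range_bytes` are chosen for this
 example only; they do not come from the ticket.

```python
# Sample characters picked for illustration: each one has a second byte
# in the ASCII range when encoded as cp932.
samples = ["表", "ソ", "能"]

def ascii_range_bytes(text, encoding):
    """Return the encoded bytes of `text` that fall below 0x80."""
    return bytes(b for b in text.encode(encoding) if b < 0x80)

for ch in samples:
    sjis = ch.encode("cp932")
    euc = ch.encode("euc_jp")
    # ascii() avoids printing non-ASCII text on consoles that cannot show it.
    print(ascii(ch), "cp932:", sjis.hex(), "euc_jp:", euc.hex(),
          "ASCII-range bytes in cp932:", ascii_range_bytes(ch, "cp932"))
```

 In cp932 the second byte of each sample is 0x5c, the same value as an
 ASCII backslash, whereas the EUC-JP forms contain no byte below 0x80.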

 This makes it incompatible with any code which parses a stream of bytes
 without reference to the encoding: e.g. 0x5c might be an ASCII '\', or it
 might be the second byte of a JISX0208 character. You can't tell without
 scanning from the start of the string to track whether each byte is a
 first or a second byte.
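
 The failure mode can be sketched as follows. This is not the GRASS
 parser; `naive_split`, `decoded_split` and the sample path are
 hypothetical and only show how an encoding-unaware split on the
 backslash byte differs from decoding first.

```python
def naive_split(raw):
    """Split on the backslash byte with no regard for the encoding."""
    return raw.split(b"\\")

def decoded_split(raw, encoding):
    """Decode to text first, then split on the backslash character."""
    return raw.decode(encoding).split("\\")

# Hypothetical Windows-style path containing a kanji whose cp932 form
# ends in the byte 0x5c.
path = "C:\\表\\data"
raw = path.encode("cp932")

naive = naive_split(raw)             # cuts the kanji in half
aware = decoded_split(raw, "cp932")  # keeps the kanji intact

print(len(naive), naive)
print(len(aware), [ascii(part) for part in aware])
```

 The byte-level split yields one component too many and leaves a lone
 first byte (0x95) behind, which is no longer valid cp932 on its own.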

 Unfortunately, the only Japanese encoding which is supported by Windows'
 codepage-based API is Shift-JIS (actually, codepage 932, which is
 Shift-JIS plus the usual Microsoft-specific extensions). There is no
 usable UTF-8 codepage (cp 65001 is UTF-8, but it can't be set as the
 system's ANSI codepage).

 I don't think that there's any solution to this, other than "don't use
 kanji (or hiragana or full-width katakana) in command lines". GRASS is
 stuck using the codepage-based API (unless someone wants to implement
 UTF-8 equivalents of all of the ANSI C and POSIX functions, and change all
 of GRASS to use them), and expecting every function which deals with char*
 to decode it according to the current locale isn't feasible.

-- 
Ticket URL: <http://trac.osgeo.org/grass/ticket/1193#comment:2>
GRASS GIS <http://grass.osgeo.org>


