[GRASS-dev] [GRASS GIS] #2525: Unable to open sqlite database if path contains non-latin letters

GRASS GIS trac at osgeo.org
Sat Jan 3 17:57:18 PST 2015

#2525: Unable to open sqlite database if path contains non-latin letters
 Reporter:  marisn       |       Owner:  grass-dev@…              
     Type:  defect       |      Status:  new                      
 Priority:  major        |   Milestone:  7.0.0                    
Component:  wxGUI        |     Version:  svn-releasebranch70      
 Keywords:               |    Platform:  MSWindows Vista          
      Cpu:  Unspecified  |  

Comment(by glynn):

 Replying to [ticket:2525 marisn]:

 > Note: could CommandLineToArgvW be helpful?

 What we need is the reverse: something which reliably converts argv[] to a
 command string.

 We actually have one of those (make_command_line() in lib/gis/spawn.c),
 and Python also has one (list2cmdline() in the subprocess module). The
 problem is that both of these only reverse the parsing which is done by
 the executable itself, not that done by the shell. The shell's parsing
 rules are even less well documented than those of the executable, and even
 less sane.
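
 As a small illustration of that gap, the sketch below runs Python's
 list2cmdline() on a few arguments. The argument values are made up for
 the example. It shows quoting being added for the executable's argv
 parser while characters that only the shell interprets (here & and %)
 pass through untouched.

```python
import subprocess

# Arguments as the program should finally receive them in argv[].
args = ["prog", "a b", 'say "hi"', "a&b", "%PATH%"]

# list2cmdline() quotes for the executable's own argv parser:
# spaces trigger double quotes, embedded quotes get a backslash.
cmdline = subprocess.list2cmdline(args)
print(cmdline)

# Characters that only the shell interprets are left exactly as they were.
shell_only = [a for a in ("a&b", "%PATH%")
              if subprocess.list2cmdline([a]) == a]
print(shell_only)
```

 If the resulting string is handed to the shell rather than directly to
 process creation, the & and % would still be interpreted by the shell.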

 The other issue is that the shell uses two different encodings
 (codepages): "ANSI" and "OEM". Most of the time this doesn't matter; you
 can just pass the byte strings straight through. But there are cases (such
 as using the FOR command with backticks to take process output and use it
 as an argument) where this doesn't work, and any character which doesn't
 have the same codepoint in both encodings will cause problems (problems
 which can't realistically be solved).
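
 A rough sketch of the mismatch, assuming cp1252 as the "ANSI" codepage
 and cp850 as the "OEM" codepage (a common Western European pairing; the
 actual pair depends on the system locale). Python's codecs stand in for
 the conversions here.

```python
# Assumption: cp1252 ("ANSI") and cp850 ("OEM"); other locales use other pairs.
ANSI = "cp1252"
OEM = "cp850"

text = "é"
ansi_bytes = text.encode(ANSI)
oem_bytes = text.encode(OEM)
print(ansi_bytes, oem_bytes)  # one character, two different byte values

# Bytes written under one codepage and read back under the other
# no longer spell the same text.
misread = oem_bytes.decode(ANSI)
print(misread == text)

# Plain ASCII is identical in both, which is why passing bytes
# straight through works most of the time.
print("abc".encode(ANSI) == "abc".encode(OEM))
```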

 As for filenames, the main issues are

  1. If you use byte strings (i.e. char*) (e.g. fopen()), you can't access
 any file whose name isn't representable in the current codepage. Those
 files effectively don't exist in the char* world.

  2. The only supported encoding for Japanese is Shift-JIS (cp932), which
 has the unfortunate feature of not being entirely compatible with ASCII.
 Specifically, 0x5c is used both for the directory separator (normally
 backslash, but displayed as a yen (¥) sign in Japanese locales) and as
 the second byte of some multi-byte sequences. This means that any code
 which parses filenames as byte strings, treating 0x5c as a directory
 separator, will often fail on Japanese filenames.
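
 Both issues can be reproduced in a few lines. This is only a sketch: the
 file names are invented, cp1252 is assumed as the current codepage for
 the first case, and Python's strict codecs stand in for the Windows
 conversion (which may substitute a lookalike character or "?" instead of
 failing; either way the original name is lost).

```python
# Issue 1: a name with no representation in the current codepage.
# Assumption: cp1252 as the codepage, chosen only for illustration.
name = "Rīga.sqlite"
try:
    name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)

# Issue 2: in cp932 the second byte of some characters is 0x5c.
path = "C:\\data\\表.txt"
raw = path.encode("cp932")
naive = raw.split(b"\\")      # byte-level split on 0x5c
correct = path.split("\\")    # split on decoded characters
print(len(naive), len(correct))
```

 The byte-level split produces one component too many, because the 0x5c
 inside the two-byte sequence for 表 is mistaken for a separator.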

 Neither of these has any simple solution (not even unreliable "hacks").
 The only effective solution is to use the Unicode (i.e. wchar_t*) API.

 In practical terms, that would mean writing a compatibility layer which
 re-implements all of the standard ANSI C and POSIX filesystem calls,
 taking UTF-8 char* arguments, converting to UTF-16 wchar_t*, then using
 the Windows-specific wchar_t* functions. Anything which uses third-party
 library functions which take filenames as char* won't work.
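
 The conversion step at the heart of such a layer can be sketched as
 follows. The helper name utf8_to_wide is hypothetical, and the real
 layer would be written in C (presumably around MultiByteToWideChar()
 and functions such as _wfopen()); this only shows what the bytes look
 like on each side.

```python
def utf8_to_wide(path_utf8: bytes) -> bytes:
    """Hypothetical helper: UTF-8 char* -> NUL-terminated UTF-16LE buffer."""
    text = path_utf8.decode("utf-8")  # raises on invalid UTF-8 input
    return text.encode("utf-16-le") + b"\x00\x00"

wide = utf8_to_wide("Rīga.sqlite".encode("utf-8"))
print(len(wide) // 2)  # number of 16-bit units, including the terminator
```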

 We'd also need custom startup code which used wmain(int argc, wchar_t
 **argv) as the entry point, converted all arguments to UTF-8, then called
 main(). We'd still have issues with reading filenames from files, stdin,
 or output from child processes, as these would either have to be in UTF-8
 or would need to be converted to UTF-8 (which means that we'd need to know
 the encoding).
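
 A minimal sketch of both halves, with hypothetical helper names and
 made-up argument values. Python strings stand in for the wide
 arguments. The second part shows why the source encoding has to be
 known: the same bytes yield a different name under a different
 assumption.

```python
def args_to_utf8(wide_argv):
    """Hypothetical startup step: wide arguments -> UTF-8 byte strings."""
    return [arg.encode("utf-8") for arg in wide_argv]

def external_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes read from a file or pipe; the encoding must be known."""
    return raw.decode(source_encoding).encode("utf-8")

argv = args_to_utf8(["db.connect", "database=C:\\dati\\Rīga.sqlite"])

# The same bytes give different names depending on the assumed encoding.
raw = "Rīga".encode("cp1257")  # as a tool using the Baltic codepage might write
right = external_to_utf8(raw, "cp1257")
wrong = external_to_utf8(raw, "cp1252")
print(right.decode("utf-8"), wrong.decode("utf-8"))
```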

Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:1>
GRASS GIS <http://grass.osgeo.org>
