[GRASS-dev] [GRASS GIS] #2525: Unable to open sqlite database if path contains non-latin letters
GRASS GIS
trac at osgeo.org
Sat Jan 3 17:57:18 PST 2015
#2525: Unable to open sqlite database if path contains non-latin letters
-------------------------+--------------------------------------------------
Reporter: marisn | Owner: grass-dev@…
Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: wxGUI | Version: svn-releasebranch70
Keywords: | Platform: MSWindows Vista
Cpu: Unspecified |
-------------------------+--------------------------------------------------
Comment(by glynn):
Replying to [ticket:2525 marisn]:
> Note: could CommandLineToArgvW be helpful?
What we need is the reverse: something which reliably converts argv[] to a
command string.
We actually have one of those (make_command_line() in lib/gis/spawn.c),
and Python also has one (list2cmdline() in the subprocess module). The
problem is that both of these only reverse the parsing which is done by
the executable itself, not that done by the shell. The shell's parsing
rules are even less well documented than those of the executable, and even
less sane.
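
A small illustration of that gap, using Python's list2cmdline(). The argument values are made up for the example. The function quotes what the executable's argv parser cares about (whitespace and double quotes) but leaves shell metacharacters such as "&" alone:

```python
import subprocess

# Hypothetical argument list; "x&y" contains a cmd.exe metacharacter.
argv = ["prog", "a b", "x&y"]

# list2cmdline() only reverses the executable's own argv parsing:
# it quotes arguments containing whitespace and escapes double quotes.
cmdline = subprocess.list2cmdline(argv)
print(cmdline)

# The "&" is emitted bare, so a shell would see a command separator
# rather than part of the third argument.
ampersand_is_quoted = '"x&y"' in cmdline
print(ampersand_is_quoted)
```

Passing such a string through the shell would therefore need a second, shell-specific quoting step on top of this one.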
The other issue is that the shell uses two different encodings
(codepages): "ANSI" and "OEM". Most of the time this doesn't matter; you
can just pass the byte strings straight through. But there are cases (such
as using the FOR command with backticks to take process output and use it
as an argument) where this doesn't work, and any character which doesn't
have the same codepoint in both encodings will cause problems (problems
which can't realistically be solved).
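
As a rough sketch of the mismatch, assuming a Western European system where the ANSI codepage is cp1252 and the OEM codepage is cp850 (the actual pair depends on the system locale):

```python
# Assumed codepage pair; the real values depend on the system locale.
ANSI = "cp1252"
OEM = "cp850"

text = "é"
ansi_bytes = text.encode(ANSI)  # single byte 0xE9
oem_bytes = text.encode(OEM)    # single byte 0x82

# Bytes produced under one codepage and interpreted under the other
# no longer spell the original character.
misread = ansi_bytes.decode(OEM)
print(ansi_bytes, oem_bytes, misread == text)

# Plain ASCII has the same byte values in both, which is why the
# mismatch goes unnoticed most of the time.
ascii_safe = "abc".encode(ANSI) == "abc".encode(OEM)
print(ascii_safe)
```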
As for filenames, the main issues are
1. If you use byte strings (i.e. char*, as with fopen()), you can't access
any file whose name isn't representable in the current codepage. Those
files effectively don't exist in the char* world.
2. The only supported encoding for Japanese is Shift-JIS (cp932), which
has the unfortunate feature of not being entirely compatible with ASCII.
Specifically, 0x5c is used both for the directory separator (normally
backslash, but actually prints as a yen (¥) sign in Japanese locales) and
as the second byte of some multi-byte sequences. This means that any code
which tries to parse filenames as byte strings with 0x5c as a directory
separator will often fail on Japanese filenames.
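
The 0x5c problem in the second point can be reproduced with a short sketch. The path is hypothetical; the kanji 表 is one of the characters whose cp932 encoding ends in 0x5c:

```python
# Hypothetical path. cp932 encodes the kanji 表 as the two bytes
# 0x95 0x5c, and the second byte equals the separator byte.
path = "C:\\表\\file.txt"
raw = path.encode("cp932")

# A naive byte-level split on 0x5c cuts the multi-byte character in half
# and produces a spurious empty component.
naive_parts = raw.split(b"\\")
print(naive_parts)

# Decoding first and splitting on the character yields the intended parts.
correct_parts = raw.decode("cp932").split("\\")
print(correct_parts)
```

Code that only ever sees the char* form has no cheap way to tell the two kinds of 0x5c apart without walking the string with a multi-byte-aware parser.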
Neither of these has any simple solution (not even unreliable "hacks").
The only effective solution is to use the Unicode (i.e. wchar_t*) API.
In practical terms, that would mean writing a compatibility layer which
re-implements all of the standard ANSI C and POSIX filesystem calls,
taking UTF-8 char* arguments, converting to UTF-16 wchar_t*, then using
the Windows-specific wchar_t* functions. Anything which uses third-party
library functions which take filenames as char* won't work.
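
To make the conversion step concrete, here is a minimal sketch in Python of only the encoding part of such a layer, not the layer itself. The helper name utf8_to_utf16(), the filename, and the choice of cp1252 as the ANSI codepage are all assumptions for illustration:

```python
def utf8_to_utf16(name_utf8: bytes) -> bytes:
    """Convert a UTF-8 byte string to UTF-16-LE bytes (hypothetical helper).

    A real layer would hand the NUL-terminated result to a wchar_t*
    function instead of returning it.
    """
    return name_utf8.decode("utf-8").encode("utf-16-le")


# Hypothetical filename with Cyrillic letters.
name = "данные.db"
wide = utf8_to_utf16(name.encode("utf-8"))
print(len(wide))

# The same name has no byte representation in the assumed ANSI codepage,
# which is why the char* API cannot reach such a file at all.
try:
    name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```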
We'd also need custom startup code which used main16(int argc, wchar_t
**argv) as the entry point, converted all arguments to UTF-8, then called
main(). We'd still have issues with reading filenames from files, stdin,
or output from child processes, as these would either have to be in UTF-8
or would need to be converted to UTF-8 (which means that we'd need to know
the encoding).
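
The last point can be sketched as follows. The helper to_utf8() is hypothetical, and the two bytes stand in for a filename arriving from a file or a child process. The same input yields different names depending on which source encoding is assumed:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode externally supplied filename bytes as UTF-8 (hypothetical)."""
    return raw.decode(source_encoding).encode("utf-8")


# Two bytes as they might arrive from a file, stdin, or a child process.
raw = b"\x95\x5c"

# Interpreted as cp932 this is a single kanji; interpreted as cp1252 it
# is a bullet character followed by a backslash.
as_cp932 = to_utf8(raw, "cp932").decode("utf-8")
as_cp1252 = to_utf8(raw, "cp1252").decode("utf-8")
print(as_cp932, as_cp1252)
```

Nothing in the bytes themselves says which reading is right, so the encoding has to come from somewhere else.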
--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2525#comment:1>
GRASS GIS <http://grass.osgeo.org>