[GRASS-dev]
Re: GRASS startup window patches, map names restrictions
Glynn Clements
glynn at gclements.plus.com
Tue Jan 9 16:41:42 EST 2007
Maris Nartiss wrote:
> I know that GRASS SHOULD be encoding etc. neutral, but not all GRASS
> code is 100% perfect, and it may take too much time to check all
> possible breakage places. IMHO it would be easier to just push some
> policy about allowing only latin letters and numbers + some safe
> symbols.
It might be worthwhile informing users that non-ASCII characters are
problematic, but so far as developers are concerned, problems with
non-ASCII characters should be fixed rather than worked around
wherever practical, IMHO.
Certainly:
1. Nothing should actively prohibit the user from using non-ASCII
characters on the basis that other parts of GRASS *might* not handle
them correctly.
2. Advising users of difficulties with non-ASCII characters shouldn't
be considered a substitute for making code 8-bit clean.
> > > I can confirm, that using non-latin character with UTF-8 encoding in
> > > mapset name, will make mapset/GRASS unusable.
> >
> > Can you provide any more details?
>
> I created in startup screen new mapset called "-Dà-B¹-Dñ¶", pressed "enter
> grass" and gis.m startup failed due to fail of g.region with message:
> "Illegal filename. Character <Ä> not allowed."
Right; that comes from the following in G_legal_filename():
    if (*s == '/' || *s == '"' || *s == '\'' || *s <= ' ' ||
        *s == '@' || *s == ',' || *s == '=' || *s == '*' || *s > 0176) {
        fprintf(stderr, _("Illegal filename. Character <%c> not allowed.\n"), *s);
Note that it's probably *not* the "*s > 0176" test which is triggered,
but the "*s <= ' '" test, as "s" has type "char *" and "char" with no
"signed" or "unsigned" qualifier is normally signed.
ANSI C states that it's implementation-defined whether "char" is
signed or unsigned; gcc can be forced to use a particular
interpretation with -funsigned-char or -fsigned-char.
That issue is probably more significant than any actual encoding
issues; G_legal_filename() probably isn't the only place that
overlooked the fact that "char" normally ranges from -128 to 127, not
0 to 255.
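A minimal sketch of the signedness point, not the actual GRASS code: the
two helpers below (hypothetical names) apply the same pair of comparisons
to one byte, once through "signed char" and once through "unsigned char".
The byte 0xC4 is used because it is the Latin-1 code for the <Ä> shown in
the error message. The conversion to "signed char" assumes a
two's-complement target.

```c
/* Which of the two range tests rejects a byte. */
enum fired { FIRED_NONE, FIRED_LOW, FIRED_HIGH };

/* Emulates the comparisons on a platform where plain char is signed.
 * Assumption: two's complement, so 0xC4 converts to -60. */
static enum fired which_test_signed(unsigned char byte)
{
    signed char c = (signed char)byte;

    if (c <= ' ')
        return FIRED_LOW;   /* every byte >= 0x80 is negative, so it lands here */
    if (c > 0176)
        return FIRED_HIGH;  /* only reachable for 0177 (DEL) */
    return FIRED_NONE;
}

/* The same comparisons on an unsigned value: bytes >= 0x80 stay
 * positive and reach the "> 0176" test instead. */
static enum fired which_test_unsigned(unsigned char byte)
{
    if (byte <= ' ')
        return FIRED_LOW;
    if (byte > 0176)
        return FIRED_HIGH;
    return FIRED_NONE;
}
```

With signed char, 0xC4 is rejected by the "<= ' '" comparison; with
unsigned char it passes that comparison and is rejected by "> 0176"
instead. Code that intends to compare byte values should cast to
"unsigned char" first.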
I strongly suggest removing that test from development versions, as it
prevents us from testing anything related to filename encoding issues.
> > > Probably we should push forward policy, that only latin characters and
> > > numbers are allowed in map/mapset names? Like "[a-Z] [0-9] _-".
> >
> > Most ASCII punctuation characters should be allowed except when they
> > already have specific meanings to GRASS (e.g. = and , are both
> > significant to the parser, @ is used for map@mapset etc).
>
> Space, "{" and "}" should be banned, as creating location "a new
> location" and mapset "a new mapset" or "{ a mapset }" will result in
> g.region fail:
> "Illegal filename. Character < > not allowed."
> Entering a location with a space in its name is impossible in text
> mode, as only the part before the space is accepted.
The names of locations, mapsets and maps shouldn't contain spaces, but
the database directory might (on Windows, the user might only be able
to create files beneath e.g. "C:\Documents and Settings"), as might
the names of files being imported or exported.
> The same applies to chars with special meanings in shell "$" "#" as they
> may be incorrectly escaped/enclosed in quotes in scripts or make
> scripting from console harder.
Those characters should be permitted. It's up to the user whether they
consider issues related to Bourne shell syntax to be relevant. If they
aren't typing map names into a shell, it isn't an issue. Shell scripts
should work with whatever names they're given. Which isn't hard; you
just need to remember to quote variable substitutions, i.e. "$foo"
(with the quotes) rather than just $foo.
Single and double quotes are currently prohibited, although that isn't
strictly necessary. Both are slightly problematic in r.mapcalc (a map
whose name includes both a single quote and a double quote cannot be
entered), but even that could be fixed if it was an issue. The single
quote is problematic mostly because of code such as:
    sprintf(buf, "g.foo map='%s'", mapname);
    system(buf);
If we were to allow single quotes, every such use of system() would
have to be fixed. OTOH, using system() is bad enough in and of itself;
hopefully we will reduce its use in due course (G_spawn() doesn't have
this problem, as it doesn't use /bin/sh).
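To illustrate the difference, here is a rough POSIX sketch, not
G_spawn()'s actual implementation. Both function names are hypothetical,
and the standard "test" utility stands in for a GRASS module. The first
function pastes the name into a shell command as in the snippet above;
the second passes it as a separate argv element via fork() and execvp(),
so no shell ever parses it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pastes the name between single quotes and hands the string to
 * /bin/sh via system(). A single quote inside the name unbalances the
 * quoting. Returns whatever system() returns (0 on success). */
static int run_with_shell(const char *mapname)
{
    char buf[256];

    snprintf(buf, sizeof buf, "test -n '%s'", mapname);
    return system(buf);
}

/* Runs argv[0] directly, without a shell. Each argument reaches the
 * program byte for byte. Returns the exit status, or -1 on failure. */
static int run_without_shell(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if execvp() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Given the name "it's", run_with_shell() produces the command
"test -n 'it's'", which the shell rejects as malformed, while
run_without_shell() called with { "test", "-n", "it's", NULL } succeeds
because nothing ever interprets the quote.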
> As GRASS is more widely adopted in many countries, it should clearly
> state its position on UTF-8 and symbols in mapset/map names.
There are three distinct issues here:
1. ASCII punctuation (or characters which are otherwise "significant").
2. Non-ASCII (8-bit) characters.
3. Multibyte encodings.
#1 is problematic due to lots of individual cases, i.e. specific
characters having a specific purpose in specific cases.
#2 is problematic due to signed/unsigned issues in a few places, and
due to third-party code which assumes specific encodings.
#3 is problematic simply because multibyte encodings are harder to
deal with than unibyte encodings, and most existing code assumes
unibyte encodings (except for code related to FreeType fonts in the
display system, which explicitly uses iconv to convert from a
user-specified encoding to unicode).
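As a small illustration of why #3 is harder, consider how a unibyte
assumption ("one byte is one character") breaks down for UTF-8. The
sketch below uses a hypothetical helper, not anything in GRASS, and
assumes well-formed UTF-8 input. It counts characters by skipping
continuation bytes (those of the form 10xxxxxx), whereas strlen() counts
bytes.

```c
#include <stddef.h>
#include <string.h>

/* Counts code points in a NUL-terminated string, assuming well-formed
 * UTF-8: every byte that is not a continuation byte (10xxxxxx) starts
 * a new code point. */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast first, so the mask is applied to a value in 0..255. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "\xC4\x81" "b" (a two-byte UTF-8 character followed by
an ASCII letter), strlen() reports 3 while utf8_code_points() reports
2. Any code which truncates, pads or indexes names by byte count needs
to take that difference into account.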
--
Glynn Clements <glynn at gclements.plus.com>