[GRASS-dev] Re: GRASS startup window patches, map names restrictions

Maris Nartiss maris.gis at gmail.com
Tue Jan 9 06:49:07 EST 2007


Hi Glynn,

I know that GRASS SHOULD be encoding-neutral, but not all GRASS code is
100% perfect, and it may take too much time to check every place where
things could break. IMHO it would be easier to just adopt a policy of
allowing only Latin letters, numbers and a few safe symbols, at least
until GRASS 7, with its new GUI and major code review, is done. This is
a good topic for the PSC to discuss/vote on ;)

More comments inline.
My system: Gentoo Linux 32bit ~x86, locale="lv_LV.UTF-8", last week's
GRASS 6.3 CVS, Tcl/Tk 8.4.13

2007/1/8, Glynn Clements <glynn at gclements.plus.com>:
>
> Maris Nartiss wrote:
>
> > I can confirm that using non-Latin characters with UTF-8 encoding in a
> > mapset name will make the mapset/GRASS unusable.
>
> Can you provide any more details?
I created a new mapset called "āšņļ" in the startup screen and pressed
"enter grass"; gis.m startup then failed because g.region failed with
the message:
"Illegal filename. Character <Ä> not allowed."

>
> > Map names also can not contain unicode characters.
> > Locations with Unicode characters seem to be OK, except that they
> > break the startup screen in text mode :)
> > IMHO other parts of GRASS are also not Unicode-aware.
>
> GRASS itself shouldn't need to be "unicode aware"; most code should be
> "neutral" regarding encodings, i.e. just treat strings as strings of
> bytes, not characters.
>
> Problems are most likely to arise within the UI, where strings of
> bytes have to be decoded to strings of characters for rendering (this
> applies both to graphical toolkits and curses).
The same goes for map names given on the command line:

GRASS 6.3.cvs (a new Location):~ > r.random.surface output=āšņļ
Illegal filename. Character <� not allowed.
Illegal filename. Character <� not allowed.
ERROR:r.random.surface: map name [āšņļ] not legal for GRASS

GRASS 6.3.cvs (a new Location):~ > v.random output=āšņļ n=20
Illegal filename. Character <� not allowed.
Unacceptable vector map name <āšņļ>. Must start with a letter.
ERROR:Map name is not SQL compatible.


> > Perhaps we should push forward a policy that only Latin letters and
> > numbers are allowed in map/mapset names? Like "[a-zA-Z] [0-9] _-".
>
> Most ASCII punctuation characters should be allowed except when they
> already have specific meanings to GRASS (e.g. = and , are both
> significant to the parser, @ is used for map@mapset, etc.).
Space, "{" and "}" should be banned, since creating the location "a new
location" and the mapset "a new mapset" or "{ a mapset }" makes
g.region fail with:
"Illegal filename. Character < > not allowed."
Entering a location with a space in its name is impossible in text
mode, as only the part before the space is accepted.

The same applies to characters with special meanings in the shell, such
as "$" and "#": they may be incorrectly escaped or quoted in scripts,
and they make scripting from the console harder.

>
> Vector maps are problematic due to the decision to use map names as
> SQL table names without any translation, meaning that map names are
> constrained by SQL syntax.
>
> > Pros of such an approach: no need to check all GRASS code for
> > Unicode etc. awareness.
> > Cons: limits user choice; unfriendly to non-English speakers.
>
> GRASS C code should just treat all bytes in the range 128-255 as
> "letters", subject to the limitation that case-folding won't apply
> (i.e. é and É are considered different even in contexts where e and E
> are considered equal).
>
> If other libraries (e.g. Tcl/Tk) want to interpret byte strings as
> character strings, it's the user's responsibility to use an
> appropriate encoding.
>
> Curses is essentially limited to unibyte encodings; in any case,
> languages which absolutely require multibyte encodings (CJK) have
> problems due to "monospace" (halfwidth/fullwidth) issues.
>
> One other factor to bear in mind is that, on Windows, filenames (and
> thus map names, mapset names etc) are interpreted according to the
> current codepage, which almost certainly *won't* be UTF-8.
>
> --
> Glynn Clements <glynn at gclements.plus.com>
>

As GRASS is adopted more widely in many countries, it should clearly
state its position on UTF-8 and on which symbols are allowed in
mapset/map names.

Just trying to make GRASS better,
Maris.



