[GRASS-dev] Re: GRASS startup window patches, map names restrictions

Glynn Clements glynn at gclements.plus.com
Mon Jan 8 09:35:55 EST 2007


Maris Nartiss wrote:

> I can confirm that using non-Latin characters with UTF-8 encoding in
> a mapset name will make the mapset/GRASS unusable.

Can you provide any more details?

> Map names also can not contain unicode characters.
> Location with unicode chars seem to be OK. Except, it will break
> startup screen in text mode :)
> IMHO other GRASS places are also not unicode aware.

GRASS itself shouldn't need to be "unicode aware"; most code should be
"neutral" regarding encodings, i.e. just treat strings as strings of
bytes, not characters.

Problems are most likely to arise within the UI, where strings of
bytes have to be decoded to strings of characters for rendering (this
applies both to graphical toolkits and curses).
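
As a minimal sketch of that distinction (in Python rather than C, purely
for brevity; the function names and the sample map and mapset names are
hypothetical, not GRASS code): joining and comparing names needs no
knowledge of the encoding, whereas decoding for display depends on
choosing the right one.

```python
# Hypothetical illustration only; none of these names exist in GRASS.

def join_name(map_name: bytes, mapset: bytes) -> bytes:
    """Build map@mapset purely at the byte level; no encoding is involved."""
    return map_name + b"@" + mapset


def decode_for_display(raw: bytes, encoding: str) -> str:
    """Decode bytes for rendering; undecodable bytes become U+FFFD."""
    return raw.decode(encoding, errors="replace")


latin1_name = "café".encode("latin-1")        # the bytes b'caf\xe9'
full_name = join_name(latin1_name, b"PERMANENT")

# The byte string is the same either way; only the decoding step differs.
shown_as_latin1 = decode_for_display(full_name, "latin-1")
shown_as_utf8 = decode_for_display(full_name, "utf-8")
```

Here the byte-level join always succeeds, but decoding the same bytes as
UTF-8 yields a replacement character where the Latin-1 "é" was.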

> Probably we should push forward policy, that only latin characters and
> numbers are allowed in map/mapset names? Like "[a-Z] [0-9] _-".

Most ASCII punctuation characters should be allowed except when they
already have specific meanings to GRASS (e.g. = and , are both
significant to the parser, @ is used for map@mapset etc).
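
A rough sketch of such a check, in Python for brevity. The reserved set
below contains only the three characters named above (=, the comma and
@); that is an assumption for illustration, not GRASS's actual rule, and
the function name is hypothetical.

```python
# Assumption: only the characters mentioned in this message are reserved.
RESERVED = b"=,@"


def has_reserved_byte(name: bytes) -> bool:
    """Return True if the name contains a byte with special meaning."""
    # Iterating over bytes yields integers, which can be tested against
    # another bytes object with "in".
    return any(byte in RESERVED for byte in name)
```

Other ASCII punctuation, and any byte in the range 128-255, passes this
check untouched.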

Vector maps are problematic due to the decision to use map names as
SQL table names without any translation, meaning that map names are
constrained by SQL syntax.
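
To make the constraint concrete, here is a conservative approximation
of an unquoted SQL identifier, sketched in Python. The pattern and the
function name are assumptions for illustration; real databases differ
in the details, and reserved words are not checked.

```python
import re

# Assumption: a letter or underscore, then letters, digits or underscores.
# This approximates unquoted SQL identifiers; it is not any one
# database's exact rule.
SQL_IDENT = re.compile(rb"[A-Za-z_][A-Za-z0-9_]*")


def usable_as_table_name(name: bytes) -> bool:
    """Return True if the whole name matches the approximate pattern."""
    return SQL_IDENT.fullmatch(name) is not None
```

Under this approximation "roads_2007" is acceptable, while "roads-2007"
and "2007roads" are not.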

> Pros for such approach: no need to check all GRASS code to be unicode
> etc. aware.
> Cons: limits user choice; unfriendly to non-English speakers.

GRASS C code should just treat all bytes in the range 128-255 as
"letters", subject to the limitation that case-folding won't apply
(i.e. é and É are considered different even in contexts where e and E
are considered equal).
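
A small sketch of that rule, written in Python for brevity rather than
C. The function names are hypothetical. It relies on the fact that
Python's bytes.lower() only maps the ASCII letters A-Z and leaves bytes
128-255 unchanged, which matches the behaviour described above.

```python
def is_letter(byte: int) -> bool:
    """Treat ASCII letters and every byte in 128-255 as a letter."""
    return byte >= 128 or 65 <= byte <= 90 or 97 <= byte <= 122


def equal_ignoring_ascii_case(a: bytes, b: bytes) -> bool:
    """Compare byte strings, folding case for ASCII letters only."""
    # bytes.lower() affects only A-Z; high bytes pass through as-is.
    return a.lower() == b.lower()


e_acute_lower = "é".encode("latin-1")   # b'\xe9'
e_acute_upper = "É".encode("latin-1")   # b'\xc9'
```

With these definitions e and E compare equal, while the two accented
bytes do not, although both count as letters.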

If other libraries (e.g. Tcl/Tk) want to interpret byte strings as
character strings, it's the user's responsibility to use an
appropriate encoding.

Curses is essentially limited to unibyte encodings; in any case,
languages which absolutely require multibyte encodings (CJK) have
problems due to "monospace" (halfwidth/fullwidth) issues.
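
The halfwidth/fullwidth issue can be illustrated with Python's standard
unicodedata module. The function name is hypothetical, and the sketch
assumes that wide and fullwidth characters occupy two terminal columns
and everything else one; combining characters are ignored.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Estimate terminal columns: 2 for wide/fullwidth characters, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


kanji = "漢字"
columns = display_columns(kanji)        # columns on screen
characters = len(kanji)                 # decoded characters
byte_count = len(kanji.encode("utf-8")) # bytes in UTF-8
```

For this string the three counts all differ, so code that assumes one
byte equals one character equals one column will misalign the display.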

One other factor to bear in mind is that, on Windows, filenames (and
thus map names, mapset names etc) are interpreted according to the
current codepage, which almost certainly *won't* be UTF-8.
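
As an illustration of what goes wrong, the sketch below decodes a UTF-8
name under a single-byte codepage. The function name is hypothetical
and cp1252 is only an example; the actual codepage depends on the
system.

```python
def as_seen_under_codepage(utf8_name: bytes, codepage: str = "cp1252") -> str:
    """Show how UTF-8 bytes read when interpreted in another codepage."""
    # Assumption: cp1252 stands in for "the current codepage".
    return utf8_name.decode(codepage, errors="replace")


utf8_bytes = "é".encode("utf-8")               # two bytes: b'\xc3\xa9'
misread = as_seen_under_codepage(utf8_bytes)   # two unrelated characters
```

The single character "é" is read back as two different characters, so a
name written as UTF-8 will not look like the name the user typed.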

-- 
Glynn Clements <glynn at gclements.plus.com>



