[Proj] Unicode
Glynn Clements
glynn at gclements.plus.com
Tue Jun 9 09:20:39 PDT 2009
support.mn at elisanet.fi wrote:
> It is clear that sometime in the future Unicode will be
> the de facto world standard character set. Proj-4 can
> live very well without Unicode characters, since it is
> more a definition language. It is like the C programming
> language. The language itself does not include special
> characters, but most modern systems can handle Unicode
> files and characters in comments and strings for example.
>
> It is more to allow Unicode systems to feed data to proj-4
> but still limit the basic definition character set to the 7-bit
> ASCII (or whatever). Proj-4 would just pass through all
> special characters.
>
> The great idea is that proj-4 would not crash with Unicoded
> characters or files.
PROJ doesn't care about encodings. It doesn't call setlocale() itself,
and (AFAIK) the only encoding-sensitive code is a recent change to
make certain string comparisons case-insensitive. The comparisons are
against literal strings composed entirely of alphanumeric ASCII
characters, so any string containing non-ASCII characters will
automatically fail the comparison regardless of the locale's encoding.

IOW, any encoding which is a superset of ASCII (e.g. UTF-8) will work
fine. Even encodings which aren't a strict superset (e.g. ISO-646-*)
will work. You'll only have trouble if the encoding isn't even
minimally compatible with ASCII, e.g. EBCDIC or UTF-16/UCS-2. But then
neither Unix locales nor Windows codepages use such encodings, for
exactly this reason.
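
A rough sketch of the byte-level behaviour described above, written in
Python rather than PROJ's C, so it illustrates the idea and is not PROJ
code. The keyword "utm" and the helpers ascii_caseless_equal() and
as_c_string() are hypothetical names made up for this example. It
assumes the library sees its input as a NUL-terminated byte string and
only folds case for ASCII letters.

```python
def ascii_caseless_equal(data: bytes, keyword: bytes) -> bool:
    """Compare two byte strings, ignoring case for ASCII letters only."""
    # bytes.lower() maps only ASCII A-Z; bytes >= 0x80 are left untouched.
    return data.lower() == keyword.lower()


def as_c_string(data: bytes) -> bytes:
    """Return what a C function would see: everything before the first NUL."""
    return data.split(b"\x00", 1)[0]


KEYWORD = b"utm"  # hypothetical stand-in for an ASCII literal in the library

# Plain ASCII in any case matches, and UTF-8 encodes ASCII unchanged.
print(ascii_caseless_equal("UTM".encode("utf-8"), KEYWORD))    # True

# A non-ASCII character just makes the comparison fail, whatever the
# ASCII-compatible encoding is.
print(ascii_caseless_equal("ütm".encode("utf-8"), KEYWORD))    # False
print(ascii_caseless_equal("ütm".encode("latin-1"), KEYWORD))  # False

# UTF-16 is not ASCII-compatible: each ASCII character gains a NUL byte,
# so a C string is cut off after the first character.
utf16 = "utm".encode("utf-16-le")
print(utf16)               # b'u\x00t\x00m\x00'
print(as_c_string(utf16))  # b'u'
```

The UTF-8 and Latin-1 cases fail the comparison cleanly, while the
UTF-16 case never gets as far as a meaningful comparison.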
--
Glynn Clements <glynn at gclements.plus.com>