[Proj] Unicode

Glynn Clements glynn at gclements.plus.com
Tue Jun 9 09:20:39 PDT 2009


support.mn at elisanet.fi wrote:

> It is clear that sometime in the future Unicode will be
> the de facto world standard character set. Proj-4 can
> live very well without Unicode characters, since it is
> more a definition language. It is like the C programming
> language. The language itself does not include special
> characters, but most modern systems can handle Unicode
> files and characters in comments and strings for example.
> 
> It is more to allow Unicode systems to feed data to proj-4
> but still limit the basic definition character set to the 7-bit
> ASCII (or whatever). Proj-4 would just pass through all
> special characters.
> 
> The great idea is that proj-4 would not crash with Unicode
> characters or files.

PROJ doesn't care about encodings. It doesn't call setlocale() itself,
and (AFAIK) the only encoding-sensitive code is a recent change to
make certain string comparisons case-insensitive. The comparisons are
against literal strings composed entirely of alphanumeric ASCII
characters, so any string containing non-ASCII characters will
automatically fail the comparison regardless of the locale's encoding.
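
To illustrate the point about comparisons, here is a minimal sketch in
C. It is not PROJ's actual code: the helper name caseless_equal() and
the keyword "wgs84" are made up for the example, and I'm assuming the
comparison folds case one byte at a time with tolower().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from PROJ: compare a user-supplied
 * string against an all-ASCII keyword, ignoring case. */
static int caseless_equal(const char *input, const char *keyword)
{
    assert(input != NULL && keyword != NULL);
    for (;; input++, keyword++) {
        /* Cast to unsigned char first: passing a negative char value
         * to tolower() is undefined behaviour. */
        int a = tolower((unsigned char)*input);
        int b = tolower((unsigned char)*keyword);
        if (a != b)
            return 0;   /* bytes differ, or one string ended early */
        if (a == 0)
            return 1;   /* both strings ended at the same position */
    }
}
```

With this, caseless_equal("WGS84", "wgs84") is true, while an input
carrying extra UTF-8 bytes such as "wgs84\xC3\xA9" simply compares
unequal; nothing needs to decode the bytes for that to happen.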

IOW, any encoding which is a superset of ASCII (e.g. UTF-8) will work
fine. Even encodings which aren't a strict superset (e.g. ISO-646-*)
will work. You'll only have trouble if the encoding isn't even
minimally compatible with ASCII, e.g. EBCDIC or UTF-16/UCS-2. But then
neither Unix locales nor Windows codepages use such encodings, for
exactly this reason.
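
A rough sketch of why UTF-16 is the problem case, again in C. The
argument "+proj=utm" is just an example string, and visible_length()
is a hypothetical name standing in for any code that treats its input
as a NUL-terminated char string.

```c
#include <stddef.h>
#include <string.h>

/* "+proj=utm" in UTF-8: byte-for-byte identical to the ASCII form. */
static const char utf8_arg[] = "+proj=utm";

/* The same text in UTF-16LE: every ASCII character is followed by a
 * zero byte, and the terminator is two zero bytes. */
static const char utf16le_arg[] = {
    '+', 0, 'p', 0, 'r', 0, 'o', 0, 'j', 0,
    '=', 0, 'u', 0, 't', 0, 'm', 0, 0, 0
};

/* Length of the string as ordinary C string functions see it. */
static size_t visible_length(const char *s)
{
    return strlen(s);
}
```

The UTF-8 form is seen in full (9 bytes), whereas the UTF-16LE form
appears to end after the first byte, because the zero byte that
follows '+' is taken as the string terminator.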

-- 
Glynn Clements <glynn at gclements.plus.com>


