[Proj] Unicode

Mon Jun 8 11:52:02 PDT 2009

2009/6/8 Gerald I. Evenden <geraldi.evenden at gmail.com>

>
> Besides, why take up vast quantities of bytes with 16 bit code where special
> inline doublet or triplet escape coding like LaTex uses can do the job with 7
> bit ASCII with ease.
>

That's what UTF-8 is for.
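
To make that concrete, here is a minimal sketch of the UTF-8 encoding rule
itself. The helper name utf8_encode is my own invention for illustration, not
part of any library. It shows that plain ASCII stays at one byte per character
and only the non-ASCII characters grow to two, three, or four bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): encode one Unicode
   code point as UTF-8. Returns the number of bytes written (1 to 4),
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

So 'A' (U+0041) is the single byte 0x41, exactly as in ASCII, while 'ö'
(U+00F6) becomes the two bytes 0xC3 0xB6.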

Markus Kuhn gives all the details in
http://www.cl.cam.ac.uk/~mgk25/unicode.html - from where I quote the
following item of interest:

C support for Unicode and UTF-8

Starting with GNU glibc 2.2, the type wchar_t is officially intended to be
used only for 32-bit ISO 10646 values, independent of the currently used
locale. This is signalled to applications by the definition of the
__STDC_ISO_10646__ macro as required by ISO C99. The ISO C multi-byte
conversion functions (mbsrtowcs(), wcsrtombs(), etc.) are fully implemented
in glibc 2.2 or higher and can be used to convert between wchar_t and any
locale-dependent multibyte encoding, including UTF-8, ISO 8859-1, etc.

For example, you can write

  #include <stdio.h>
  #include <locale.h>

  int main()
  {
    if (!setlocale(LC_CTYPE, "")) {
      fprintf(stderr, "Can't set the specified locale! "
              "Check LANG, LC_CTYPE, LC_ALL.\n");
      return 1;
    }
    printf("%ls\n", L"Schöne Grüße");
    return 0;
  }

Call this program with the locale setting LANG=de_DE and the output will be
in ISO 8859-1. Call it with LANG=de_DE.UTF-8 and the output will be in
UTF-8. The %ls format specifier in printf calls wcsrtombs in order to
convert the wide character argument string into the locale-dependent
multi-byte encoding.
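
As a rough sketch of my own (not from Kuhn's page), the same conversions can
also be called directly instead of going through printf. The helper names
wide_to_mb and mb_to_wide are hypothetical. Both assume the caller has already
called setlocale(LC_CTYPE, "") as in the program above; without that the
process stays in the "C" locale, where only ASCII is guaranteed to convert.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to the multibyte encoding
   of the current LC_CTYPE locale. Returns the number of bytes written
   (excluding the terminator), or (size_t)-1 if a character cannot be
   represented or dst is too small to hold the terminated result. */
size_t wide_to_mb(const wchar_t *src, char *dst, size_t dstlen)
{
    mbstate_t state;
    const wchar_t *p = src;
    size_t n;

    memset(&state, 0, sizeof state);      /* start in the initial state */
    n = wcsrtombs(dst, &p, dstlen, &state);
    if (n == (size_t)-1)
        return (size_t)-1;                /* unrepresentable character */
    if (p != NULL)
        return (size_t)-1;                /* dst filled before the terminator */
    return n;
}

/* Hypothetical helper: the reverse direction, from the locale's
   multibyte encoding to wchar_t. dstlen counts wide characters. */
size_t mb_to_wide(const char *src, wchar_t *dst, size_t dstlen)
{
    mbstate_t state;
    const char *p = src;
    size_t n;

    memset(&state, 0, sizeof state);
    n = mbsrtowcs(dst, &p, dstlen, &state);
    if (n == (size_t)-1)
        return (size_t)-1;                /* invalid multibyte sequence */
    if (p != NULL)
        return (size_t)-1;                /* dst filled before the terminator */
    return n;
}
```

Both library functions set the source pointer to NULL only when they have
consumed the terminating null, which is what the p != NULL checks rely on to
detect a destination buffer that was too short.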