Issues with SDE and Unicode

Umberto Nicoletti umberto.nicoletti at GMAIL.COM
Wed Feb 21 01:39:18 PST 2007


Are you running MapServer on Windows or Linux?
If you are on Linux, try setting the LC_ALL environment variable to
something reasonable for your setup, like en_US.UTF-8, and see if
things improve. On Windows, do something similar in the regional
settings in Control Panel.

On Linux you can get more detail from the locale man page (man locale).
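
To check which encoding a process actually picks up from those variables, something like the following may help. It is only a sketch: active_codeset is a hypothetical name, not part of MapServer, and it assumes a POSIX system that provides nl_langinfo.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of MapServer: returns the character
 * encoding of the locale selected by LC_ALL / LC_CTYPE / LANG, or NULL
 * if the requested locale is not available on this system. */
const char *active_codeset(void)
{
    /* "" asks the C library to read the locale from the environment. */
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Printing the return value from a small test program run with the same environment as MapServer shows whether the LC_ALL setting took effect. A NULL result means the requested locale could not be set, which usually points to a locale that is not installed.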

HTH,
Umberto


On 2/20/07, Russell de Grove <russell at goisc.com> wrote:
> I have map layers in ArcSDE on Sql Server 2005 and I have been trying to
> label features from a field with Unicode data (type nvarchar).
>
> To get around the "Unknown SDE column type" error I had to add the
> following to the sdeGetRecord function in mapsde.c, in the
> "switch(itemdefs[i].sde_type)" block:
>
> #ifdef SE_NSTRING_TYPE
>     case SE_NSTRING_TYPE:
>       shape->values[i] = (char *)malloc((itemdefs[i].size + 1) *
>                                         sizeof(unsigned short));
>       status = SE_stream_get_nstring(sde->stream,
>                                     (short) (i+1),
>                                     (unsigned short *)shape->values[i]);
>       if(status == SE_NULL_VALUE)
>         ((unsigned short *)shape->values[i])[0] = (unsigned short)0; /* empty string */
>       else if(status != SE_SUCCESS) {
>         sde_error(status, "sdeGetRecord()", "SE_stream_get_nstring()");
>         return(MS_FAILURE);
>       }
>       break;
> #endif
>
> So far, so good, but I only see the first character of each label.  If I
> explicitly include a Unicode "preamble" (byte order mark), I see two garbage
> characters followed by the first expected character.  As it happens, my data
> is in UTF-16 and my characters are all ASCII-range characters that use only
> the low byte.  I believe what is causing my problem is the
> "msGetEncodedString" function in mapgd.c.
>
> char *msGetEncodedString(const char *string, const char *encoding)
> {
> #ifdef USE_ICONV
>   iconv_t cd = NULL;
>   char *in, *inp;
>   char *outp, *out = NULL;
>   size_t len, bufsize, bufleft, status;
>   cd = iconv_open("UTF-8", encoding);
>   if(cd == (iconv_t)-1) {
>     msSetError(MS_IDENTERR, "Encoding not supported by libiconv (%s).",
>                "msGetEncodedString()", encoding);
>     return NULL;
>   }
>   len = strlen(string);
>
> // Problem point: strlen will return the count up to the first null byte,
> // so "Shape #0" as Unicode will return 1 for the S stored little-endian,
> // or 3 if a Unicode "preamble" is used
>
>   bufsize = len * 4;
>   in = strdup(string);
>   inp = in;
>   out = (char*) malloc(bufsize);
>   if(in == NULL || out == NULL){
>     msSetError(MS_MEMERR, NULL, "msGetEncodedString()");
>     msFree(in);
>     iconv_close(cd);
>     return NULL;
>   }
>   strcpy(out, in);
>   outp = out;
>
>   bufleft = bufsize;
>   status = -1;
>   while (len > 0){
>     status = iconv(cd, (const char**)&inp, &len, &outp, &bufleft);
>
> // Problem point: since this expects byte pairs, a byte length of 1 or 3
> // is going to cause problems.
>
>     if(status == -1){
>       msFree(in);
>       msFree(out);
>       iconv_close(cd);
>       return strdup(string);
>
> // Problem point: since there was a problem, strdup returns the original
> // "string" up to the first null byte... so I get "S", possibly with a
> // couple of preceding garbage characters if I used a preamble
>
>     }
>   }
>   out[bufsize - bufleft] = '\0';
>
>   msFree(in);
>   iconv_close(cd);
>
>   return out;
> #else
>   msSetError(MS_MISCERR, "Not implemeted since Iconv is not enabled.",
>              "msGetEncodedString()");
>   return NULL;
> #endif
> }
>
> Has anyone else encountered similar problems? Does anyone know how I can
> determine the correct width of characters based on the "encoding" parameter?
>
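
On the character-width question, one possible approach is to derive the code-unit width from the encoding name and scan for a terminator of that width instead of calling strlen. The sketch below is illustrative only: code_unit_width and encoded_byte_len are hypothetical helpers, not MapServer or libiconv functions, and the name matching covers just a few common iconv-style names.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: guess the code-unit width in bytes from an
 * iconv-style encoding name. Only a few common names are recognized. */
size_t code_unit_width(const char *encoding)
{
    if (strncasecmp(encoding, "UTF-16", 6) == 0 ||
        strncasecmp(encoding, "UCS-2", 5) == 0)
        return 2;
    if (strncasecmp(encoding, "UTF-32", 6) == 0 ||
        strncasecmp(encoding, "UCS-4", 5) == 0)
        return 4;
    return 1; /* UTF-8, ISO-8859-x and other byte-oriented encodings */
}

/* Byte length of a string terminated by one all-zero code unit, not
 * counting the terminator. For width 1 this is the same as strlen(). */
size_t encoded_byte_len(const char *s, const char *encoding)
{
    static const char zero[4] = {0, 0, 0, 0};
    size_t width = code_unit_width(encoding);
    size_t n = 0;

    while (memcmp(s + n, zero, width) != 0)
        n += width;
    return n;
}
```

With a length computed this way, "Shape #0" stored as UTF-16 measures 16 bytes rather than 1. The input copy in msGetEncodedString would presumably also need memcpy with that length instead of strdup, since strdup stops at the first null byte as well.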