Issues with SDE and Unicode

Russell de Grove russell at GOISC.COM
Tue Feb 20 11:48:57 PST 2007


I have map layers in ArcSDE on SQL Server 2005 and I have been trying to 
label features from a field with Unicode data (type nvarchar).

To get around the "Unknown SDE column type" error I had to add the 
following to the sdeGetRecord function in mapsde.c, in the 
"switch(itemdefs[i].sde_type)" block:

#ifdef SE_NSTRING_TYPE
    case SE_NSTRING_TYPE:
      shape->values[i] = (char *)malloc( (itemdefs[i].size + 1) * sizeof(unsigned short));
      status = SE_stream_get_nstring(sde->stream, 
                                    (short) (i+1), 
                                    (unsigned short *)shape->values[i]);
      if(status == SE_NULL_VALUE)
        ((unsigned short *)shape->values[i])[0] = (unsigned short)0; /* empty string */
      else if(status != SE_SUCCESS) {
        sde_error(status, "sdeGetRecord()", "SE_stream_get_nstring()");
        return(MS_FAILURE);
      }
      break;
#endif

So far, so good, but I only see the first character of each label.  If I explicitly 
include a Unicode "preamble" (byte order mark), I see two garbage characters 
followed by the first expected character.  As it happens, my data is in UTF-16 
and my characters are all ASCII-range characters that use only the low byte.  I 
believe what is causing my problem is the "msGetEncodedString" function in mapgd.c.

char *msGetEncodedString(const char *string, const char *encoding)
{
#ifdef USE_ICONV
  iconv_t cd = NULL;
  char *in, *inp;
  char *outp, *out = NULL;
  size_t len, bufsize, bufleft, status;
  cd = iconv_open("UTF-8", encoding);
  if(cd == (iconv_t)-1) {
    msSetError(MS_IDENTERR, "Encoding not supported by libiconv (%s).", 
               "msGetEncodedString()", encoding);
    return NULL;
  }
  len = strlen(string);

  /* Problem point: strlen returns the byte count up to the first null byte,
     so for "Shape #0" in UTF-16 it returns 1 (the "S" stored little-endian),
     or 3 if a Unicode "preamble" is used. */

  bufsize = len * 4;
  in = strdup(string);
  inp = in;
  out = (char*) malloc(bufsize);
  if(in == NULL || out == NULL){
    msSetError(MS_MEMERR, NULL, "msGetEncodedString()");
    msFree(in);
    iconv_close(cd);
    return NULL;
  }
  strcpy(out, in);
  outp = out;

  bufleft = bufsize;
  status = -1;
  while (len > 0){
    status = iconv(cd, (const char**)&inp, &len, &outp, &bufleft);

    /* Problem point: since this expects byte pairs, a byte length of 1 or 3
       is going to cause problems. */

    if(status == -1){
      msFree(in);
      msFree(out);
      iconv_close(cd);
      return strdup(string);

      /* Problem point: since there was a problem, strdup returns the original
         "string" up to the first null byte... so I get "S", possibly with a
         couple of preceding garbage characters if I used a preamble. */

    }
  }
  out[bufsize - bufleft] = '\0';
  
  msFree(in);
  iconv_close(cd);

  return out;
#else
  msSetError(MS_MISCERR, "Not implemeted since Iconv is not enabled.", 
             "msGetEncodedString()");
  return NULL;
#endif
}
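
The strlen behaviour flagged in the listing can be reproduced in isolation. 
The following is only a minimal sketch: the array names and the 
bytes_seen_by_strlen helper are made up for this illustration and are not part 
of MapServer. It assumes the label "Shape #0" arrives as little-endian UTF-16 
bytes, with and without a byte order mark.

```c
#include <stddef.h>
#include <string.h>

/* "Shape #0" as UTF-16LE: every ASCII character is followed by a 0x00
   high byte, and the string ends with a 16-bit (two-byte) terminator. */
const char utf16le_label[] = {
  'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0,
  0, 0
};

/* The same text preceded by a little-endian byte order mark (FF FE). */
const char utf16le_label_bom[] = {
  (char)0xFF, (char)0xFE,
  'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0,
  0, 0
};

/* Hypothetical helper: the length that a strlen-based caller would see. */
size_t bytes_seen_by_strlen(const char *s)
{
  /* strlen stops at the first 0x00 byte, which here is the high byte
     of the first ASCII-range character. */
  return strlen(s);
}
```

Calling bytes_seen_by_strlen on the first array gives 1 and on the second 
gives 3, which matches the odd byte counts that end up being passed to iconv.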

Has anyone else encountered similar problems? Does anyone know how I can 
determine the correct width of characters based on the "encoding" parameter?
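
One possible direction, offered as a rough sketch rather than a tested fix: 
guess the terminator width from the encoding name and count bytes up to a 
terminator of that width instead of calling strlen. The function names 
terminator_width and encoded_byte_length are hypothetical, and the name 
matching is an assumption that only covers encoding names beginning with 
UTF-16, UCS-2, UTF-32 or UCS-4; everything else is treated as having a 
one-byte terminator.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive check that s begins with prefix. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
  while (*prefix) {
    if (toupper((unsigned char)*s) != toupper((unsigned char)*prefix))
      return 0;
    s++;
    prefix++;
  }
  return 1;
}

/* Hypothetical helper: width in bytes of the string terminator for an
   encoding name. Assumption: only the prefixes below are multi-byte. */
size_t terminator_width(const char *encoding)
{
  if (has_prefix_nocase(encoding, "UTF-16") ||
      has_prefix_nocase(encoding, "UCS-2"))
    return 2;
  if (has_prefix_nocase(encoding, "UTF-32") ||
      has_prefix_nocase(encoding, "UCS-4"))
    return 4;
  return 1;
}

/* Hypothetical replacement for strlen: number of bytes before the first
   terminator of the appropriate width, scanning one code unit at a time. */
size_t encoded_byte_length(const char *data, const char *encoding)
{
  size_t width = terminator_width(encoding);
  size_t len = 0;

  for (;;) {
    size_t k;
    int all_zero = 1;

    for (k = 0; k < width; k++) {
      if (data[len + k] != 0) {
        all_zero = 0;
        break;
      }
    }
    if (all_zero)
      return len;
    len += width;
  }
}
```

With something like this, "len" in msGetEncodedString would be 16 for 
"Shape #0" in UTF-16. The strdup and strcpy calls in that function also stop 
at the first null byte, so they would need a length-aware copy such as memcpy 
as well.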


