Issues with SDE and Unicode

Howard Butler hobu at IASTATE.EDU
Thu Mar 1 21:09:08 EST 2007


Russell,

I have committed a patch in HEAD that is similar, but it just hammers
the wide characters to narrow ones rather than trying to pass them
around. Are the data you are trying to label actually wide character,
or is it just an instance of SDE trying to be smart and putting all of
your string columns in Unicode?  I confess to not knowing very much
about character issues like this, so maybe this approach is the wrong
way to go.  I left a note in the code to do something smarter when
MapServer becomes more cognizant of these issues.

Howard

> #ifdef SE_NSTRING_TYPE
>             case SE_NSTRING_TYPE:
>                 /* size+1 elements in each buffer leaves room for the terminator */
>                 shape->values[i] = (char *)malloc((itemdefs[i].size+1)*sizeof(char));
>                 wide = (SE_WCHAR *)malloc((itemdefs[i].size+1)*sizeof(SE_WCHAR));
>                 status = SE_stream_get_nstring( sde->stream,
>                                                 (short) (i+1),
>                                                 wide);
>
>                 if(status == SE_NULL_VALUE)
>                     shape->values[i][0] = '\0'; /* empty string */
>                 else if(status != SE_SUCCESS) {
>                     free(wide);
>                     sde_error(  status,
>                                 "sdeGetRecord()",
>                                 "SE_stream_get_nstring()");
>                     return(MS_FAILURE);
>                 }
>                 else {
>                     // hammer the wide character to narrow
>                     // FIXME: do the right thing when MapServer becomes more
>                     // unicode aware.
>                     // The byte limit is the capacity of the narrow buffer.
>                     if(wcstombs(shape->values[i],
>                                 wide,
>                                 itemdefs[i].size) == (size_t)-1)
>                         shape->values[i][0] = '\0'; /* unconvertible character */
>                     else
>                         shape->values[i][itemdefs[i].size] = '\0';
>                 }
>                 free(wide);
>                 break;
> #endif
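
For anyone who wants to try the narrowing step outside of MapServer, here
is a minimal standalone sketch. It assumes plain wchar_t input rather than
SE_WCHAR, and narrow_copy() is a hypothetical helper name, not something
in mapsde.c. The point is only that the destination is sized from the
input length before wcstombs() is called.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of MapServer: convert a wide string to a
 * newly allocated multibyte string in the current C locale.
 * Returns NULL on allocation failure or if a character cannot be
 * represented in that locale. The caller frees the result. */
char *narrow_copy(const wchar_t *wide)
{
    size_t cap, written;
    char *out;

    assert(wide != NULL);

    /* Worst case: every wide character needs MB_CUR_MAX bytes, plus one
     * byte for the terminating null. */
    cap = wcslen(wide) * MB_CUR_MAX + 1;
    out = (char *)malloc(cap);
    if (out == NULL)
        return NULL;

    /* The third argument is the capacity of the destination in bytes. */
    written = wcstombs(out, wide, cap);
    if (written == (size_t)-1) {
        free(out);              /* unconvertible character */
        return NULL;
    }

    out[cap - 1] = '\0';        /* defensive: result is always terminated */
    return out;
}
```

Called as narrow_copy(L"Shape #0") in the default "C" locale, this should
hand back an ordinary char string with the same ASCII text. Characters the
locale cannot represent make it return NULL instead of a partial label.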




On Feb 20, 2007, at 1:48 PM, Russell de Grove wrote:

> I have map layers in ArcSDE on SQL Server 2005 and I have been trying to
> label features from a field with Unicode data (type nvarchar).
>
> To get around the "Unknown SDE column type" error I had to add the
> following to the sdeGetRecord method in mapsde.c, in the
> "switch(itemdefs[i].sde_type)" block:
>
> #ifdef SE_NSTRING_TYPE
>     case SE_NSTRING_TYPE:
>       shape->values[i] = (char *)malloc( (itemdefs[i].size + 1) * sizeof(unsigned short));
>       status = SE_stream_get_nstring(sde->stream,
>                                     (short) (i+1),
>                                     (unsigned short *)shape->values[i]);
>       if(status == SE_NULL_VALUE)
>         ((unsigned short *)shape->values[i])[0] = (unsigned short)0; /* empty string */
>       else if(status != SE_SUCCESS) {
>         sde_error(status, "sdeGetRecord()", "SE_stream_get_nstring()");
>         return(MS_FAILURE);
>       }
>       break;
> #endif
>
> So far, so good, but I only see the first character of each label. If I
> explicitly include a Unicode "preamble", I see two garbage characters
> followed by the first expected character. As it happens, my data is in
> UTF-16 and my characters are all ASCII-type characters that use only the
> low byte. I believe what is causing my problem is the
> "msGetEncodedString" method in mapgd.c.
>
> char *msGetEncodedString(const char *string, const char *encoding)
> {
> #ifdef USE_ICONV
>   iconv_t cd = NULL;
>   char *in, *inp;
>   char *outp, *out = NULL;
>   size_t len, bufsize, bufleft, status;
>   cd = iconv_open("UTF-8", encoding);
>   if(cd == (iconv_t)-1) {
>     msSetError(MS_IDENTERR, "Encoding not supported by libiconv (%s).",
>                "msGetEncodedString()", encoding);
>     return NULL;
>   }
>   len = strlen(string);
>
> // Problem point: strlen will return the count up to the first null byte,
> // so "Shape #0" as Unicode will return 1 for the S stored little-endian,
> // or 3 if a Unicode "preamble" is used
>
>   bufsize = len * 4;
>   in = strdup(string);
>   inp = in;
>   out = (char*) malloc(bufsize);
>   if(in == NULL || out == NULL){
>     msSetError(MS_MEMERR, NULL, "msGetEncodedString()");
>     msFree(in);
>     iconv_close(cd);
>     return NULL;
>   }
>   strcpy(out, in);
>   outp = out;
>
>   bufleft = bufsize;
>   status = -1;
>   while (len > 0){
>     status = iconv(cd, (const char**)&inp, &len, &outp, &bufleft);
>
> // Problem point: since this expects byte pairs, a byte length of 1 or 3
> // is going to cause problems.
>
>     if(status == -1){
>       msFree(in);
>       msFree(out);
>       iconv_close(cd);
>       return strdup(string);
>
> // Problem point: since there was a problem, strdup returns the original
> // "string" up to the first null byte... so I get "S", possibly with a
> // couple of preceding garbage characters if I used a preamble
>
>     }
>   }
>   out[bufsize - bufleft] = '\0';
>
>   msFree(in);
>   iconv_close(cd);
>
>   return out;
> #else
>   msSetError(MS_MISCERR, "Not implemeted since Iconv is not enabled.",
>              "msGetEncodedString()");
>   return NULL;
> #endif
> }
>
> Has anyone else encountered similar problems? Does anyone know how I can
> determine the correct width of characters based on the "encoding"
> parameter?
>
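
The strlen() behaviour described in the quoted message is easy to
reproduce in isolation. The sketch below is only an illustration and
assumes the data is UTF-16 ending in a 16-bit zero; the sample arrays and
utf16_byte_length() are hypothetical names, not MapServer code. It counts
bytes two at a time instead of stopping at the first zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Shape #0" as UTF-16LE, ending in a 16-bit zero terminator. */
static const char shape0_utf16le[] = {
    'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0, 0, 0
};

/* The same text preceded by the little-endian byte order mark FF FE. */
static const char shape0_with_bom[] = {
    '\xFF', '\xFE',
    'S', 0, 'h', 0, 'a', 0, 'p', 0, 'e', 0, ' ', 0, '#', 0, '0', 0, 0, 0
};

/* Hypothetical helper: length in bytes of a UTF-16 string, not counting
 * the terminator. It walks the buffer one 16-bit code unit at a time and
 * stops only when both bytes of a unit are zero. */
size_t utf16_byte_length(const char *data)
{
    size_t n = 0;

    assert(data != NULL);
    while (data[n] != '\0' || data[n + 1] != '\0')
        n += 2;
    return n;
}
```

With these arrays, strlen() stops at the high byte of the "S" (1 byte
without the mark, 3 with it), while utf16_byte_length() reports 16 and 18
bytes. That matches the 1 and 3 mentioned in the message.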

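One way around the length problem, sketched here under assumptions, is to
give the conversion routine the input length in bytes rather than letting
it call strlen(). The function name to_utf8() and its in_bytes parameter
are hypothetical and are not the MapServer API; the code also assumes the
POSIX prototype of iconv(), where the input pointer is a char **.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical variant of the conversion step: converts in_bytes bytes of
 * input from the named encoding to a newly allocated UTF-8 string.
 * Returns NULL on any failure. The caller frees the result. */
char *to_utf8(const char *input, size_t in_bytes, const char *encoding)
{
    iconv_t cd;
    char *inp = (char *)input;      /* iconv() does not modify the input */
    char *out, *outp;
    size_t in_left = in_bytes;
    size_t out_size = in_bytes * 4 + 1;   /* generous bound plus terminator */
    size_t out_left = out_size - 1;

    assert(input != NULL && encoding != NULL);

    cd = iconv_open("UTF-8", encoding);
    if (cd == (iconv_t)-1)
        return NULL;                /* encoding not supported */

    out = (char *)malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    outp = out;

    while (in_left > 0) {
        if (iconv(cd, &inp, &in_left, &outp, &out_left) == (size_t)-1) {
            free(out);              /* invalid or truncated input */
            iconv_close(cd);
            return NULL;
        }
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

Passing the 16 bytes of a little-endian "Shape #0" with the encoding name
"UTF-16LE" should give back the plain 8-character string. The byte count
still has to come from somewhere, for example from the column size that
SDE reports or from a scan for a 16-bit terminator.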


More information about the mapserver-users mailing list