[Gdal-dev] Re: OGR - Character Encodings

Frank Warmerdam warmerdam at pobox.com
Fri Oct 14 15:12:23 EDT 2005


On 10/14/05, Charlie Savage <cfis at interserv.com> wrote:
> > As you suspect, OGR is completely encoding-ignorant currently.
> > I would encourage you to commit the change setting the encoding to
> > LATIN1 for now, as I gather that is a more inclusive character set than
> > the default UTF8.
>
> Actually, I would say UTF8 is more inclusive since the lower 128 match
> ASCII, but after that you can encode anything you want (of course it can
> take 2 to 6 bytes for non ASCII characters).

Charlie,

Good point.  But what happens to "normal" LATIN1 characters in the
128 to 255 range with UTF8?
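
To make the question concrete, here is a minimal Python sketch (an illustration only, not anything OGR does). The sample character is an arbitrary choice. It suggests that each Latin-1 character in the 128 to 255 range becomes two bytes in UTF-8, and that the raw Latin-1 byte is not valid UTF-8 by itself:

```python
# Illustration only: how one Latin-1 character in the 128-255 range
# is represented in each encoding.
text = "é"  # U+00E9, a single byte (0xE9) in Latin-1

latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'

# A raw Latin-1 byte in this range is not valid UTF-8 on its own.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Every code point from 128 to 255 takes exactly two bytes in UTF-8.
all_two_bytes = all(len(chr(cp).encode("utf-8")) == 2 for cp in range(128, 256))
```

So data stored as Latin-1 would need to be transcoded before being handed to a client that expects UTF-8; ASCII-only data is byte-identical in both.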

> I'm more worried about all the string handling code that loops over
> pointers to chars* - I think there are places where the implicit
> assumption is that characters are always 1 byte long.  Thus I would
> guess any such string would get mangled long before you tried to post
> them to Postgresql, or any other data source for that matter.

As long as string values are not being parsed, and as long as UTF8
or other encodings don't have embedded zero-bytes the strings
should get carried around everywhere just fine.  I have long depended
on this as a way around being encoding aware.  Though OGR isn't
providing any help in interpreting encoding.
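
A rough Python sketch of the pass-through argument above. The helper `c_string_view` is hypothetical and only mimics what NUL-terminated C string code would see; the sample name is arbitrary. UTF-8 text contains no zero bytes (unless it encodes U+0000 itself), so it survives intact, whereas an encoding such as UTF-16 would be cut short:

```python
def c_string_view(buf: bytes) -> bytes:
    """Return the bytes a NUL-terminated C string routine would see."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]

name = "Zürich"
utf8 = name.encode("utf-8")
utf16 = name.encode("utf-16-le")

# UTF-8 has no embedded zero bytes, so the whole value is carried through.
utf8_survives = c_string_view(utf8) == utf8

# UTF-16 encodes 'Z' as 0x5A 0x00, so C code would stop after one byte.
utf16_survives = c_string_view(utf16) == utf16

# The byte count differs from the character count, which matters only
# to code that parses or measures the string rather than just copying it.
char_len = len(name)
byte_len = len(utf8)
```

The mismatch between `char_len` and `byte_len` is where code that assumes one byte per character could go wrong, for example when truncating to a field width.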

> > There are no plans currently to support encoding-awareness in OGR.
> >
> > /me buries his head in the sand for a couple more years...
>
> LOL.  I wonder how much work it would really be though.  Maybe one
> approach would be to update the core OGR string handling code to be
> encoding aware, thereby limiting the scope in the first go.
>
> Then you could update various drivers, as the need arose, to be encoding
> aware.  And probably there are a number of datasources which just assume
> LATIN1 anyways, so you wouldn't have to touch them (out of curiosity,
> does Shape support encodings?).

I don't know if Shape supports encoding.  There is an encoding byte
in the .dbf header, but the ESRI Shapefile spec does not go into any
detail about the DBF format or which of its features are in effect.
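
For anyone wanting to look at that byte, a minimal Python sketch follows. It assumes the common dBASE layout in which the file header starts with a 32-byte block and the "language driver ID" sits at byte offset 29; the function name `read_ldid` and the sample value are made up for the demo, and how a given value maps to a code page is not covered here:

```python
import io

LDID_OFFSET = 29  # assumed offset of the language driver ID in the DBF header

def read_ldid(stream) -> int:
    """Read the language driver byte from the first 32 header bytes."""
    header = stream.read(32)
    if len(header) < 32:
        raise ValueError("file too short to contain a DBF header")
    return header[LDID_OFFSET]

# Build a fake 32-byte header in memory instead of opening a real .dbf file.
fake_header = bytearray(32)
fake_header[0] = 0x03            # dBASE version byte, arbitrary for the demo
fake_header[LDID_OFFSET] = 0x57  # arbitrary sample value

ldid = read_ldid(io.BytesIO(bytes(fake_header)))
print(hex(ldid))  # 0x57
```

With a real file this would be `read_ldid(open(path, "rb"))`; a value of zero would suggest the writer did not record an encoding at all.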

> Something to ponder.

... or I could pretend not to have heard anything about encodings ...

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent



