[Gdal-dev] Re: OGR - Character Encodings
Charlie Savage
cfis at interserv.com
Sat Oct 15 13:33:02 EDT 2005
Thinking about this a bit more, I'm uncomfortable hard-coding postgresql
to LATIN1. Sure that would work for TIGER data, but what if you open a
datasource that is in Latin2, or Latin3?
Just to be clear, what is happening is that libpq is doing automatic
encoding conversions between the client (the ogr library) and the server
(postgresql). Since libpq is not told otherwise, on my machine it is
assuming the client encoding is UTF8. That works fine for characters 1
through 127, which are encoded identically in LATIN1 and UTF8, but fails
above that.
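To illustrate why bytes above 127 fail, here is a minimal sketch of a structural UTF8 check. The function name looksLikeUtf8 is made up for this example and is not part of OGR or libpq. It only shows that a lone LATIN1 byte such as 0xE9 (e with acute accent) is not a valid UTF8 sequence, which is the kind of input that gets rejected when the client encoding is assumed to be UTF8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an OGR or libpq function): returns true if `s`
// is structurally valid UTF8. It does not reject overlong forms or
// surrogates, which is enough for this illustration.
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;  // ASCII, same in LATIN1 and UTF8
        else if ((c & 0xE0) == 0xC0) extra = 1;  // lead byte of a 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // lead byte of a 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // lead byte of a 4-byte sequence
        else return false;                       // stray continuation or invalid byte

        // The sequence must not run past the end of the string.
        if (i + extra >= s.size()) return false;

        // Every following byte must look like 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char n = static_cast<unsigned char>(s[i + k]);
            if ((n & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

With this check, the LATIN1 bytes for "Café" ("Caf\xE9") are rejected, while the pure ASCII "Cafe" and the UTF8 form "Caf\xC3\xA9" are accepted.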
One solution that we've discussed is for ogr to use UTF8 internally, and
then it's up to each datasource to convert to/from UTF8. I would think
that is the better "general" solution, but it would be a bit more work.
It would also require two encoding translations - source data source to
UTF8, UTF8 to target (although as time goes on probably everything will
support/use UTF8 so in the end you might not have to do any translations
at all).
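As a rough sketch of the two translations, assuming both the source and the target happen to be ISO-8859-1: the function names latin1ToUtf8 and utf8ToLatin1 are invented for this example and are not OGR functions. ISO-8859-1 is the easy case because its byte values equal the Unicode code points U+0000 through U+00FF; other encodings would need lookup tables or a conversion library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: source encoding (ISO-8859-1) to UTF8.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                  // ASCII is unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte: 0xC2 or 0xC3
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}

// Hypothetical helper: UTF8 to target encoding (ISO-8859-1).
// Anything that does not fit in ISO-8859-1 becomes '?'.
std::string utf8ToLatin1(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const bool hasNext = i + 1 < in.size();
        const unsigned char next =
            hasNext ? static_cast<unsigned char>(in[i + 1]) : 0;
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && hasNext && (next & 0xC0) == 0x80) {
            // 2-byte sequence encoding U+0080 through U+00FF.
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (next & 0x3F)));
            ++i;
        } else {
            out.push_back('?');
            // Skip the continuation bytes of the sequence we could not map.
            while (i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}
```

Round-tripping "Caf\xE9" through both functions gives back the original bytes, while a character outside ISO-8859-1 (for example the euro sign, "\xE2\x82\xAC" in UTF8) comes out as '?'. That loss on the second translation is one cost of the two-step approach when the target encoding is narrower than UTF8.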
Perhaps a simpler approach for the time being is simply for OGR to
inform the destination data source what the encoding of the source data
source is. Thus ogr wouldn't do any encoding translations. For the
postgresql data source that would work fine since it will take care of
converting the encoding as described above - it just needs to know what
the source encoding is.
One way this could be done is implement a "GetDSEncoding" method on
OGRDataSource which would return the encoding of a data source. For
TIGER, that method would return ISO-8859-1. For other datasources, it
could just return NULL or some such thing for "unknown" until otherwise
implemented.
You would then need to add some sort of method on OGRDataSource like
"SetDataEncoding" which would tell the datasource what encoding incoming
data is in.
So something like:
OGRDataSource* pSrcDS = <open the source data source>
OGRDataSource* pDstDS = <open the destination data source>
const char* pSourceEncoding = pSrcDS->GetDSEncoding();
if (pSourceEncoding)
    pDstDS->SetDataEncoding(pSourceEncoding);
Then proceed as normal.
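A slightly fuller sketch of the same idea, using stand-in classes rather than the real OGRDataSource. Everything here (DataSource, TigerLikeSource, PgLikeDestination, propagateEncoding, clientEncoding) is hypothetical and only mirrors the proposed GetDSEncoding / SetDataEncoding methods; none of it exists in OGR today.

```cpp
#include <string>

// Hypothetical stand-in for OGRDataSource; not the real OGR class.
class DataSource {
public:
    virtual ~DataSource() = default;

    // Encoding of the data this source provides, or nullptr for "unknown".
    virtual const char* GetDSEncoding() const { return nullptr; }

    // Tells a destination what encoding incoming data is in.
    // The default implementation ignores the hint.
    virtual void SetDataEncoding(const char* /*encoding*/) {}
};

// A source that knows its encoding, as proposed for TIGER.
class TigerLikeSource : public DataSource {
public:
    const char* GetDSEncoding() const override { return "ISO-8859-1"; }
};

// A destination that records the encoding it was told about. A real
// PostgreSQL driver would hand this to libpq instead, probably after
// mapping the name to whatever the server calls that encoding.
class PgLikeDestination : public DataSource {
public:
    void SetDataEncoding(const char* encoding) override {
        clientEncoding_ = encoding;
    }
    const std::string& clientEncoding() const { return clientEncoding_; }

private:
    std::string clientEncoding_ = "UTF8";  // assumed default, as on my machine
};

// Pass the source encoding along only when it is known.
void propagateEncoding(const DataSource& src, DataSource& dst) {
    if (const char* encoding = src.GetDSEncoding()) {
        dst.SetDataEncoding(encoding);
    }
}
```

With a TigerLikeSource the destination ends up with "ISO-8859-1"; with a plain DataSource (unknown encoding) the destination keeps its default, so existing drivers would behave as they do now until they implement the method.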
Charlie
Frank Warmerdam wrote:
> On 10/14/05, Charlie Savage <cfis at interserv.com> wrote:
>> This leads to several points. First, is the assumed encoding always
>> ISO-8859-1? In that case, the Postgresql call above is correct.
>
> Charlie,
>
> As you suspect, OGR is completely encoding-ignorant currently.
> I would encourage you to commit the change setting the encoding to
> LATIN1 for now, as I gather that is a more inclusive character set than
> the default UTF8.
>
>> Second, what happens when you want to load maps for Asian countries? Is
>> that a no-go at the moment?
>
> OGR provides no special support for this. In cases where double
> byte text has been encountered it is treated as if it were single byte
> which will presumably not work well with Postgres.
>
>> Third, if OGR does not support encodings, are there any plans to add this
>> functionality?
>
> There are no plans currently to support encoding-awareness in OGR.
>
> /me buries his head in the sand for a couple more years...
>
> Best regards,
> --
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush | Geospatial Programmer for Rent