[Gdal-dev] OGR - Character Encodings
Charlie Savage
cfis at interserv.com
Fri Oct 14 12:24:49 EDT 2005
If you try to load the Tiger data for Guam to Postgresql, the
entitynames table will fail. The first line in this file is:
C0604 2000 75O 36163 Hagåtña, GU
Tiger data is in ISO-8859-1 (LATIN1), while a default connection to
Postgresql seems to use UTF8 (at least on my machine, Windows XP). The
result is this error message from Postgresql:
Invalid UNICODE byte sequence detected near byte
The same problem occurs for a lot of the data for Puerto Rico, and one
file in Wisconsin.
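For what it's worth, here is a minimal standalone illustration (not OGR
or libpq code) of why the server rejects the row: in ISO-8859-1 the
characters å and ñ are the single bytes 0xE5 and 0xF1, but in UTF8 those
values are lead bytes of 3- and 4-byte sequences, so the raw LATIN1
string is not valid UTF8.

    /* Stand-alone illustration: "Hagåtña" as ISO-8859-1 bytes.  0xE5 (å)
       and 0xF1 (ñ) are not followed by UTF8 continuation bytes, which is
       exactly what the Postgresql error complains about. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned char hagatna_latin1[] =
            { 'H', 'a', 'g', 0xE5, 't', 0xF1, 'a', '\0' };

        /* Prints mojibake (or nothing useful) on a UTF8 terminal. */
        printf("%s\n", (const char *) hagatna_latin1);
        return 0;
    }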
The solution is both easy and hard. The easy bit is that after
connecting to Postgresql, OGR should call the PQsetClientEncoding
function. In this case:
const char *encoding = "LATIN1";

if (PQsetClientEncoding(hPGConn, encoding) == -1)
{
    CPLError( CE_Failure, CPLE_AppDefined,
              "PQsetClientEncoding failed. Encoding: %s", encoding );
    PQfinish(hPGConn);
    hPGConn = NULL;
    return FALSE;
}
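As a workaround until something like this is in OGR, the same effect can
be had without patching the driver: set the standard libpq
PGCLIENTENCODING environment variable before connecting, or issue a SET
statement right after the connection is made. A sketch of the latter,
assuming hPGConn is the open connection from the snippet above:

    /* Workaround sketch: set the client encoding with a plain SQL command
       instead of patching OGR to call PQsetClientEncoding. */
    PGresult *hResult = PQexec(hPGConn, "SET client_encoding TO 'LATIN1'");

    if (hResult == NULL || PQresultStatus(hResult) != PGRES_COMMAND_OK)
    {
        CPLError( CE_Failure, CPLE_AppDefined,
                  "SET client_encoding failed: %s",
                  PQerrorMessage(hPGConn) );
    }
    if (hResult != NULL)
        PQclear(hResult);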
The hard bit is knowing which character encoding to specify. The best
solution would be for OGR to report the encoding of each data source it
opens. Unfortunately, as far as I can see, OGR has no support for
different character encodings (either telling you what they are, or
working with multi-byte encodings).
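One possible shape for the Postgresql side, just as a sketch: let the
caller pick the encoding through a CPL configuration option instead of
hard-coding LATIN1. The option name "PGCLIENTENCODING" below is my own
invention, not an existing OGR option.

    /* Sketch only: read the desired client encoding from a CPL
       configuration option; leave the connection untouched if the
       option is not set. */
    const char *pszEncoding = CPLGetConfigOption( "PGCLIENTENCODING", NULL );

    if( pszEncoding != NULL
        && PQsetClientEncoding( hPGConn, pszEncoding ) == -1 )
    {
        CPLError( CE_Failure, CPLE_AppDefined,
                  "PQsetClientEncoding(%s) failed.", pszEncoding );
        PQfinish( hPGConn );
        hPGConn = NULL;
        return FALSE;
    }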
This leads to several points. First, is the assumed encoding always
ISO-8859-1? If so, the Postgresql call above is correct.
Second, what happens when you want to load maps for Asian countries? Is
that a no-go at the moment?
Third, if OGR does not currently support encodings, are there any plans
to add this functionality?
Thanks,
Charlie