[fdo-announce] Important note regarding MySQL provider changes in
trunk (character sets).
Brent Robinson
brent.robinson at autodesk.com
Tue Jan 16 11:53:11 EST 2007
Hi,
Some significant changes were made today, in the open-source trunk
repository, for the MySQL provider. The main change was to set the
connection character set to UTF8. Previously, the character set was
Latin1 (the default), but the MySQL provider was passing UTF8-encoded
strings to MySQL. This caused a few problems when names and strings
contained non-ASCII7 characters:
- After creating a datastore with non-ASCII7 name, it was
not possible to open it.
- String data inserted into tables, with double byte
character sets, could not be read properly
- Any string data inserted by FDO, into MySQL tables, was
double UTF8 encoded, making it difficult for 3rd party applications to
read it.
The changes have the following impacts on the MySQL Provider:
- minimum MySQL version now 5.0.22
- when you create a new datastore, through FDO, the character set and
collation are the MySQL server defaults. Previously, these were set to
"latin1" and "latin1_bin" respectively.
- the current open-source MySQL Provider and previous versions are
no longer compatible for non-ASCII7 users.
MySQL Version:
The changes rely on some collation fixes made to MySQL somewhere between
5.0.15 and 5.0.22. Therefore, it is recommend that you upgrade to MySQL
5.0.22 before using the latest open-source MySQL Provider.
Datastore character set:
Previously, all datastores created by FDO had default character set
Latin1 and default collation Latin1_bin. With this change, the character
set and collation default to that of the MySQL server. To find out what
these settings currently are, connect directly to MySQL and execute the
following:
show variables like 'character_set_server';
show variables like 'collation_server';
You are now responsible for using a server character set and
collation that matches your data. For English and Western European
customers, the Latin1 character set is fine. Other customers will need
to use the UTF8 character set, or a language specific one such as CP932
(Japanese). When in doubt, UTF8 is the best one to use since it is a
Unicode format.
The server collation determines whether your data is treated as
case-sensitive or case-insensitive. This includes column contents, not
just table and column names. Note that the default is usually a
case-insensitive one.
Server character set and collation can be set by stopping the MySQL
service, modifying my.ini, and restarting the service. In my.ini, look
for any:
default-character-set=
settings, and change these to the desired character set. The collation
can be set by adding a default-collation setting to the [mysqld]
section, e.g:
default-collation=utf8_bin
to use case-insensitive UTF8 collation.
Please refer to the MySQL documentation at www.mysql.com for more
information on character sets and collations.
Provider and Datastore Compatibility:
Previous versions of the MySQL provider inserted strings, via Feature
Commands, that were UTF8-encoded once too many. This change fixes this
problem. However, this means that pre-existing MySQL datastores can be
properly accessed by both current and previous MySQL providers only if
all of the following is true:
- none of the character data in the datastore contains any
non-ASCII7 characters
- the default character set for the datastore is latin1
- the default collation for the datastore is latin1_bin
- the server character set is latin1
- the server collation is latin1_bin.
For all other cases, pre-existing datastores must be upgraded before
they are accessed by the current MySQL Provider. This can be done by
doing the following:
- copy the datastore to SDF, using previous MySQL Provider.
- create a new MySQL datastore, using current MySQL Provider
- copy the above SDF data to the new datastore, using current
MySQL Provider.
Note that the above is not an in-place conversion. You must create a new
datastore for use by the current MySQL Provider.
Note also that datastores created by the current MySQL Provider must not
be accessed by previous provider versions.
Just a bit more background on the above info. ASCII7 characters (
0x00-0x7f) have the same values for their representations in various
character sets. Therefore the extra UTF8 encoding, done by previous
MySQL Providers, on strings to insert, had no effect. For this reason,
there are no compatibility problems for datastores with only latin1_bin
tables, storing only ASCII7 characters. However, for other characters,
their values are different for different character sets so the extra
UTF8 encoding does change their values, thus leading to the above
compatibility problems.
Brent Robinson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.osgeo.org/pipermail/fdo-announce/attachments/20070116/ef770a4c/attachment.html
More information about the fdo-announce
mailing list