[postgis-users] character encoding problems
Mark Cave-Ayland
mark.cave-ayland at siriusit.co.uk
Wed Nov 30 03:54:05 PST 2011
On 30/11/11 02:24, Clay, Bruce wrote:
> I am trying to learn more about natural language processing and language
> translation
> I have installed the English version of WordNet in Postgres without any
> problems. I downloaded dictionaries from a variety of sites, such as those
> used in OpenOffice / WinEdt.
> When I try to build a table from several of the different languages I
> get the following error
> ERROR: invalid byte sequence for encoding "UTF8": 0x82
> I checked the encoding and it is indeed set to UTF8. I tried to
> create databases using a variety of other encoding types such as WIN1252
> and others and I got the same error message from all of them except
> SQL_ASCII.
> When I created the database using SQL_ASCII I received the warning that
> the database could only store 7 bit data. When I loaded the data in this
> database I did not have any errors and when I look at the data it seems
> to be the same as in the original text file.
> Is there a "proper" encoding type that I should use to load the word
> lists so they can interoperate with the WordNet dataset that happily
> uses the UTF8 encoding?
> Bruce
Hi Bruce,
This isn't strictly a PostGIS question, so you'd be better off
re-posting to the pgsql-general mailing list to get some answers.
However, from what you mention above it seems that the extra
dictionaries you are downloading are not in UTF8 encoding and so may
require conversion upon import.
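
As a rough illustration of what such a conversion involves, here is a minimal Python sketch. The sample bytes and the choice of code page 437 as the source encoding are assumptions made purely for the example; the real encoding of each dictionary file has to be found out from wherever it was downloaded. The byte 0x82 on its own is a UTF-8 continuation byte, which is why the server rejects it.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw can be decoded as UTF-8 without errors."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw from source_encoding, then re-encode it as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical dictionary entry: "café" stored in code page 437,
# where the byte 0x82 stands for "é".
sample = b"caf\x82"

print(is_valid_utf8(sample))                    # the raw bytes are not UTF-8
converted = to_utf8(sample, "cp437")
print(is_valid_utf8(converted))                 # the converted bytes are
print(converted.decode("utf-8"))
```

Note that a wrong guess at a single-byte source encoding usually does not raise an error; it just produces the wrong characters, so it is worth checking a few converted words by eye. Alternatively, PostgreSQL can do the conversion itself if the session's client_encoding is set to the file's real encoding before loading.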
You can potentially use SQL_ASCII as a workaround, but I would highly
recommend that you don't do this, since then you end up with data in a
mixture of random encodings that you will never be able to output
correctly across all platforms.
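
To see why, consider the byte from the error message. The following Python sketch (the list of candidate encodings is just an assumption for illustration) shows that the same stored byte means a different character depending on which encoding the reader assumes. A SQL_ASCII database keeps the bytes but not the information about which interpretation was intended.

```python
def interpretations(raw: bytes, encodings: list) -> dict:
    """Map each encoding name to the text that raw decodes to under it."""
    return {name: raw.decode(name) for name in encodings}


# The byte reported in the error message, read three different ways.
result = interpretations(b"\x82", ["cp437", "cp1252", "latin-1"])
for name, text in result.items():
    print(name, repr(text))
```

Under cp437 the byte is "é", under cp1252 it is a low quotation mark, and under latin-1 it is a control character, so a client has no way of picking the right one after the fact.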
ATB,
Mark.
--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs