[postgis-devel] [PostGIS] #808: shp2pgsql and encoding, something must be wrong
PostGIS
trac at osgeo.org
Tue Jan 25 14:44:02 PST 2011
#808: shp2pgsql and encoding, something must be wrong
---------------------+------------------------------------------------------
Reporter: nicklas | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 2.0.0
Component: postgis | Version: trunk
Keywords: |
---------------------+------------------------------------------------------
I find encoding very frustrating and hard to understand. It is quite
possible that this ticket is invalid, but I have turned the problem over
so many times that I now think something really is wrong.
I have attached a simple dbf-file with one field called "address" and one
row with the text:
"Tårneby in Våler i Solør kommune"
If I first run shp2pgsql without specifying an encoding, ignoring the
non-ASCII letters:
{{{
nicklas@ubuntu64:~/Documents$ /usr/lib/postgresql/8.4/bin/shp2pgsql test.dbf>test.sql
}}}
I get this error message:
{{{
Unable to convert data value to UTF-8 (iconv reports "Invalid or
incomplete multibyte or wide character"). Current encoding is "UTF-8". Try
"LATIN1" (Western European), or one of the values described at
http://www.postgresql.org/docs/current/static/multibyte.html.
}}}
If I instead declare the source encoding as LATIN1:
{{{
nicklas@ubuntu64:~/Documents$ /usr/lib/postgresql/9.0/bin/shp2pgsql -W LATIN1 test.dbf>test.sql
}}}
the generated SQL file looks like this:
{{{
SET CLIENT_ENCODING TO UTF8;
SET STANDARD_CONFORMING_STRINGS TO ON;
BEGIN;
CREATE TABLE "test" (gid serial PRIMARY KEY,
"address" varchar(32));
INSERT INTO "test" ("address") VALUES ('Tårneby in Våler I Solør kommune');
COMMIT;
}}}
The problem is that psql refuses to load this SQL file into the database
and complains:
{{{
psql:test.sql:6: ERROR: invalid byte sequence for encoding "UTF8": 0xe5726e
}}}
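The three bytes in that message can be checked by hand. The sketch below is not part of the original report; it assumes the DBF stores its text in LATIN1 (ISO-8859-1), as the `-W LATIN1` run suggests, and shows that `0xe5726e` is the LATIN1 spelling of "årn" (from "Tårneby") and is not valid UTF-8.

```python
# Bytes reported by psql: 0xe5 0x72 0x6e
reported = bytes.fromhex("e5726e")

# Interpreted as LATIN1 they are the "årn" in "Tårneby".
as_latin1 = reported.decode("latin-1")

# Interpreted as UTF-8 they are invalid: 0xe5 opens a three-byte
# sequence, but 0x72 ("r") is not a continuation byte.
try:
    reported.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# What a genuinely UTF-8 file would contain for the same text.
as_utf8_bytes = as_latin1.encode("utf-8")

print(valid_utf8, as_utf8_bytes.hex())
```

In other words, the file declares UTF8 in its first line while the data still looks like LATIN1, which is consistent with the error psql reports.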
So what I have to do is change the first line of the SQL file to declare
client_encoding LATIN1 instead. Then everything works.
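That manual edit can be scripted. The following is only a sketch of the workaround, not something shp2pgsql provides: `declare_client_encoding` is a hypothetical helper, and it assumes the declaration line looks exactly like the one in the generated file. It works on raw bytes so that the data itself is left untouched.

```python
import re


def declare_client_encoding(sql: bytes, encoding: str) -> bytes:
    """Rewrite the first SET CLIENT_ENCODING line to name `encoding`.

    Hypothetical helper: only the declaration changes; the data bytes
    stay exactly as they were written.
    """
    pattern = re.compile(rb"^SET CLIENT_ENCODING TO \w+;", re.MULTILINE)
    replacement = b"SET CLIENT_ENCODING TO " + encoding.encode("ascii") + b";"
    return pattern.sub(replacement, sql, count=1)


# Shortened stand-in for the generated file, with LATIN1 bytes in the data.
generated = (
    b"SET CLIENT_ENCODING TO UTF8;\n"
    b"INSERT INTO \"test\" (\"address\") VALUES ('T\xe5rneby');\n"
)
fixed = declare_client_encoding(generated, "LATIN1")
```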
According to the
[http://postgis.org/documentation/manual-1.5SVN/ch04.html#shp2pgsql_usage PostGIS doc],
shp2pgsql is supposed to convert the data to UTF8 in the SQL file so that
psql can load it as UTF8. I don't think it works that way. As far as I can
tell, shp2pgsql does not change the actual encoding of the data, so it
should instead tell PostgreSQL what the original encoding is.
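The behaviour the documentation describes would make the data bytes match the `UTF8` declaration. A minimal sketch of that conversion, assuming the source bytes really are LATIN1; `transcode_to_utf8` is a hypothetical name and this is not shp2pgsql's actual code:

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode `raw` from `source_encoding` and re-encode it as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# The attribute value from the test DBF, as LATIN1 bytes.
dbf_value = "Tårneby in Våler i Solør kommune".encode("latin-1")
utf8_value = transcode_to_utf8(dbf_value, "latin-1")

# Each of the three non-ASCII letters takes two bytes in UTF-8,
# so 32 bytes become 35.
print(len(dbf_value), len(utf8_value))  # 32 35
```

Either approach gives psql a consistent file: convert the bytes and declare UTF8, or leave the bytes alone and declare the original encoding.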
The current behavior makes it impossible to use shp2pgsql-gui, since there
is no way to edit the SQL file there. You get one error or another no
matter what encoding you declare.
I have only tried this on the trunk version.
What I don't understand is whether some local setting on my system makes
things behave differently.
But I think DEPESZ's explanation
[http://www.depesz.com/index.php/2010/03/07/error-invalid-byte-sequence-for-encoding/ here]
makes sense, and my experience agrees with it.
Thanks
Nicklas
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/808>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.