[postgis-devel] [PostGIS] #808: shp2pgsql and encoding, something must be wrong

PostGIS trac at osgeo.org
Tue Jan 25 14:44:02 PST 2011


#808: shp2pgsql and encoding, something must be wrong
---------------------+------------------------------------------------------
 Reporter:  nicklas  |       Owner:  pramsey      
     Type:  defect   |      Status:  new          
 Priority:  medium   |   Milestone:  PostGIS 2.0.0
Component:  postgis  |     Version:  trunk        
 Keywords:           |  
---------------------+------------------------------------------------------
 I find encoding very frustrating and hard to understand. This ticket may
 well be invalid, but I have turned the problem over so many times that I
 now think something is wrong.

 I have attached a simple dbf-file with one field called "address" and one
 row with the text:
 "Tårneby in Våler i  Solør kommune"

 If I first run shp2pgsql without declaring an encoding, simply ignoring
 the non-ASCII letters:

 {{{
 nicklas@ubuntu64:~/Documents$ /usr/lib/postgresql/8.4/bin/shp2pgsql test.dbf>test.sql
 }}}
 I get this error message:

 {{{
 Unable to convert data value to UTF-8 (iconv reports "Invalid or
 incomplete multibyte or wide character"). Current encoding is "UTF-8". Try
 "LATIN1" (Western European), or one of the values described at
 http://www.postgresql.org/docs/current/static/multibyte.html.
 }}}

 If I instead declare the encoding with -W:

 {{{
 nicklas@ubuntu64:~/Documents$ /usr/lib/postgresql/9.0/bin/shp2pgsql -W LATIN1 test.dbf>test.sql
 }}}
 the sql file is produced like this:

 {{{
 SET CLIENT_ENCODING TO UTF8;
 SET STANDARD_CONFORMING_STRINGS TO ON;
 BEGIN;
 CREATE TABLE "test" (gid serial PRIMARY KEY,
 "address" varchar(32));
 INSERT INTO "test" ("address") VALUES ('Tårneby in Våler I Solør kommune');
 COMMIT;
 }}}
 The problem is that psql refuses to load this SQL file into the database,
 complaining:
 {{{
 psql:test.sql:6: ERROR:  invalid byte sequence for encoding "UTF8": 0xe5726e
 }}}
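
 The bytes in that error message can be inspected directly. The following is
 a minimal sketch, assuming Python 3 and assuming the dbf file really is
 LATIN1; it only looks at the three bytes psql reported and is not part of
 shp2pgsql.

```python
# The three bytes psql reported in the error: 0xe5726e
reported = bytes.fromhex("e5726e")

# Read as LATIN1, each byte maps to exactly one character.
as_latin1 = reported.decode("latin-1")

# Read as UTF-8, 0xe5 announces a three-byte sequence, but 0x72 is not a
# continuation byte, so the decode fails.
try:
    reported.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

 Under that assumption the bytes spell the "årn" of "Tårneby", which
 suggests the SQL file still holds the original single-byte text even though
 its first line declares UTF8.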
 So what I have to do is change the first line of the SQL file to set
 client_encoding to LATIN1 instead. Then everything works.
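
 That manual edit could be scripted. This is only a sketch of the
 workaround, not a fix for shp2pgsql: the helper name
 declare_client_encoding is hypothetical, and the file is handled as raw
 bytes because it is not valid UTF-8.

```python
import re


def declare_client_encoding(sql: bytes, encoding: str) -> bytes:
    """Rewrite the first SET CLIENT_ENCODING statement to name `encoding`.

    Hypothetical helper: works on raw bytes so that the data rows are
    passed through untouched, whatever encoding they are in.
    """
    pattern = re.compile(rb"^SET CLIENT_ENCODING TO \w+;", re.MULTILINE)
    replacement = b"SET CLIENT_ENCODING TO " + encoding.encode("ascii") + b";"
    return pattern.sub(replacement, sql, count=1)


# First two lines of the generated file, as shown in this ticket.
original = (
    b"SET CLIENT_ENCODING TO UTF8;\n"
    b"SET STANDARD_CONFORMING_STRINGS TO ON;\n"
)
patched = declare_client_encoding(original, "LATIN1")
```

 Only the declaration changes; the INSERT data is left byte for byte as
 shp2pgsql wrote it.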


 According to the
 [http://postgis.org/documentation/manual-1.5SVN/ch04.html#shp2pgsql_usage
 PostGIS doc], shp2pgsql is supposed to convert the data to UTF8 in the SQL
 file so that psql can load it as UTF8. I don't think it works that way. As
 far as I can tell, shp2pgsql does nothing to the actual bytes, so it
 should instead tell PostgreSQL what the original encoding is.
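
 For comparison, this is roughly what a real conversion to UTF8 would do to
 the value. It is a minimal sketch assuming the dbf bytes are LATIN1; the
 function to_utf8 is hypothetical and is not how shp2pgsql is implemented.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode raw bytes from source_encoding to UTF-8 (hypothetical helper)."""
    return raw.decode(source_encoding).encode("utf-8")


# Start of the address value as it would be stored in the dbf, assuming LATIN1.
latin1_value = b"T\xe5rneby"

# After conversion the single byte 0xe5 becomes the two bytes 0xc3 0xa5.
utf8_value = to_utf8(latin1_value, "latin-1")
```

 If the SQL file contained the converted bytes, the SET CLIENT_ENCODING TO
 UTF8 line at the top would be consistent with the data.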

 The current behavior makes it impossible to use shp2pgsql-gui, since there
 is no way to edit the SQL file there. You get one error or another no
 matter which encoding you declare.

 I have only tried this on the trunk version.

 What I don't understand is whether some local setting on my system makes
 things behave differently.
 But I think DEPESZ's explanation
 [http://www.depesz.com/index.php/2010/03/07/error-invalid-byte-sequence-for-encoding/ here]
 makes sense, and my experience agrees with it.

 Thanks
 Nicklas

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/808>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

