[MAPSERVER-USERS] ASCII -> UTF-8 convert problems for importing (GIS) data

Emilio Ponce yosoycore at gmail.com
Mon Apr 21 14:01:59 EDT 2008


I had the same problem. To convert the charset, you can pipe the output
of the shp2pgsql tool through iconv:

shp2pgsql shapefile_name table_name | iconv -f LATIN1 -t UTF-8 | psql -d db_name

This converts the SQL that shp2pgsql generates from LATIN1 to UTF-8
before psql loads it into the database.
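
As a quick sanity check of the iconv step, here is a minimal sketch that
pushes the two characters from the original question through the same
conversion. The helper name latin1_to_utf8 is made up for illustration,
ISO-8859-1 is used as the more portable spelling of LATIN1, and the whole
thing assumes the data really is Latin-1.

```shell
# Hypothetical helper: convert a Latin-1 byte stream on stdin to UTF-8 on stdout.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Feed in the raw Latin-1 bytes 0xC7 (octal 307) and 0xEC (octal 354),
# then hex-dump what comes out of the conversion.
printf '\307\354' | latin1_to_utf8 | od -An -tx1
```

The dump should show c3 87 c3 ac, which are the UTF-8 encodings of the
two characters.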


2008/4/21 rich.fromm <nospam420 at yahoo.com>:
>
>
>  Stefan Schwarzer wrote:
>  >
>  >
>  >>> hmm.... I have a shapefile, which has some unorthodox characters (Ç,
>  >>> ì, ...). Now, when importing the file (via shp2pgsql) into postgres,
>  >>> it complains about it not being UTF-8 (my database has that format).
>  >>>
>  >>> So, how can I convert either the dbf file or, at a later stage, the
>  >>> created text file from (I guess) ASCII into UTF-8?
>  >
>  >> You have an option for shp2pgsql (-W I think) to tell shp2pgsql to
>  >> convert
>  >> your data into this encoding:
>  >
>  > Yep, tried that too. But I get this message:
>  >
>  > shp2pgsql -s 4326 -I -W UTF-8 -D countries.shp gis.countries_new >
>  > countries_new.sql
>  > Shapefile type: Polygon
>  > Postgis type: MULTIPOLYGON[2]
>  > utf8: Invalid or incomplete multibyte or wide character
>  >
>  > We didn't really understand if the "-W" is to specify what the format
>  > is (which we assumed) or into which format it has to be transformed.
>  >
>  > So, we would need something like a transform from ASCII into UTF-8.
>  >
>
>  -W describes the input format.  The output format if you use it will be
>  UTF-8.  From the shp2pgsql(1) man page:
>
>  ---
>        -W <encoding>
>               Specify the character encoding of Shapefile's attributes.
>               If this option is used the output will be encoded in UTF-8.
>  ---
>
>  So no, you don't want to transform it from ASCII, because you clearly don't
>  have ASCII input, as ASCII does not have the characters you describe.
>
>  You need to find out what the input data is encoded in.  A very likely
>  candidate is ISO-8859-1 (aka Latin-1).
>
>  Take a look at the actual hex values of some of the non-English characters.
>  (I use hexl-mode in emacs to do this, but there are plenty of other ways.)
>  Compare them to ISO-8859-1, for example at either of these:
>
>  http://en.wikipedia.org/wiki/ISO_8859-1
>  http://anubis.dkuug.dk/JTC1/SC2/WG3/docs/n411.pdf
>
>  For the two examples you cite, we have:
>
>  0xC7 LATIN CAPITAL LETTER C WITH CEDILLA
>  0xEC LATIN SMALL LETTER I WITH GRAVE
>
>  Do they match?  But this is still a bit of a guessing game, because you
>  could find many matches and still not be right, e.g. ISO-8859-15 is very
>  similar.  A better way would be to look at the documentation for your input
>  data, or ask the provider of the data.
>
>  - Rich
>
>
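
Rich's advice to look at the actual bytes can also be followed from the
shell. This is only a rough sketch, assuming a POSIX shell with tr, od
and iconv available. The function names high_bytes and is_valid_utf8 are
hypothetical, and countries_new.sql is assumed to be a dump written by
shp2pgsql without the -W option.

```shell
# Hypothetical helper: dump only the non-ASCII bytes (0x80-0xFF) of a file
# as hex, so they can be compared against an ISO-8859-1 code chart.
high_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | od -An -tx1
}

# Hypothetical helper: succeed only if the file is already valid UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid in the
# source encoding, so a UTF-8 to UTF-8 pass acts as a validity check.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example usage (file name is an assumption):
# high_bytes countries_new.sql | head
# is_valid_utf8 countries_new.sql || echo "not UTF-8, convert it first"
```

If the dump shows c7 and ec where you expect the two characters from the
question, ISO-8859-1 (or the very similar ISO-8859-15) is a plausible
guess. Going by the man page excerpt quoted above, passing that encoding
to -W instead of UTF-8 should then be the other way to get UTF-8 output.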



-- 
Emilio

