[MAPSERVER-USERS] ASCII -> UTF-8 convert problems for importing (GIS) data
Emilio Ponce
yosoycore at gmail.com
Mon Apr 21 11:01:59 PDT 2008
I had the same problem. To convert the charset, you can pipe the output of
shp2pgsql through iconv before loading it with psql:
shp2pgsql shapefile_name table_name | iconv -f LATIN1 -t UTF-8 | psql -d db_name
This converts the SQL generated from the shapefile from LATIN1 to UTF-8,
assuming the attribute data really is LATIN1.
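If you want to see what that iconv step does to the bytes, here is a minimal Python sketch. It assumes the attribute data is Latin-1 (ISO-8859-1), which is only a guess until the data provider confirms it. The helper names is_valid_utf8 and latin1_to_utf8 are made up for this illustration, and the two sample bytes are the Latin-1 codes for "Ç" and "ì" mentioned in the thread.

```python
# Sketch only: assumes the input bytes are Latin-1 (ISO-8859-1).
# The helper names below are hypothetical, not part of any tool in this thread.

# 0xC7 and 0xEC are the Latin-1 codes for "Ç" and "ì".
raw = bytes([0xC7, 0xEC])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret Latin-1 bytes as text, then re-encode that text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


converted = latin1_to_utf8(raw)

print(is_valid_utf8(raw))         # False: the raw bytes are not valid UTF-8
print(converted.hex(" "))         # c3 87 c3 ac
print(is_valid_utf8(converted))   # True
```

The raw bytes fail the UTF-8 check, which matches the "Invalid or incomplete multibyte" complaint in the thread. After conversion each character takes two bytes. If the data were in another single-byte encoding, the codec name in latin1_to_utf8 would need to change accordingly.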
2008/4/21 rich.fromm <nospam420 at yahoo.com>:
>
>
> Stefan Schwarzer wrote:
> >
> >
> >>> hmm.... I have a shapefile, which has some unorthodox characters (Ç,
> >>> ì, ...). Now, when importing the file (via shp2pgsql) into postgres,
> >>> it complains about it not being UTF-8 (my database has that format).
> >>>
> >>> So, how can I convert either the dbf file or, at a later stage, the
> >>> created text file from (I guess) ASCII into UTF-8?
> >
> >> You have an option for shp2pgsql (-W I think) to tell shp2pgsql to
> >> convert
> >> your data into this encoding:
> >
> > Yep, tried that too. But I get this message:
> >
> > shp2pgsql -s 4326 -I -W UTF-8 -D countries.shp gis.countries_new >
> > countries_new.sql
> > Shapefile type: Polygon
> > Postgis type: MULTIPOLYGON[2]
> > utf8: Invalid or incomplete multibyte or wide character
> >
> > We didn't really understand if the "-W" is to specify what the format
> > is (which we assumed) or into which format it has to be transformed.
> >
> > So, we would need something like transform ASCII into UTF-8.
> >
>
> -W describes the input format. The output format if you use it will be
> UTF-8. From the shp2pgsql(1) man page:
>
> ---
> -W <encoding>
> Specify the character encoding of Shapefile's attributes.
> If this option is used the output will be encoded in UTF-8.
> ---
>
> So no, you don't want to transform it from ASCII, because you clearly don't
> have ASCII input, as ASCII does not have the characters you describe.
>
> You need to find out what the input data is encoded in. A very likely
> candidate is ISO-8859-1 (aka Latin-1).
>
> Take a look at the actual hex values of some of the non-English characters.
> (I use hexl-mode in emacs to do this, but there are plenty of other ways.)
> Compare them to ISO-8859-1, for example at either of these:
>
> http://en.wikipedia.org/wiki/ISO_8859-1
> http://anubis.dkuug.dk/JTC1/SC2/WG3/docs/n411.pdf
>
> For the two examples you cite, we have:
>
> 0xC7 LATIN CAPITAL C WITH CEDILLA
> 0xEC LATIN SMALL I WITH GRAVE
>
> Do they match? But this is still a bit of a guessing game, because you
> could find many matches and still not be right, e.g. ISO-8859-15 is very
> similar. A better way would be to look at the documentation for your input
> data, or ask the provider of the data.
>
> - Rich
>
> --
> View this message in context: http://www.nabble.com/ASCII--%3E-UTF-8-convert-problems-for-importing-%28GIS%29-data-tp16768968p16808302.html
> Sent from the Mapserver - User mailing list archive at Nabble.com.
>
> _______________________________________________
> mapserver-users mailing list
> mapserver-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapserver-users
>
--
Emilio