[postgis-users] identify rows in a shapefile that have illegal characters for UTF-8 encoding

Paul Ramsey pramsey at opengeo.org
Mon Oct 22 13:42:47 PDT 2012


On Mon, Oct 22, 2012 at 1:01 PM, Mark Volz <MarkVolz at co.lyon.mn.us> wrote:
> I am trying to load my parcels into PostGIS, which will eventually be consumed by MapServer and ArcGIS.  Initially, when I loaded my data, I received a warning that I should change my encoding from UTF-8 to LATIN1.

How did you "change your encoding"? In your database, or in your data
load? If you just ran

shp2pgsql -W LATIN1 shpfile.shp tblename

then the non-ASCII characters in your dbf file would have been
transcoded to UTF-8 during the load and landed nicely in the database
with the right UTF-8 code points.
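The transcoding step that -W LATIN1 requests can be sketched in a few
lines of Python. This is only an illustration of the encoding
conversion, not how shp2pgsql is implemented internally:

```python
# Sketch of the Latin-1 -> UTF-8 transcoding that "-W LATIN1" asks for.
# The byte 0xE9 is 'é' in Latin-1, but on its own it is an invalid byte
# sequence in UTF-8 -- which is why loading the raw bytes as UTF-8 fails.
raw = b"Caf\xe9"                 # DBF field bytes stored as Latin-1

text = raw.decode("latin-1")     # interpret the bytes as Latin-1 text
utf8 = text.encode("utf-8")      # re-encode for a UTF-8 database

print(utf8)                      # b'Caf\xc3\xa9' -- valid UTF-8
```

Trying `raw.decode("utf-8")` instead raises UnicodeDecodeError, which
is the same mismatch the loader warns about.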


> Doing so allowed me to load data into PostGIS however, I could not consume the data in ArcGIS.

This seems fishy. If your database is UTF-8 and you load using the -W
flag as above, everything is pretty bog standard and ArcGIS should be
able to read it fine (particularly since the libpq library does all
the transcoding for client apps! ArcGIS doesn't even have to think
about transcoding, just declare the encoding it desires!)

> How can I find out which rows in my shapefile have illegal characters for UTF-8 encoding?

There are no illegal characters for UTF-8; UTF-8 can represent any and
all characters (and does). There's something else going on.
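That said, if you want to locate the rows whose bytes are not valid
UTF-8 before loading, you can scan the .dbf yourself. The sketch below
is a minimal, hypothetical DBF reader (it assumes the standard fixed
header layout and ignores the field definitions entirely); it reports
the 1-based record numbers whose raw bytes fail UTF-8 decoding:

```python
import struct

def find_non_utf8_records(dbf_path):
    """Return 1-based record numbers whose bytes are not valid UTF-8.

    Minimal sketch, not a full DBF parser: it reads only the fixed
    header fields (record count, header size, record size) needed to
    step through the records.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(12)
        num_records = struct.unpack("<I", header[4:8])[0]
        header_size, record_size = struct.unpack("<HH", header[8:12])
        f.seek(header_size)
        bad = []
        for recno in range(1, num_records + 1):
            record = f.read(record_size)
            try:
                record.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(recno)
        return bad
```

Records flagged this way typically contain Latin-1 (or Windows-1252)
accented characters, which is exactly the case shp2pgsql's -W LATIN1
option handles at load time.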

P.


