[postgis-users] re: shp2sql and UTF8 question on Windows
Markus Schaber
schabi at logix-tt.com
Thu Jan 26 02:55:44 PST 2006
Hi, Mark,
Mark Cave-Ayland wrote:
> I would say that Markus' solution is the best one and we should perhaps
> consider dropping iconv support altogether. The error is thrown because
> PostgreSQL is extremely strict when it comes to accepting data which doesn't
> match a specific encoding.
I would not say that my "solution" is better than using iconv; I'd say
the opposite. It is inferior in at least three points:
- It supports fewer input encodings (iconv has many more than postgres).
- It fails to load e.g. LATIN-1 data into a LATIN-9 database because
postgres does not support conversion between those charsets.
- It will never work with non-ASCII compatible encodings like EBCDIC or
UTF-16 (even when support is added to the server) as the surrounding SQL
commands are always output in ASCII.
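To make the last two points concrete, here is a small sketch in Python. Python's codecs only stand in for iconv here; this is not shp2pgsql code, and the helper name transcode and the sample string are made up for illustration.

```python
# Illustration only: Python's codecs stand in for iconv.

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw bytes from src to dst, as a client-side iconv step would."""
    return data.decode(src).encode(dst)

# LATIN-1 -> LATIN-9 can be done on the client for characters that
# exist in both charsets, without any help from the server.
latin1_bytes = "Zürich".encode("iso-8859-1")
latin9_bytes = transcode(latin1_bytes, "iso-8859-1", "iso-8859-15")

# UTF-16 is not ASCII compatible: even plain ASCII text gets different
# bytes, so such data cannot be mixed with SQL commands written in ASCII.
ascii_sql = "INSERT".encode("ascii")
utf16_sql = "INSERT".encode("utf-16-le")
is_ascii_compatible = ascii_sql == utf16_sql
```

With iconv in the loader, the data is already converted to the target encoding before the server sees it, which is why it does not depend on the conversions the server offers.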
My patch is just a cleaner way of doing the bash magic we used to set the
client_encoding header earlier, before shp2pgsql supported iconv.
It was meant as a light-weight solution for those people who have a
shp2pgsql build without iconv for whatever reason.
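For readers who have not seen the client_encoding approach, a minimal sketch of the idea in Python follows. It is neither the actual patch nor the old bash script; the function name with_client_encoding and the table towns are hypothetical. It only shows that the dump bytes stay untouched and the server is told how to interpret them.

```python
def with_client_encoding(sql_dump: bytes, encoding: str) -> bytes:
    """Prepend a SET client_encoding statement to an SQL dump.

    The dump itself is not converted; the server converts from
    `encoding` to the database encoding when it reads the data.
    """
    header = f"SET client_encoding TO '{encoding}';\n".encode("ascii")
    return header + sql_dump

# Hypothetical one-line dump whose attribute data is LATIN-1.
dump = "INSERT INTO towns (name) VALUES ('Zürich');\n".encode("iso-8859-1")
patched = with_client_encoding(dump, "LATIN1")
```

Because the header is plain ASCII, this only works for ASCII-compatible encodings that the server itself can convert, which is exactly the limitation listed above.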
Maybe we should refuse to build shp2pgsql without iconv at all, or use my
patch only when there's no iconv support available, together with a strong
warning (and not just a mention) that iconv was not found, like:
**********************************************************
*                        WARNING                         *
* The iconv library was not found during configure.      *
* This will lead to limited charset support capabilities *
* in shp2pgsql. Use at your own risk.                    *
**********************************************************
> In fact, if the people with this problem create a new database with an
> encoding of SQL_ASCII and re-import the data then there should be no problem
> since SQL_ASCII actually means "we don't care about the encoding so we'll
> treat the input as a binary string".
I also strictly oppose even thinking about creating a database with
SQL_ASCII. It is deprecated for good reason.
You can insert data with different encodings without error, and when
reading the data you don't know how to interpret the byte soup you just
got out. If you really want to store the original data, then you should
use a bytea column, together with a second column where you store the
encoding.
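A short Python sketch of the "byte soup" problem, and of keeping the encoding next to the raw bytes. The class StoredText is hypothetical and only models the two-column idea; it is not a PostgreSQL API.

```python
from typing import NamedTuple

# One byte as it might come back from an SQL_ASCII database,
# with no record of the encoding it was written in.
raw = b"\xe4"

as_latin1 = raw.decode("iso-8859-1")  # a-umlaut
as_latin7 = raw.decode("iso-8859-7")  # Greek small letter delta

class StoredText(NamedTuple):
    """Models a bytea column plus a companion encoding column."""
    raw: bytes      # original bytes, stored unmodified
    encoding: str   # encoding the bytes were written in

    def text(self) -> str:
        return self.raw.decode(self.encoding)

row = StoredText(raw, "iso-8859-1")
```

Both decodings succeed, so nothing tells the reader which one was intended; only the stored encoding makes row.text() unambiguous.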
I'd like to add a check to SQL_ASCII, and bomb the user with WARNINGS
whenever he inserts non-7-bit data into SQL_ASCII.
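The check itself would be trivial. A client-side sketch in Python, purely to illustrate the test (the function name warn_if_not_7bit is made up, and this is not an existing server feature):

```python
import warnings

def warn_if_not_7bit(data: bytes) -> bool:
    """Return True and emit a warning if any byte has the high bit set."""
    offending = any(b > 0x7F for b in data)
    if offending:
        warnings.warn("non-7-bit data inserted into SQL_ASCII database")
    return offending
```

Pure ASCII input passes silently; anything with a byte above 0x7F triggers the warning.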
Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS
Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org