[postgis-users] re: shp2sql and UTF8 question on Windows

Mark Cave-Ayland m.cave-ayland at webbased.co.uk
Thu Jan 26 02:40:05 PST 2006


Hi strk/Markus,

> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-
> bounces at postgis.refractions.net] On Behalf Of Markus Schaber
> Sent: 26 January 2006 10:00
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] re: shp2sql and UTF8 question on Windows

(cut)

> > I wouldn't add too many possible configurations.
> > If you want locale support just install iconv and that'd enable it.
> 
> That doesn't work for users of prepackaged binaries, e.g. the current
> win32 build. And AFAIR the current debian packages from Alex build
> without ICONV, too.
> 
> > If autoconf is not that smart in finding it we should fix it.
> 
> That's correct. Maybe we should issue clear warnings when iconv is not
> found.

Yeah, I've had a play with the Win32 compile of iconv and it required some
(fairly minor) patching just to compile. However, it just doesn't seem to
want to be found by configure... :(

> > What does PostgreSQL use for locales, btw?
> 
> I tend to believe that they use their own implementation, as I cannot
> see any iconv dependency, and the release notes on
> http://www.postgresql.org/docs/8.1/static/release-8-1.html mentioned
> some added encodings and conversions, as well as "added support
> four-byte UTF8 characters" where they only supported 1-3 bytes before.

Correct.

I would say that Markus' solution is the best one and we should perhaps
consider dropping iconv support altogether. The error is thrown because
PostgreSQL is extremely strict when it comes to accepting data which doesn't
match a specific encoding.

In fact, if the people with this problem create a new database with an
encoding of SQL_ASCII and re-import the data then there should be no problem
since SQL_ASCII actually means "we don't care about the encoding so we'll
treat the input as a binary string". 

So the reason the error was thrown in Randy's case was that the database
was specified as UNICODE (an alias for UTF8) and a non-UTF8 (probably
WIN1250/WIN) string was being sent to the server. Rejecting that input seems
to me to be a valid thing for the server to do.
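
To illustrate why such a string gets rejected, here is a minimal Python
sketch. It assumes the shapefile attributes are WIN1250 encoded, and it uses
Python's strict UTF-8 decoder as a stand-in for the server-side check; it is
not PostgreSQL's actual validation code. The helper name and the sample value
are made up for the example.

```python
# Illustration only: Python's strict UTF-8 decoder stands in for the
# server-side validation; this is not PostgreSQL's actual code.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical attribute value as it might be stored in a WIN1250 .dbf file.
name_win1250 = "Łódź".encode("cp1250")

# The same text after transcoding to UTF-8, which is what a UNICODE
# database expects when no client encoding has been declared.
name_utf8 = name_win1250.decode("cp1250").encode("utf-8")

if __name__ == "__main__":
    print(is_valid_utf8(name_win1250))
    print(is_valid_utf8(name_utf8))
```

The WIN1250 bytes are simply not a legal UTF-8 sequence, so a strict consumer
has no choice but to refuse them unless it is told what encoding they are in.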

If your data encoding doesn't match that of your database then using SET
CLIENT_ENCODING beforehand will give the database enough information to do
the translation for you, instead of having to keep adding extra iconv code
in shp2pgsql whenever someone needs a different conversion. Note that even
though the data is always stored as UTF8 in a UNICODE database, PostgreSQL
can return the data in whichever encoding the client application requests,
again via the CLIENT_ENCODING variable.
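
As a rough sketch of the above, the SET CLIENT_ENCODING statement can simply
be put in front of the SQL that shp2pgsql produced before it is fed to psql.
The Python helper below is hypothetical (it is not part of shp2pgsql), and
the WIN1250 encoding and the sample INSERT are assumptions for the example.

```python
# Hypothetical helper, not part of shp2pgsql: prefix a SQL dump with a
# SET CLIENT_ENCODING statement so the server does the conversion itself.

def with_client_encoding(sql_dump: bytes, encoding: str) -> bytes:
    """Return sql_dump preceded by a SET CLIENT_ENCODING statement.

    The dump is handled as raw bytes so its original encoding is untouched.
    """
    # Accept only plain ASCII names such as WIN1250, LATIN1 or EUC_JP.
    name = encoding.replace("_", "")
    if not (name.isascii() and name.isalnum()):
        raise ValueError(f"suspicious encoding name: {encoding!r}")
    header = f"SET CLIENT_ENCODING TO '{encoding}';\n".encode("ascii")
    return header + sql_dump


if __name__ == "__main__":
    # Assumed sample: one INSERT whose text value is WIN1250 encoded.
    dump = "INSERT INTO towns (name) VALUES ('Łódź');\n".encode("cp1250")
    result = with_client_encoding(dump, "WIN1250")
    print(result.decode("cp1250"))
```

The data bytes are left exactly as they were; only the declaration in front
of them changes, which is why no extra conversion code is needed on the
client side.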

http://www.postgresql.org/docs/8.0/interactive/multibyte.html



Kind regards,

Mark.

(who got bitten by this whilst upgrading a large SQL_ASCII encoded PostGIS
database to UNICODE last year)

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  
