[postgis-devel] Re: [Fwd: PostGIS : shp2pgsql i18n patch.]

IIDA Tetsushi hogepiyo at nifty.com
Wed Jan 12 07:21:11 PST 2005


> Tetsushi, with help of some fine postgresql guys I finally
> understood the problem and your proposed solution.

Thank you.

> 1) Shouldn't we always produce UTF-8 encoding in output
>    in the presence of a -W switch ?
>
> 2) Shouldn't we report intended encoding in the output
>    by means of a SET CLIENT_ENCODING to 'UTF' ?
>
> Using these two steps we'll be able to use the produced .sql
> as input for databases of whatever encoding, with the postgres
> server performing the final conversion. Would your nkf step
> be useless then ?

Yes, in general that is right.
I think always producing UTF-8 output is not a bad approach.
However, if the program's output contains an encoding declaration
and post-processing of that output later becomes necessary,
the declaration gets in the way somewhat.
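
As a minimal sketch of the two steps quoted above (always emit UTF-8,
and declare it in the output), written in Python rather than in
shp2pgsql's own C: the table name `islands`, the helper names, and the
'UTF8' spelling of the encoding name are assumptions for illustration,
not what shp2pgsql or the patch actually does.

```python
def sql_literal(raw: bytes, source_encoding: str) -> str:
    """Decode the attribute bytes first, then quote the decoded text.

    Working on decoded text means a 0x5c trail byte of a Shift_JIS
    character is never mistaken for a backslash. Backslash handling
    itself is left out of this sketch.
    """
    text = raw.decode(source_encoding)
    return "'" + text.replace("'", "''") + "'"


def build_sql(rows, source_encoding: str) -> bytes:
    """Build UTF-8 SQL that announces its own encoding (hypothetical table)."""
    lines = ["SET client_encoding TO 'UTF8';"]
    for raw in rows:
        literal = sql_literal(raw, source_encoding)
        lines.append(f"INSERT INTO islands (name) VALUES ({literal});")
    return ("\n".join(lines) + "\n").encode("utf-8")


# Attribute bytes as they might come out of a Shift_JIS .dbf file.
output = build_sql(["表".encode("shift_jis")], "shift_jis")
print(output.decode("utf-8"))
```

With the declaration in place the server can convert to the database
encoding itself; the drawback is the one noted above, that any later
re-encoding of the file makes the declaration wrong.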

The reasons why I did not use UTF8 as the external encoding:

1) This patch was written to offer a workaround for the problem
   that is occurring now.
   I did not intend to change the external behavior of the
   program much from what it is at present.

2) I wanted to avoid exposing the character encoding used for
   internal processing directly to the user. (*1)

3) It cannot be known whether PostgreSQL was compiled so
   that UNICODE can be used. (*2)

4) UNIX users in multibyte-character regions are accustomed
   to character-code conversion programs :-)

None of these is a very serious reason.

(*1) UTF8 is not much used as an external encoding in Japan, for various reasons.
(*2) I think such environments have become considerably rarer recently.

--
IIDA Tetsushi
(mailto : Iida_Tetsushi at oi.nu)


----- Original Message ----- 
From: <strk at refractions.net>
To: "IIDA Tetsushi" <hogepiyo at nifty.com>; 
<postgis-devel at postgis.refractions.net>
Cc: "Jeff Lounsbury" <jeffloun at refractions.net>; "Paul Ramsey" 
<paul at refractions.net>
Sent: Wednesday, January 12, 2005 8:31 PM
Subject: Re: [Fwd: PostGIS : shp2pgsql i18n patch.]


> Tetsushi, with help of some fine postgresql guys I finally
> understood the problem and your proposed solution.
>
> Now:
>
> 1) Shouldn't we always produce UTF-8 encoding in output
>    in the presence of a -W switch ?
>
> 2) Shouldn't we report intended encoding in the output
>    by means of a SET CLIENT_ENCODING to 'UTF' ?
>
> Using these two steps we'll be able to use the produced .sql
> as input for databases of whatever encoding, with the postgres
> server performing the final conversion. Would your nkf step
> be useless then ?
>
> --strk;
>
>
> On Wed, Jan 12, 2005 at 03:52:58AM +0900, IIDA Tetsushi wrote:
>> Thank you for replying.
>>
>> >I didn't understand the problem.
>>
>> Asian character sets contain tens of thousands of distinct
>> characters. That does not fit in 8 bits, so one character is
>> represented by combining two or more bytes of data.
>>
>> The character encoding most widely used on MS-Windows in Japan
>> is Shift_JIS, which Microsoft created.
>> Since ArcGIS also runs on MS-Windows, it uses Shift_JIS.
>>
>> Please refer to the following URL.
>> http://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-othercodes-shiftjis
>>
>> In Japanese character encodings, 0x5c is not a backslash but a
>> Yen sign, or it is part of a character.
>> When 0x5c follows a byte outside 0x81-0x9f,
>> it is a Yen sign.
>> When 0x5c follows a byte in 0x81-0x9f, it is part of a single
>> multi-byte character.
>>
>> If input data in a local character encoding is treated as an ASCII
>> character sequence, the data will be corrupted and invalid
>> SQL will be output.
>>
>> The 7-bit range of UTF8 is compatible with ASCII, and
>> moreover those bytes are guaranteed never to appear as part of
>> a multi-byte character.
>>
>> >Is the 'nkf' script introducing malformed SQL or is it shp2pgsql ?
>>
>> nkf is a program for converting character encodings. It is
>> included by default in Linux distributions such as RedHat and
>> FedoraCore.
>>
>> The command example on my page imports a Shift_JIS Shape file
>> into a database created with UTF8.
>> You may use the iconv command instead of nkf. Moreover, when
>> importing into a Shift_JIS database, nkf is unnecessary.
>>
>>
>> >Can you provide an example shapefile and instruction on
>> >how to exploit the problem ?
>>
>> http://oi.nu/shp2pgsql/hogewk.zip
>> The attribute column holds the name of the island at the
>> westernmost tip of Japan.
>>
>>
>> >Tetsushi (is this your name ?).
>>
>> Yes. IIDA is my family name. In the formal names of almost all
>> people in Asia, the family name is written first. :-)
>>
>>
>> --- 
>> If anything is unclear, I will explain as much as
>> possible.
>> # But I'm not good at English, sorry.
>>
>> --
>> IIDA Tetsushi
>> (mailto : Iida_Tetsushi at oi.nu)
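
The 0x5c problem described in the quoted message can be reproduced with
a short Python sketch. The character "表" is chosen here only because
its Shift_JIS form is 0x95 0x5c, and `naive_escape` is a hypothetical
stand-in for any byte-wise escaping that assumes ASCII input; it is not
shp2pgsql's actual code. Note that Shift_JIS also has a second
lead-byte range (0xe0-0xef) beyond the 0x81-0x9f mentioned above.

```python
def naive_escape(raw: bytes) -> bytes:
    """Hypothetical byte-wise escaping: double every 0x5c byte."""
    return raw.replace(b"\\", b"\\\\")


text = "表"
sjis = text.encode("shift_jis")  # lead byte 0x95, trail byte 0x5c
utf8 = text.encode("utf-8")      # every byte is 0x80 or above

# In Shift_JIS the trail byte collides with the ASCII backslash.
print(sjis.hex(), b"\x5c" in sjis)
# In UTF-8 no byte of a multi-byte character falls in the 7-bit range.
print(utf8.hex(), b"\x5c" in utf8)

# Byte-wise escaping inserts a stray 0x5c after the character...
broken = naive_escape(sjis)
# ...while the UTF-8 bytes pass through untouched.
intact = naive_escape(utf8)

# Escaping the decoded text instead keeps the Shift_JIS data whole.
safe = text.replace("\\", "\\\\").encode("shift_jis")
```

This is why converting to UTF8 (or decoding) before any escaping step
avoids the corruption, whichever conversion tool is used.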




