[postgis-devel] Re: [Fwd: PostGIS : shp2pgsql i18n patch.]

strk at refractions.net strk at refractions.net
Wed Jan 12 08:16:06 PST 2005


On Thu, Jan 13, 2005 at 12:21:11AM +0900, IIDA Tetsushi wrote:
> >Tetsushi, with help of some fine postgresql guys I finally
> >understood the problem and your proposed solution.
> 
> Thank you.
> 
> >1) Shouldn't we always produce UTF-8 encoding in output
> >   in the presence of a -W switch ?
> >
> >2) Shouldn't we report intended encoding in the output
> >   by means of a SET CLIENT_ENCODING to 'UTF' ?
> >
> >Using these two steps we'll be able to use the produced .sql
> >as input for databases of whatever encoding, with the postgres
> >server performing the final conversion. Would your nkf step
> >be useless then ?
> 
> Yes. That's right in general.
> I think that always outputting UTF-8 is not a bad approach.
> But when a character-code declaration is included in the output
> of a program and post-processing then becomes necessary,
> it is somewhat troublesome.

Can you tell us more about this necessity ?
Is there a case in which a post-processing lets you store
the data in a postgresql database with no UNICODE support ?

--strk;
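
A minimal sketch of the two steps proposed above (always emit UTF-8, and declare the encoding in the output), written in Python rather than in shp2pgsql's own C. The function name `to_utf8_sql`, the sample table `t`, and the spelling 'UTF8' for the encoding name are assumptions for illustration; the thread itself writes 'UTF', and this is not the patch under discussion.

```python
def to_utf8_sql(raw_sql: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Decode SQL text from source_encoding and return it re-encoded as UTF-8,
    prefixed with a client_encoding declaration (hypothetical helper)."""
    text = raw_sql.decode(source_encoding)
    # Assumption: 'UTF8' is the encoding name the server accepts.
    header = "SET client_encoding TO 'UTF8';\n"
    return (header + text).encode("utf-8")


if __name__ == "__main__":
    # Illustrative input only: a statement as it might look in Shift_JIS.
    sjis_sql = "INSERT INTO t (name) VALUES ('日本');\n".encode("shift_jis")
    print(to_utf8_sql(sjis_sql).decode("utf-8"), end="")
```

With a declaration like this at the top of the file, the server can convert from UTF-8 to the database encoding on load, so no separate nkf or iconv pass would be needed, provided the server supports that conversion.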

> 
> The reasons why I did not use UTF8 for an external code ...
> 
> 1) This patch was written to offer a workaround for the
>   problem that is happening now.
>   There was no intention to greatly change the external behavior
>   of the program.
> 
> 2) I wanted to avoid exposing the character code used for internal
>   processing directly to the user (*1).
> 
> 3) Because it is not known whether PostgreSQL was compiled so
>   that UNICODE can be used. (*2)
> 
> 4) Because UNIX users in the multibyte-character world are
>  accustomed to character-code conversion programs :-)
> 
> None of these is a very serious reason.
> 
> (*1) UTF8 is not widely used as an external code in Japan, for various reasons.
> (*2) I think that such environments have decreased considerably recently.
> 
> --
> IIDA Tetsushi
> (mailto : Iida_Tetsushi at oi.nu)
> 
> 
> ----- Original Message ----- 
> From: <strk at refractions.net>
> To: "IIDA Tetsushi" <hogepiyo at nifty.com>; 
> <postgis-devel at postgis.refractions.net>
> Cc: "Jeff Lounsbury" <jeffloun at refractions.net>; "Paul Ramsey" 
> <paul at refractions.net>
> Sent: Wednesday, January 12, 2005 8:31 PM
> Subject: Re: [Fwd: PostGIS : shp2pgsql i18n patch.]
> 
> 
> >Tetsushi, with help of some fine postgresql guys I finally
> >understood the problem and your proposed solution.
> >
> >Now:
> >
> >1) Shouldn't we always produce UTF-8 encoding in output
> >   in the presence of a -W switch ?
> >
> >2) Shouldn't we report intended encoding in the output
> >   by means of a SET CLIENT_ENCODING to 'UTF' ?
> >
> >Using these two steps we'll be able to use the produced .sql
> >as input for databases of whatever encoding, with the postgres
> >server performing the final conversion. Would your nkf step
> >be useless then ?
> >
> >--strk;
> >
> >
> >On Wed, Jan 12, 2005 at 03:52:58AM +0900, IIDA Tetsushi wrote:
> >>Thank you for replying.
> >>
> >>>I didn't understand the problem.
> >>
> >>The character sets of Asia contain tens of thousands of
> >>characters. These do not fit in 8 bits, so one character
> >>is expressed by combining 2 or more bytes of data.
> >>
> >>The character code currently most used on MS-Windows in Japan
> >>is the Shift_JIS code, which Microsoft created.
> >>Since ArcGIS also runs on MS-Windows, Shift_JIS code is used.
> >>
> >>Please refer to the following URL.
> >>http://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-othercodes-shiftjis
> >>
> >>In the character codes of Japan, 0x5c is not a backslash but a
> >>Yen sign, or is part of a character.
> >>When 0x5c follows a byte outside 0x81-0x9f, it is a Yen sign.
> >>When 0x5c follows a byte in 0x81-0x9f, it is part of one
> >>multi-byte character.
> >>
> >>If input data in a local character code is treated as an ASCII
> >>character sequence, the input data will be corrupted and incorrect
> >>SQL will be output.
> >>
> >>The 7-bit part of UTF8 is compatible with ASCII, and moreover
> >>it is guaranteed that those bytes never become part of a
> >>multi-byte character.
> >>
> >>>Is the 'nkf' script introducing malformed SQL or is it shp2pgsql ?
> >>
> >>nkf is a program for converting character codes. It is included
> >>by default in Linux distributions such as RedHat and
> >>FedoraCore.
> >>
> >>The example command on my page imports a Shift_JIS Shape
> >>file into a database created with UTF8.
> >>You may use the iconv command instead of nkf. Moreover, when
> >>importing into a Shift_JIS database, nkf is not needed.
> >>
> >>
> >>>Can you provide an example shapefile and instruction on
> >>>how to exploit the problem ?
> >>
> >>http://oi.nu/shp2pgsql/hogewk.zip
> >>The name of the island at the westernmost tip of Japan is
> >>put into the attribute column.
> >>
> >>
> >>>Tetsushi (is this your name ?).
> >>
> >>Yes. IIDA is a family name. In the formal names of almost all
> >>people in Asia, the family name is written first. :-)
> >>
> >>
> >>--- 
> >>If anything is unclear, I will explain as much as
> >>possible.
> >># but I'm not good at English. sorry.
> >>
> >>--
> >>IIDA Tetsushi
> >>(mailto : Iida_Tetsushi at oi.nu)
> 
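
The 0x5c issue described in the quoted explanation can be checked with a small Python sketch. The helper names are hypothetical. The lead-byte test also covers 0xe0-0xfc, which is an addition to the 0x81-0x9f range given in the message, and the katakana character used in the demo is just one well-known example whose second Shift_JIS byte is 0x5c.

```python
from typing import List

BACKSLASH = 0x5C  # ASCII backslash; displayed as a Yen sign in Japanese codes


def has_0x5c_byte(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains a 0x5c byte."""
    return BACKSLASH in text.encode(encoding)


def sjis_trail_0x5c_positions(data: bytes) -> List[int]:
    """Return offsets where 0x5c is the second byte of a two-byte
    Shift_JIS character rather than a standalone character."""
    positions = []
    i = 0
    while i < len(data):
        b = data[i]
        # Assumption: lead bytes are 0x81-0x9f plus 0xe0-0xfc.
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            if i + 1 < len(data) and data[i + 1] == BACKSLASH:
                positions.append(i + 1)
            i += 2  # skip the trail byte of the two-byte character
        else:
            i += 1
    return positions


if __name__ == "__main__":
    so = "ソ"  # katakana SO, bytes 0x83 0x5c in Shift_JIS
    print(has_0x5c_byte(so, "shift_jis"))  # True
    print(has_0x5c_byte(so, "utf-8"))      # False
    print(sjis_trail_0x5c_positions(so.encode("shift_jis")))  # [1]
```

A tool that escapes or interprets every 0x5c byte as a backslash would split such a character in two, which is the kind of corruption described above. In UTF-8 every byte of a multi-byte sequence has the high bit set, so the same scan never finds 0x5c inside a character.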

-- 

For standing up against patentability of software,

  Thank You, Poland!

Read the intervention:    http://kwiki.ffii.org/ConsPolon041221En
Send your thanks:         thankyoupoland.info
Read/do more:		  http://www.noepatents.org/


