[postgis-devel] Re: [Fwd: PostGIS : shp2pgsql i18n patch.]

strk at refractions.net
Wed Jan 12 03:31:22 PST 2005


Tetsushi, with the help of some fine PostgreSQL folks I finally
understood the problem and your proposed solution.
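
To illustrate the problem and the proposed fix, here is a minimal Python sketch. It is not the patch itself (which is C and uses iconv when built with USE_ICONV). It assumes a hypothetical byte-oriented escaper, `naive_escape`, that prefixes every backslash (0x5C) and single quote (0x27) with a backslash, and it uses Python's `cp932` codec (Microsoft's Windows variant of Shift_JIS). "ソ" and "表" are both two-byte characters whose second byte is 0x5C.

```python
def naive_escape(data: bytes) -> bytes:
    # Hypothetical ASCII-only escaper: works byte by byte and prefixes
    # every 0x5C (backslash) and 0x27 (single quote) with a backslash.
    out = bytearray()
    for b in data:
        if b in (0x5C, 0x27):
            out.append(0x5C)
        out.append(b)
    return bytes(out)


def escape_via_utf8(raw: bytes, encoding: str) -> bytes:
    # Approach described in the patch: convert to UTF-8, escape there,
    # then convert back to the original encoding. In UTF-8 every byte of
    # a multibyte sequence is >= 0x80, so 0x5C and 0x27 are always the
    # ASCII characters themselves and byte-wise escaping is safe.
    utf8 = raw.decode(encoding).encode("utf-8")
    escaped = naive_escape(utf8)
    return escaped.decode("utf-8").encode(encoding)


sjis = "ソ表".encode("cp932")  # b"\x83\x5c\x95\x5c"
# Escaping the raw bytes doubles each trail byte, so a Shift_JIS-aware
# reader sees a stray backslash after each character.
print(naive_escape(sjis).decode("cp932"))  # ソ\表\
# Escaping via UTF-8 leaves both characters untouched.
print(escape_via_utf8(sjis, "cp932").decode("cp932"))  # ソ表
```

In a backslash-escaping string literal, a stray trailing `\` like the one above would even escape the closing quote, which is how malformed SQL can result.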

Now:

	1) Shouldn't we always produce UTF-8 output
	   when the -W switch is given?

	2) Shouldn't we declare the intended encoding in the output
	   by means of a SET CLIENT_ENCODING TO 'UTF8'?

With these two steps we could use the produced .sql
as input for databases of any encoding, with the PostgreSQL
server performing the final conversion. Would your nkf step
become unnecessary then?
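
Proposals 1) and 2) above could yield output along the lines of the following minimal Python sketch. The function `to_utf8_sql` and the `islands`/`name` table and column are hypothetical, the backslash-prefix escaping rule is an assumption, and the `E''` string syntax postdates this thread (it appeared in PostgreSQL 8.1). Because the file declares itself UTF8, the server converts it to the database's own encoding on load, which is why a separate nkf pass should no longer be needed.

```python
from typing import List


def to_utf8_sql(values: List[bytes], src_encoding: str,
                table: str, column: str) -> bytes:
    # Declare the encoding of everything that follows, so the server can
    # convert from UTF8 to whatever encoding the target database uses.
    lines = ["SET CLIENT_ENCODING TO 'UTF8';"]
    for raw in values:
        # Decode attribute bytes with the encoding passed to -W (e.g. cp932).
        text = raw.decode(src_encoding)
        # Escaping decoded text is safe: backslash and quote are whole
        # characters here, never halves of a multibyte character.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        # E'' literals honour backslash escapes (PostgreSQL 8.1+).
        lines.append(f"INSERT INTO {table} ({column}) VALUES (E'{escaped}');")
    # Always write the file itself as UTF-8, matching the SET line.
    return ("\n".join(lines) + "\n").encode("utf-8")


sql = to_utf8_sql(["ソ表".encode("cp932")], "cp932", "islands", "name")
print(sql.decode("utf-8"))
```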

--strk;


On Wed, Jan 12, 2005 at 03:52:58AM +0900, IIDA Tetsushi wrote:
> Thank you for replying.
> 
> >I didn't understand the problem.
> 
> Asian character sets contain tens of thousands of characters.
> Since they do not fit in 8 bits, a single character is
> represented by a combination of two or more bytes.
> 
> The character encoding most widely used by MS-Windows in Japan
> is Shift_JIS, which Microsoft co-developed.
> Since ArcGIS also runs on MS-Windows, it uses Shift_JIS.
> 
> Please refer to the following URL:
> http://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-othercodes-shiftjis
> 
> In Japanese encodings, 0x5c is not a backslash but either a
> Yen sign or part of a character.
> When 0x5c follows anything other than a lead byte
> (0x81-0x9f or 0xe0-0xef), it is a Yen sign.
> When 0x5c follows a lead byte, it is the second byte of a
> multibyte character.
> 
> If input data in a local encoding is treated as an ASCII
> character sequence, the data is corrupted and incorrect
> SQL is produced.
> 
> In UTF-8, the 7-bit range is compatible with ASCII, and a
> 7-bit byte is guaranteed never to be part of a multibyte
> character.
> 
> >Is the 'nkf' script introducing malformed SQL or is it shp2pgsql?
> 
> nkf is a character encoding conversion program included by
> default in Linux distributions such as RedHat and Fedora Core.
> 
> The example command on my page imports a Shift_JIS shapefile
> into a database created with UTF8 encoding.
> You may use the iconv command instead of nkf. When importing
> into a Shift_JIS database, nkf is not needed.
> 
> 
> >Can you provide an example shapefile and instructions on
> >how to reproduce the problem?
> 
> http://oi.nu/shp2pgsql/hogewk.zip
> Its attribute column contains the name of the island at the
> westernmost tip of Japan.
> 
> 
> >Tetsushi (is this your name?).
> 
> Yes. IIDA is my family name. In most of Asia, the family name
> comes first in a person's formal name. :-)
> 
> 
> --- 
> If anything is unclear, I will explain as well as I can.
> # But my English is not very good, sorry.
> 
> --
> IIDA Tetsushi
> (mailto : Iida_Tetsushi at oi.nu)
> 
> 
> ----- Original Message ----- 
> From: <strk at refractions.net>
> To: "IIDA Tetsushi" <Iida_Tetsushi at oi.nu>
> Cc: "Jeff Lounsbury" <jeffloun at refractions.net>; "Paul Ramsey" 
> <paul at refractions.net>
> Sent: Tuesday, January 11, 2005 7:57 PM
> Subject: Re: [Fwd: PostGIS : shp2pgsql i18n patch.]
> 
> 
> >Tetsushi (is this your name?).
> >I didn't understand the problem. From your site I see that you use a
> >post-processing script: nkf -S -w.
> >Is the 'nkf' script introducing malformed SQL or is it shp2pgsql?
> >
> >Can you provide an example shapefile and instructions on
> >how to reproduce the problem?
> >
> >TIA
> >
> >--strk;
> >
> >On Mon, Jan 10, 2005 at 09:16:50AM -0800, Jeff Lounsbury wrote:
> >>FYI.
> >>
> >>-------- Original Message --------
> >>Subject: PostGIS : shp2pgsql i18n patch.
> >>Date: Sun, 9 Jan 2005 03:38:48 +0900
> >>From: IIDA Tetsushi <Iida_Tetsushi at oi.nu>
> >>To: <jeff at refractions.net>
> >>
> >>I have written an internationalization patch for shp2pgsql
> >>and distributed it in Japan.
> >>
> >>http://oi.nu/shp2pgsql/shp2pgsql-i18n.tar.gz
> >>http://oi.nu/shp2pgsql/index.html
> >>
> >>Would you consider including this patch in a future release,
> >>if it does not cause problems?
> >>
> >>---
> >>* Background:
> >>In ArcGIS, attribute values in a shapefile are stored in the
> >>local character encoding of each country.
> >>Parts of the current shp2pgsql implementation assume ASCII-only
> >>input. For example, with Japan's local encoding, the current
> >>escape algorithm cannot generate correct SQL in some cases.
> >>
> >>* Solution:
> >>The data is first converted from the local encoding to UTF8,
> >>escape processing is performed on it,
> >>and the result is converted back to the original encoding.
> >>
> >>The local encoding is selected with the -W <encoding> option.
> >>When the option is omitted, the behavior is the same as the
> >>current processing.
> >>
> >>* Build procedure:
> >>Internationalization is enabled when the USE_ICONV flag is specified.
> >>
> >>--
> >>
> >>Since I could not find an official procedure for submitting a patch,
> >>I am contacting you directly.
> >>Please excuse me if the procedure (and/or my English) is wrong.
> >>
> >>--
> >>IIDA Tetsushi
> >>(mailto : Iida_Tetsushi at oi.nu)
> >>
> >>
> >
> >-- 
> >
> >For standing up against patentability of software,
> >
> > Thank You, Poland!
> >
> >Read the intervention:    http://kwiki.ffii.org/ConsPolon041221En
> >Send your thanks:         thankyoupoland.info
> >Read/do more:   http://www.noepatents.org/
> >
> 
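
The lead-byte rule described in the quoted message can be sketched as a scanner that labels every 0x5C byte as either a standalone character (Yen sign or backslash) or the second byte of a two-byte character. `classify_0x5c` is a hypothetical helper, and it uses the cp932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC; standard Shift_JIS stops the second range at 0xEF). It has to scan from the start of the string, because a byte in the lead range can itself be a second byte, so looking only at the byte just before 0x5C is not reliable.

```python
from typing import List, Tuple


def is_lead_byte(b: int) -> bool:
    # cp932 lead bytes; standard Shift_JIS uses 0x81-0x9F and 0xE0-0xEF.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def classify_0x5c(data: bytes) -> List[Tuple[int, str]]:
    """Report each 0x5C as (offset, "standalone") or (offset, "trail")."""
    found = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_lead_byte(b) and i + 1 < len(data):
            # Two-byte character: the next byte belongs to it, whatever it is.
            if data[i + 1] == 0x5C:
                found.append((i + 1, "trail"))
            i += 2
        else:
            # Single-byte character (ASCII/JIS X 0201 or half-width kana).
            if b == 0x5C:
                found.append((i, "standalone"))
            i += 1
    return found


print(classify_0x5c("\\表ソ".encode("cp932")))
```

For example, in the bytes 95 95 5C the first 0x95 is a lead byte and the second is its trail byte, so the final 0x5C is standalone even though it directly follows a byte in the lead range.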

-- 

For standing up against patentability of software,

  Thank You, Poland!

Read the intervention:    http://kwiki.ffii.org/ConsPolon041221En
Send your thanks:         thankyoupoland.info
Read/do more:		  http://www.noepatents.org/


