[postgis-devel] Re: [Fwd: PostGIS : shp2pgsql i18n patch.]

strk at refractions.net strk at refractions.net
Wed Jan 12 09:02:56 PST 2005


Tetsushi, hackers,
I've added optional support for UTF8 output in shp2pgsql.
To enable it, set USE_ICONV to 1 in the global Makefile.config
or in the environment.

It defaults to 0 for ease of testing and because other compiler/linker
flags might be needed in that case (I don't need -liconv on my host,
your iconv.h might be somewhere else, etc. etc.).

If you manage to build it with USE_ICONV=1 you'll have a -W <encoding>
flag. The value of <encoding> will be used for a 'SET CLIENT_ENCODING TO'
command at the head of the output code. There are probably issues there as
well, since postgresql will be pickier than iconv at recognizing (let
alone supporting) different encoding names.
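
To make the encoding-name issue concrete, here is a minimal Python sketch
(not the shp2pgsql C code) of the general idea: decode attribute bytes using
the -W value, write UTF-8, and put a SET CLIENT_ENCODING line at the head of
the output. The function names, the "demo" table, and the choice to announce
UTF8 in the header are assumptions of the sketch and are not taken from the
patch. Python's codec names stand in for iconv's here.

```python
def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal, escaping characters rather than bytes.

    Doubling backslashes assumes the server treats backslash as an escape
    inside string literals; that is an assumption of this sketch.
    """
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def build_script(rows, source_encoding, client_encoding="UTF8"):
    """Return the bytes of a small SQL script for raw attribute values.

    rows            -- list of bytes objects, as if read from a .dbf (hypothetical)
    source_encoding -- encoding of those bytes, in the spirit of the -W value
    client_encoding -- name announced to the server; UTF8 is assumed here
    """
    lines = ["SET CLIENT_ENCODING TO '%s';" % client_encoding]
    for raw in rows:
        value = raw.decode(source_encoding)  # bytes -> characters
        lines.append("INSERT INTO demo (name) VALUES (%s);" % sql_literal(value))
    # The whole script is written out as UTF-8, matching the header above.
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    sample = ["表".encode("shift_jis")]  # Shift_JIS bytes 0x95 0x5c
    print(build_script(sample, "shift_jis").decode("utf-8"))
```

Note that Python accepts "shift_jis" while PostgreSQL calls the same encoding
SJIS, which is the kind of naming mismatch I'd expect to cause trouble.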

Consider it an alpha feature, and use it only for testing.
Tetsushi, please report any problem you might encounter with this.
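
For anyone testing, the 0x5c problem described in Tetsushi's quoted message
below can be reproduced in a few lines of Python. This is only an
illustration: naive_escape is a hypothetical stand-in for byte-wise,
ASCII-oriented backslash escaping, not the actual shp2pgsql routine, and the
sample character is chosen because its second Shift_JIS byte is 0x5c.

```python
def naive_escape(raw: bytes) -> bytes:
    """Double every 0x5c byte, as escaping that assumes ASCII input would."""
    return raw.replace(b"\\", b"\\\\")


def contains_backslash_byte(raw: bytes) -> bool:
    """Report whether the byte 0x5c occurs anywhere in the data."""
    return b"\x5c" in raw


sjis = "表".encode("shift_jis")  # two bytes: 0x95 0x5c
utf8 = "表".encode("utf-8")      # three bytes, none of them below 0x80

if __name__ == "__main__":
    for label, data in (("Shift_JIS", sjis), ("UTF-8", utf8)):
        escaped = naive_escape(data)
        print(label, data.hex(), "->", escaped.hex(),
              "(changed)" if escaped != data else "(unchanged)")
```

The Shift_JIS bytes gain an extra 0x5c and no longer spell the original
character alone, while the UTF-8 bytes pass through untouched because a
multi-byte UTF-8 sequence never contains a 7-bit byte.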

--strk;

On Wed, Jan 12, 2005 at 05:16:06PM +0100, strk at refractions.net wrote:
> On Thu, Jan 13, 2005 at 12:21:11AM +0900, IIDA Tetsushi wrote:
> > >Tetsushi, with help of some fine postgresql guys I finally
> > >understood the problem and your proposed solution.
> > 
> > Thank you.
> > 
> > >1) Shouldn't we always produce UTF-8 encoding in output
> > >   in the presence of a -W switch ?
> > >
> > >2) Shouldn't we report intended encoding in the output
> > >   by means of a SET CLIENT_ENCODING to 'UTF' ?
> > >
> > >Using these two steps we'll be able to use the produced .sql
> > >as input for databases of whatever encoding, with the postgres
> > >server performing the final conversion. Would your nkf step
> > >be useless then ?
> > 
> > Yes. That's right in general.
> > I think that always outputting UTF-8 is not a bad approach.
> > But when a character-encoding declaration is included in the output
> > of a program and post-processing then becomes necessary,
> > it is somewhat troublesome.
> 
> Can you tell us more about this necessity ?
> Is there a case in which a post-processing lets you store
> the data in a postgresql database with no UNICODE support ?
> 
> --strk;
> 
> > 
> > The reasons why I did not use UTF8 as the external encoding ...
> > 
> > 1) This patch was written to offer a workaround for
> >   the problem that is happening now.
> >   There was no intention to greatly change the external behavior
> >   of the program from the present one.
> > 
> > 2) I wanted to avoid exposing the character encoding used for
> >   internal processing directly to the user (*1).
> > 
> > 3) It is not known whether PostgreSQL was compiled so
> >   that UNICODE can be used. (*2)
> > 
> > 4) UNIX users in the multibyte-character world are
> >  accustomed to character-encoding conversion programs :-)
> > 
> > None of these is a very serious reason.
> > 
> > (*1) UTF8 is not widely used as an external encoding in Japan, for various reasons.
> > (*2) I think that such environments have become considerably rarer recently.
> > 
> > --
> > IIDA Tetsushi
> > (mailto : Iida_Tetsushi at oi.nu)
> > 
> > 
> > ----- Original Message ----- 
> > From: <strk at refractions.net>
> > To: "IIDA Tetsushi" <hogepiyo at nifty.com>; 
> > <postgis-devel at postgis.refractions.net>
> > Cc: "Jeff Lounsbury" <jeffloun at refractions.net>; "Paul Ramsey" 
> > <paul at refractions.net>
> > Sent: Wednesday, January 12, 2005 8:31 PM
> > Subject: Re: [Fwd: PostGIS : shp2pgsql i18n patch.]
> > 
> > 
> > >Tetsushi, with help of some fine postgresql guys I finally
> > >understood the problem and your proposed solution.
> > >
> > >Now:
> > >
> > >1) Shouldn't we always produce UTF-8 encoding in output
> > >   in the presence of a -W switch ?
> > >
> > >2) Shouldn't we report intended encoding in the output
> > >   by means of a SET CLIENT_ENCODING to 'UTF' ?
> > >
> > >Using these two steps we'll be able to use the produced .sql
> > >as input for databases of whatever encoding, with the postgres
> > >server performing the final conversion. Would your nkf step
> > >be useless then ?
> > >
> > >--strk;
> > >
> > >
> > >On Wed, Jan 12, 2005 at 03:52:58AM +0900, IIDA Tetsushi wrote:
> > >>Thank you for replying.
> > >>
> > >>>I didn't understand the problem.
> > >>
> > >>The character sets of Asia contain tens of thousands of
> > >>characters. They do not fit in 8 bits, so one character
> > >>is expressed by combining 2 or more bytes of data.
> > >>
> > >>The character encoding currently most used on MS-Windows in Japan
> > >>is Shift_JIS, which Microsoft created.
> > >>Since ArcGIS also runs on MS-Windows, it uses Shift_JIS.
> > >>
> > >>Please refer to the following URL.
> > >>http://www.debian.org/doc/manuals/intro-i18n/ch-codes.en.html#s-othercodes-shiftjis
> > >>
> > >>In the Japanese character encoding, 0x5c is not a backslash but a
> > >>Yen sign, or is part of a character.
> > >>When 0x5c follows a byte outside 0x81-0x9f,
> > >>it is a Yen sign.
> > >>When 0x5c follows a byte in 0x81-0x9f, it is part of one
> > >>multi-byte character.
> > >>
> > >>If input data in a local character encoding is treated as an ASCII
> > >>character sequence, the input data will be corrupted and incorrect
> > >>SQL will be output.
> > >>
> > >>The 7-bit part of UTF8 is compatible with ASCII, and
> > >>those byte values are guaranteed never to be part of a
> > >>multi-byte character.
> > >>
> > >>>Is the 'nkf' script introducing malformed SQL or is it shp2pgsql ?
> > >>
> > >>nkf is a program for converting character encodings. It is
> > >>included by default in Linux distributions such as RedHat and
> > >>FedoraCore.
> > >>
> > >>The example command on my page imports a Shift_JIS Shape
> > >>file into a database created as UTF8.
> > >>You may use the iconv command instead of nkf. Moreover, when
> > >>importing into a Shift_JIS database, nkf is unnecessary.
> > >>
> > >>
> > >>>Can you provide an example shapefile and instruction on
> > >>>how to exploit the problem ?
> > >>
> > >>http://oi.nu/shp2pgsql/hogewk.zip
> > >>The name of the island at the westernmost tip of Japan is
> > >>put into the attribute column.
> > >>
> > >>
> > >>>Tetsushi (is this your name ?).
> > >>
> > >>Yes. IIDA is my family name. In the formal names of almost all
> > >>people in Asia, the family name is written first. :-)
> > >>
> > >>
> > >>--- 
> > >>If there are some points in question, I will explain as much as
> > >>possible.
> > >># but I'm not good at English. sorry.
> > >>
> > >>--
> > >>IIDA Tetsushi
> > >>(mailto : Iida_Tetsushi at oi.nu)
> > 
> 
> -- 
> 
> For standing up against patentability of software,
> 
>   Thank You, Poland!
> 
> Read the intervention:    http://kwiki.ffii.org/ConsPolon041221En
> Send your thanks:         thankyoupoland.info
> Read/do more:		  http://www.noepatents.org/
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
