[postgis-users] Re: [Plr-general] Tutorial on PLR and PostGIS, more on carriage returns
Michael Fuhr
mike at fuhr.org
Fri Jun 22 00:27:53 PDT 2007
On Thu, Jun 21, 2007 at 02:50:02PM -0700, Paul Ramsey wrote:
> You're right and I'm wrong, I was confused by the UTF code numbers,
> which differ from the actual byte encodings used for UTF8. Indeed,
> all the multi-byte higher-order stuff is stuffed into 128-255 in the
> UTF8 encoding, so a straight byte-swap would work (for UTF8 and the
> various one-byte latin code pages, that is).
Additionally, leading and trailing bytes of multibyte UTF-8 sequences
use disjoint ranges, and the value of the leading byte indicates
how many trailing bytes follow. Section 2.5 of The Unicode Standard
discusses encoding form design principles; Section 3.9 contains
formal definitions. Table 3-7 shows the byte ranges allowed in
each position (single, leading, trailing).
http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
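
To make the byte ranges concrete, here is a rough Python sketch that
classifies each byte of a UTF-8 string as single, leading, or trailing
and reports how many trailing bytes a leading byte announces. The
function names classify() and describe() are made up for illustration.
The ranges follow my reading of Table 3-7, simplified: the sketch does
not enforce the tighter limits on the second byte that apply after
0xE0, 0xED, 0xF0, and 0xF4, so treat it as an illustration rather than
a validator.

```python
def classify(byte):
    """Return (kind, trailing_count) for one byte value (0-255).

    Hypothetical helper; ranges are a simplified reading of Table 3-7.
    """
    if byte <= 0x7F:
        return ("single", 0)      # ASCII, a complete character on its own
    if byte <= 0xBF:
        return ("trailing", 0)    # 0x80-0xBF only ever continue a sequence
    if 0xC2 <= byte <= 0xDF:
        return ("leading", 1)     # starts a two-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return ("leading", 2)     # starts a three-byte sequence
    if 0xF0 <= byte <= 0xF4:
        return ("leading", 3)     # starts a four-byte sequence
    # 0xC0, 0xC1 and 0xF5-0xFF never occur in well-formed UTF-8
    return ("invalid", 0)


def describe(text):
    """Classify every byte of text after encoding it as UTF-8."""
    return [classify(b) for b in text.encode("utf-8")]


if __name__ == "__main__":
    # Print each byte of a short mixed string with its classification.
    sample = "a\u00e9\u20ac"
    for b, (kind, count) in zip(sample.encode("utf-8"), describe(sample)):
        print(f"0x{b:02X} {kind} {count}")
```

Because the three ranges do not overlap, a program can look at any
single byte in a UTF-8 stream and tell whether it sits at the start of
a character, which is why byte-oriented processing that leaves values
128-255 alone does not corrupt multibyte characters.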
--
Michael Fuhr