[postgis-users] Re: [Plr-general] Tutorial on PLR and PostGIS, more on carriage returns

Michael Fuhr mike at fuhr.org
Fri Jun 22 00:27:53 PDT 2007


On Thu, Jun 21, 2007 at 02:50:02PM -0700, Paul Ramsey wrote:
> You're right and I'm wrong, I was confused by the UTF code numbers,  
> which differ from the actual byte encodings used for UTF8.  Indeed,  
> all the multi-byte higher-order stuff is stuffed into 128-255 in the  
> UTF8 encoding, so a straight byte-swap would work (for UTF8 and the  
> various one-byte latin code pages, that is).

Additionally, the leading and trailing bytes of multibyte UTF-8 sequences
occupy disjoint ranges, and the value of the leading byte indicates
how many trailing bytes follow.  Section 2.5 of The Unicode Standard
discusses encoding form design principles; Section 3.9 contains
formal definitions.  Table 3-7 shows the byte ranges allowed in
each position (single, leading, trailing).

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
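
As a rough illustration of the ranges described above, here is a minimal
Python sketch.  The function names classify_utf8_byte and trailing_count
are made up for this example, and the sketch ignores the tighter limits
that Table 3-7 places on the second byte after certain leading bytes
(E0, ED, F0, F4), so treat it as an approximation rather than a validator.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify a byte value by the position it may occupy in UTF-8.

    Hypothetical helper for illustration; the ranges are the coarse
    single/leading/trailing ranges, not a full well-formedness check.
    """
    if 0x00 <= b <= 0x7F:
        return "single"      # ASCII, a complete one-byte sequence
    if 0x80 <= b <= 0xBF:
        return "trailing"    # continuation byte
    if 0xC2 <= b <= 0xF4:
        return "leading"     # first byte of a multibyte sequence
    return "invalid"         # C0, C1, F5..FF never occur in well-formed UTF-8


def trailing_count(lead: int) -> int:
    """Return how many trailing bytes a leading byte announces."""
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:
        return 3
    raise ValueError(f"0x{lead:02X} is not a UTF-8 leading byte")


# U+00E9 encodes as C3 A9 and U+20AC as E2 82 AC.
encoded = "\u00e9\u20ac".encode("utf-8")
kinds = [classify_utf8_byte(b) for b in encoded]
```

Because the three ranges never overlap, a single byte examined in
isolation is enough to tell whether it starts a character, continues
one, or stands alone, and every byte of a multibyte sequence is at
least 0x80.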

-- 
Michael Fuhr


