[Live-demo] Re: [OSGeo] #896: sphinx doc build is broken because of BOM

edgar.soldin at web.de edgar.soldin at web.de
Sat May 12 02:55:29 PDT 2012


On 12.05.2012 03:00, OSGeo wrote:
> #896: sphinx doc build is broken because of BOM
> ---------------------+------------------------------------------------------
>  Reporter:  fgdrf    |       Owner:  live-demo@…              
>      Type:  defect   |      Status:  new                      
>  Priority:  major    |   Milestone:                           
> Component:  LiveDVD  |    Keywords:                           
> ---------------------+------------------------------------------------------
> 
> Comment(by hamish):
> 
>  the Byte Order Mark has been added and removed from the .csv lists of
>  contributors for a while now.
> 
>  I haven't really been sure if they should be there or not so only did a
>  quick edit just before the last release to stop the table creation from
>  breaking.
> 
>  It's easy enough to open with vi and delete the first two chars in the
>  file if needed.. Converting UTF back to ISO-8859-1 isn't too bad either:
>    `iconv -f UTF-8 -t ISO_8859-1 utf_file > iso_file`
> 
> 
>  Qs:
>   * Should the BOM be there or not?

according to
http://en.wikipedia.org/wiki/Byte_order_mark
the BOM is meaningless for UTF-8 (there is no byte order to mark) but allowed.
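to illustrate why an allowed-but-meaningless BOM can still break a table build, here is a minimal python sketch. it is not taken from the sphinx or OSGeo-Live sources; the csv content and the helper name first_header are made up for illustration. it only shows that a plain UTF-8 decode keeps the BOM as an invisible U+FEFF character glued to the first field, while python's "utf-8-sig" codec drops it when present.

```python
import codecs
import csv
import io

# the UTF-8 BOM is U+FEFF encoded as the three bytes EF BB BF
BOM = codecs.BOM_UTF8

# hypothetical contributor list, as saved by an editor that prepends a BOM
raw = BOM + "Name,Country\nJosé,ES\n".encode("utf-8")


def first_header(data, encoding):
    """Decode the bytes and return the first header field of the csv."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))[0]


# plain "utf-8" keeps the BOM as an invisible character in the first field
plain = first_header(raw, "utf-8")

# "utf-8-sig" removes a leading BOM if there is one
stripped = first_header(raw, "utf-8-sig")

print(repr(plain), repr(stripped))
```

anything that looks up the first column by name would miss it in the first case, which is one plausible way for a table build to fail.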

>   * What files (if any) should be saved in UTF-8, and why? (ISO will not
>  handle non-Western multibytes, but that doesn't necessitate that the
>  English/Western pages also be in UTF)
> 
>  this is out of my area of expertise, but the constant "last committer
>  wins" back and forth of text file variants is as we see here causing
>  problems.
> 

WHICH:
i'd suggest defining realms in which everything uses *one* character encoding, and announcing that encoding so people can set up their editors accordingly, e.g. UTF-8 for the docs.

WHY:
users whose languages have characters outside latin-1 (aka ISO-8859-1) can finally write names and text natively without having to escape or convert them.
as UTF-8 is backwards compatible with ASCII, plain ASCII text (currently the most important part, user-base-wise) stays intact even if a file gets misconverted.
editors usually warn when you try to open or save characters that the target character set does not support.
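a small python sketch of the two points above, with made-up example names (the helper fits_latin1 is hypothetical as well): ASCII text has identical bytes in UTF-8 and ISO-8859-1, and a name outside latin-1 simply cannot be stored in ISO-8859-1.

```python
ascii_text = "sphinx doc build"
latin1_name = "Søren"    # hypothetical name, representable in latin-1
other_name = "Łukasz"    # hypothetical name, not representable in latin-1


def fits_latin1(text):
    """Return True if every character can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# pure ASCII gives the same bytes in both encodings
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")

# UTF-8 bytes wrongly read as ISO-8859-1: only the non-ASCII
# character is garbled, the ASCII letters around it survive
garbled = latin1_name.encode("utf-8").decode("iso-8859-1")

print(same_bytes, fits_latin1(latin1_name), fits_latin1(other_name), garbled)
```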

we could actually use svn properties (e.g. svn:mime-type) to assign a MIME type and character set to specific files; most svn clients respect these.

for the BOM issue:
i don't know the sphinx internals, but would it be too difficult to conditionally strip a leading BOM whenever a file is read, just to be safe?
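as a rough idea of what such a conditional strip could look like, here is a python sketch. it is not sphinx code and i'm not claiming sphinx has a hook for it; the function names strip_bom and read_text are made up, and the files are assumed to be UTF-8.

```python
import codecs


def strip_bom(data):
    """Remove a leading UTF-8 BOM from raw bytes, if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_text(path):
    """Read a file assumed to be UTF-8 and return its text without a BOM."""
    with open(path, "rb") as handle:
        return strip_bom(handle.read()).decode("utf-8")
```

files without a BOM pass through unchanged, so it would not matter which variant the last committer saved. opening the file with encoding="utf-8-sig" should have the same effect with less code.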

..ede



