[GRASS-dev] HTML files
Glynn Clements
glynn at gclements.plus.com
Tue Aug 19 12:55:03 EDT 2008
I have been through and fixed some problems which prevented some of
the HTML files from validating. AFAICT, everything now validates (with
the sole exception of missing "alt" attributes within <img> tags).
Please ensure that all HTML files continue to validate against the
HTML 4.0 Transitional DTD. At some point, I want to replace g.html2man
with something more robust (e.g. something which handles tables), and
I don't particularly want to make a "smart" (i.e. fault-tolerant) HTML
parser (e.g. Beautiful Soup) a required dependency.
If you have OpenSP or OpenJade, you can validate an HTML file with
e.g.:
    nsgmls -s -c /usr/share/sgml/openjade-1.3.2/pubtext/HTML4.soc <filename>.html
[The program may be called nsgmls or onsgmls, and the exact location
where the catalogues are installed will vary.]
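For checking many files, the command above can be wrapped in a small script. The following is only a sketch: the function names are made up for illustration, the catalogue path is the one from the example and will differ between systems, and it assumes the validator reports errors on stderr and through a non-zero exit status.

```python
import shutil
import subprocess

# Assumption: catalogue path copied from the example above; adjust for your system.
DEFAULT_CATALOG = "/usr/share/sgml/openjade-1.3.2/pubtext/HTML4.soc"


def find_validator():
    """Return the first of nsgmls/onsgmls found on PATH, or None."""
    for name in ("nsgmls", "onsgmls"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(validator, catalog, filename):
    """Build the same command line as the manual example (-s, -c <catalog>)."""
    return [validator, "-s", "-c", catalog, filename]


def validate(filename, catalog=DEFAULT_CATALOG):
    """Run the validator on one file; return (ok, messages)."""
    validator = find_validator()
    if validator is None:
        raise RuntimeError("neither nsgmls nor onsgmls found on PATH")
    result = subprocess.run(
        build_command(validator, catalog, filename),
        capture_output=True,
        text=True,
    )
    # Assumption: a non-zero exit status means validation errors were reported.
    return result.returncode == 0, result.stderr
```

Calling validate() on each completed HTML file and printing the messages for those where ok is false gives a quick pass/fail overview.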
This needs to be done on the completed HTML file in
dist.<arch>/docs/html; the <module>.html files in the module
directories won't normally validate, as they lack the header which is
added by running the module with the --html-description option.
FWIW, the most common error was using block elements (e.g. <div>,
<pre>, <p>) in contexts where only inline elements are allowed
(primarily <dt>).
You can determine which elements are allowed where from the DTD:
http://www.w3.org/TR/1998/REC-html40-19980424/sgml/loosedtd.html
E.g. the definition:
    <!ELEMENT DT - O (%inline;)* -- definition term -->
indicates that only inline elements are allowed inside DT, while e.g.:
    <!ELEMENT DD - O (%flow;)* -- definition description -->
indicates that both block and inline elements are allowed inside DD.
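The DT/DD distinction above can be spot-checked without a full validator. The sketch below uses Python's standard html.parser to flag block elements that start while a DT is open. It is a rough illustration only, not a replacement for nsgmls; the class and function names are invented for the example, and the block set is %block from the loose DTD written out in lower case.

```python
from html.parser import HTMLParser

# %block from the loose DTD, lower-cased (html.parser lower-cases tag names).
BLOCK = {
    "p", "dl", "div", "center", "noscript", "noframes", "blockquote",
    "form", "isindex", "hr", "table", "fieldset", "address",
    "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dir", "menu", "pre",
}


class DtBlockChecker(HTMLParser):
    """Record block-level start tags that appear while a DT is open."""

    def __init__(self):
        super().__init__()
        self.in_dt = False
        self.problems = []  # (line number, tag name) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.in_dt = True
        elif tag == "dd":
            self.in_dt = False  # a DD start tag implicitly ends an open DT
        elif self.in_dt and tag in BLOCK:
            line, _column = self.getpos()
            self.problems.append((line, tag))

    def handle_endtag(self, tag):
        if tag in ("dt", "dl"):
            self.in_dt = False


def find_blocks_in_dt(html_text):
    """Return (line, tag) for each block element found inside a DT."""
    checker = DtBlockChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

For example, find_blocks_in_dt() on a fragment containing <dt><div class="code"><pre> reports both the div and the pre, while the same markup inside a DD is not reported.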
If you don't want to read the DTD, here's a rough summary:
Entity classes:
%StyleSheet    = <CSS stylesheet>
%Script        = <JavaScript code>
%html.content  = HEAD, BODY
%head.content  = TITLE, ISINDEX, BASE
%heading       = H1, H2, H3, H4, H5, H6
%fontstyle     = TT, I, B, U, S, STRIKE, BIG, SMALL
%phrase        = EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR,
                 ACRONYM
%special       = A, IMG, APPLET, OBJECT, FONT, BASEFONT, BR, SCRIPT,
                 MAP, Q, SUB, SUP, SPAN, BDO, IFRAME
%formctrl      = INPUT, SELECT, TEXTAREA, LABEL, BUTTON
%list          = UL, OL, DIR, MENU
%head.misc     = SCRIPT, STYLE, META, LINK, OBJECT
%pre.exclusion = IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP,
                 FONT, BASEFONT
%preformatted  = PRE
%block         = P, DL, DIV, CENTER, NOSCRIPT, NOFRAMES,
                 BLOCKQUOTE, FORM, ISINDEX, HR, TABLE, FIELDSET,
                 ADDRESS, %heading, %list, %preformatted
%inline        = #PCDATA, %fontstyle, %phrase, %special, %formctrl
%flow          = %block, %inline
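Since the entity classes refer to each other, it can help to expand them into flat sets of element names. The following sketch transcribes the body-related classes from the summary above into a Python dictionary and expands them recursively; the names ENTITIES and expand are illustrative only.

```python
# Entity classes transcribed from the summary above (body-related ones only).
ENTITIES = {
    "%heading": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "%fontstyle": ["TT", "I", "B", "U", "S", "STRIKE", "BIG", "SMALL"],
    "%phrase": ["EM", "STRONG", "DFN", "CODE", "SAMP", "KBD", "VAR",
                "CITE", "ABBR", "ACRONYM"],
    "%special": ["A", "IMG", "APPLET", "OBJECT", "FONT", "BASEFONT", "BR",
                 "SCRIPT", "MAP", "Q", "SUB", "SUP", "SPAN", "BDO", "IFRAME"],
    "%formctrl": ["INPUT", "SELECT", "TEXTAREA", "LABEL", "BUTTON"],
    "%list": ["UL", "OL", "DIR", "MENU"],
    "%preformatted": ["PRE"],
    "%block": ["P", "DL", "DIV", "CENTER", "NOSCRIPT", "NOFRAMES",
               "BLOCKQUOTE", "FORM", "ISINDEX", "HR", "TABLE", "FIELDSET",
               "ADDRESS", "%heading", "%list", "%preformatted"],
    "%inline": ["#PCDATA", "%fontstyle", "%phrase", "%special", "%formctrl"],
    "%flow": ["%block", "%inline"],
}


def expand(name):
    """Expand an entity class into the set of element names it covers."""
    if name not in ENTITIES:
        return {name}  # a plain element name (or #PCDATA)
    result = set()
    for item in ENTITIES[name]:
        result |= expand(item)
    return result
```

With this, "DIV" in expand("%flow") is true but "DIV" in expand("%inline") is false, which is the reason a DIV is acceptable in a DD but not in a DT.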
The immediate children permitted for each element are:
A: %inline
ABBR: %inline
ACRONYM: %inline
ADDRESS: %inline, P
APPLET: %flow, PARAM
B: %inline
BDO: %inline
BIG: %inline
BLOCKQUOTE: %flow
BODY: %flow, INS, DEL
BUTTON: %flow
CAPTION: %inline
CENTER: %flow
CITE: %inline
CODE: %inline
COLGROUP: COL
DD: %flow
DEL: %flow
DFN: %inline
DIR: LI
DIV: %flow
DL: DT, DD
DT: %inline
EM: %inline
FIELDSET: %flow, LEGEND
FONT: %inline
FORM: %flow
FRAMESET: FRAMESET, FRAME, NOFRAMES
H1: %inline
H2: %inline
H3: %inline
H4: %inline
H5: %inline
H6: %inline
HEAD: %head.content, %head.misc
HTML: %html.content
I: %inline
IFRAME: %flow
INS: %flow
KBD: %inline
LABEL: %inline
LEGEND: %inline
LI: %flow
MAP: %block, AREA
MENU: LI
NOFRAMES: %flow
NOSCRIPT: %flow
OBJECT: %flow, PARAM
OL: LI
OPTGROUP: OPTION
OPTION: #PCDATA
P: %inline
PRE: %inline
Q: %inline
S: %inline
SAMP: %inline
SCRIPT: %Script
SELECT: OPTGROUP, OPTION
SMALL: %inline
SPAN: %inline
STRIKE: %inline
STRONG: %inline
STYLE: %StyleSheet
SUB: %inline
SUP: %inline
TABLE: CAPTION, COL, COLGROUP, THEAD, TFOOT, TBODY
TBODY: TR
TD: %flow
TEXTAREA: #PCDATA
TFOOT: TR
TH: %flow
THEAD: TR
TITLE: #PCDATA
TR: TH, TD
TT: %inline
U: %inline
UL: LI
VAR: %inline
Some elements don't allow certain elements as descendants:
A: A
BUTTON: %formctrl, A, FORM, ISINDEX, FIELDSET, IFRAME
DIR: %block
FORM: FORM
LABEL: LABEL
MENU: %block
PRE: %pre.exclusion
TITLE: %head.misc
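Unlike the child rules, these exclusions apply at any depth, so a check has to look at every open ancestor rather than just the parent. A minimal sketch, covering only a few rows of the table above (BUTTON, DIR, MENU and TITLE are left out for brevity) and using made-up names:

```python
# A subset of the exclusion table above; PRE uses %pre.exclusion written out.
EXCLUSIONS = {
    "A": {"A"},
    "FORM": {"FORM"},
    "LABEL": {"LABEL"},
    "PRE": {"IMG", "OBJECT", "APPLET", "BIG", "SMALL", "SUB", "SUP",
            "FONT", "BASEFONT"},
}


def excluded_by(open_elements, tag):
    """Return the open ancestors that forbid `tag` as a descendant.

    open_elements is the list of currently open elements, outermost first.
    """
    return [anc for anc in open_elements if tag in EXCLUSIONS.get(anc, set())]
```

For instance, excluded_by(["BODY", "P", "A", "B"], "A") returns ["A"]: the inner A is rejected even though its immediate parent, B, allows %inline.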
Notes:
1. The children of DIR/MENU are LI, which is block-level but not a
member of %block, so the exclusion doesn't apply to the LI itself.
Those LI can't contain block elements, however. UL/OL don't have this
restriction.
2. DT cannot contain block elements, but DD can. This means that you
can't use <div class="code"><pre> in a DT; use <span class="code"><tt>
instead. DIV and PRE are block elements; SPAN and TT are inline.
3. TABLE cannot have TR as a child. But TBODY can have TR, and TBODY
allows both the start and end tags to be omitted, so
<table><tr>....</tr></table> is really just a shorthand for
<table><tbody><tr>....</tr></tbody></table>.
4. P cannot contain blocks. So <p>...<div> is actually shorthand for
<p>...</p><div>. But <p>...<div>...</div>...</p> is an error, as the
</p> doesn't match any open element (the <div> implicitly closed the
original <p>, and P doesn't allow the start tag to be omitted).
5. HTML, HEAD, BODY, and TBODY allow the start tag to be omitted. With
the exception of TBODY, this feature shouldn't be used (it's a
nuisance to implement if the number of valid child tags is large).
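The behaviour described in note 4 can be illustrated with a toy tag-stack simulation. This is a deliberately simplified sketch, not how an SGML parser is implemented: it only models the rule that a block start tag implicitly closes an open P, it uses a small sample of block elements, and the function name is invented.

```python
# A few block elements, enough for the example (not the full %block list).
BLOCK = {"p", "div", "pre", "dl", "ul", "ol", "table", "blockquote"}


def check_p_nesting(tags):
    """Simulate implicit closing of P and report unmatched end tags.

    tags is a sequence such as ["p", "div", "/div", "/p"].
    """
    stack = []
    errors = []
    for tag in tags:
        if tag.startswith("/"):
            name = tag[1:]
            if name in stack:
                # Close the element and anything still open inside it.
                while stack.pop() != name:
                    pass
            else:
                errors.append("</%s> does not match any open element" % name)
        else:
            if tag in BLOCK and stack and stack[-1] == "p":
                stack.pop()  # a block start tag implicitly closes the open P
            stack.append(tag)
    return errors
```

check_p_nesting(["p", "div", "/div", "/p"]) reports the stray </p>, whereas ["p", "div", "/div"] passes, matching the two cases in note 4.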
--
Glynn Clements <glynn at gclements.plus.com>