[GRASS-dev] HTML files

Glynn Clements glynn at gclements.plus.com
Tue Aug 19 12:55:03 EDT 2008


I have been through and fixed some problems which prevented some of
the HTML files from validating. AFAICT, everything now validates (with
the sole exception of missing "alt" attributes within <img> tags).

Please ensure that all HTML files continue to validate against the
HTML 4.0 Transitional DTD. At some point, I want to replace g.html2man
with something more robust (e.g. something which handles tables), and
I don't particularly want to make a "smart" (i.e. fault-tolerant) HTML
parser (e.g. Beautiful Soup) a required dependency.

If you have OpenSP or OpenJade, you can validate an HTML file with
e.g.:

	nsgmls -s -c /usr/share/sgml/openjade-1.3.2/pubtext/HTML4.soc <filename>.html

[The program may be called nsgmls or onsgmls, and the exact location
where the catalogues are installed will vary.]
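
If you want to script this, the command above can be wrapped along
the following lines. This is only a sketch: the function names are
made up for illustration, the catalogue path still has to be supplied
by you, and it assumes that the validator reports problems on stderr
and exits non-zero when it finds errors.

```python
import shutil
import subprocess

# The program name depends on the installation (OpenSP vs. older SP builds).
VALIDATOR_NAMES = ("onsgmls", "nsgmls")


def find_validator():
    """Return the path of the first validator found on PATH, or None."""
    for name in VALIDATOR_NAMES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(validator, catalog, html_file):
    """Build the argument list: -s suppresses output, -c names the catalogue."""
    return [validator, "-s", "-c", catalog, html_file]


def validate(html_file, catalog):
    """Run the validator on one file and return (ok, messages).

    Assumption: errors go to stderr and produce a non-zero exit status.
    """
    validator = find_validator()
    if validator is None:
        raise FileNotFoundError("neither onsgmls nor nsgmls found on PATH")
    result = subprocess.run(
        build_command(validator, catalog, html_file),
        capture_output=True,
        text=True,
    )
    ok = result.returncode == 0 and not result.stderr.strip()
    return ok, result.stderr
```

Calling validate() on each file in dist.<arch>/docs/html with the
path of your HTML4.soc catalogue would then give a pass/fail result
per file.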

This needs to be done on the completed HTML file in
dist.<arch>/docs/html; the <module>.html files in the module
directories won't normally validate, as they lack the header which is
added by running the module with the --html-description option.

FWIW, the most common error was using block elements (e.g. <div>,
<pre>, <p>) in contexts where only inline elements are allowed
(primarily <dt>).
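
A quick way to spot this particular error without a full validator is
sketched below. It is not a substitute for nsgmls: it uses Python's
html.parser, only looks for block-level start tags while a DT is
open, and treats a DD start tag or the end of the DL as closing the
DT. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# Block-level elements of the HTML 4 Transitional DTD, in lower case
# (html.parser reports tag names in lower case).
BLOCK = {
    "p", "dl", "div", "center", "noscript", "noframes", "blockquote",
    "form", "isindex", "hr", "table", "fieldset", "address",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "dir", "menu", "pre",
}


class DtBlockFinder(HTMLParser):
    """Record block-level start tags that appear while a DT is open."""

    def __init__(self):
        super().__init__()
        self.in_dt = False
        self.problems = []  # list of (line number, tag name)

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.in_dt = True
        elif tag == "dd":
            # DT's end tag is optional; a following DD ends it.
            self.in_dt = False
        elif self.in_dt and tag in BLOCK:
            self.problems.append((self.getpos()[0], tag))

    def handle_endtag(self, tag):
        if tag in ("dt", "dl"):
            self.in_dt = False


def find_blocks_in_dt(html):
    """Return a list of (line, tag) for block elements found inside DT."""
    finder = DtBlockFinder()
    finder.feed(html)
    return finder.problems
```

For example, find_blocks_in_dt() reports the DIV and PRE in
<dt><div class="code"><pre>...</pre></div>, but nothing for
<dt><span class="code"><tt>...</tt></span>.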

You can determine which elements are allowed where from the DTD:

http://www.w3.org/TR/1998/REC-html40-19980424/sgml/loosedtd.html

E.g. the definition:

<!ELEMENT DT - O (%inline;)*           -- definition term -->

indicates that only inline elements are allowed inside DT, while e.g.:

<!ELEMENT DD - O (%flow;)*             -- definition description -->

indicates that both block and inline elements are allowed inside DD.
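
For reference, the two fields after the element name say whether the
start and end tags may be omitted ("-" means required, "O" means
omissible), and the parenthesised part is the content model. The
sketch below pulls those pieces out of a single-line declaration. It
is a rough regular-expression illustration, not an SGML parser, and
the function name is made up.

```python
import re

# name, start-tag flag, end-tag flag, content model, optional -- comment --
ELEMENT_RE = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+(?P<start>[-O])\s+(?P<end>[-O])\s+"
    r"(?P<model>.*?)\s*(?:--.*--)?\s*>\s*$"
)


def parse_element(decl):
    """Split one <!ELEMENT ...> declaration into its parts."""
    m = ELEMENT_RE.match(decl.strip())
    if m is None:
        raise ValueError("not a single-line ELEMENT declaration: %r" % decl)
    return {
        "name": m.group("name"),
        "start_tag_optional": m.group("start") == "O",
        "end_tag_optional": m.group("end") == "O",
        "model": m.group("model"),
    }


dt = parse_element("<!ELEMENT DT - O (%inline;)*           -- definition term -->")
dd = parse_element("<!ELEMENT DD - O (%flow;)*             -- definition description -->")
print(dt["name"], dt["model"], dt["end_tag_optional"])
print(dd["name"], dd["model"], dd["end_tag_optional"])
```

So for both DT and DD the start tag is required and the end tag may
be omitted; they differ only in the content model.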

If you don't want to read the DTD, here's a rough summary:

Entity classes:

	%StyleSheet	= <CSS stylesheet>
	%Script		= <JavaScript code>
	
	%html.content	= HEAD, BODY
	%head.content	= TITLE, ISINDEX, BASE
	%heading	= H1, H2, H3, H4, H5, H6
	%fontstyle	= TT, I, B, U, S, STRIKE, BIG, SMALL
	%phrase		= EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR,
			  ACRONYM
	%special	= A, IMG, APPLET, OBJECT, FONT, BASEFONT, BR, SCRIPT,
			  MAP, Q, SUB, SUP, SPAN, BDO, IFRAME
	%formctrl	= INPUT, SELECT, TEXTAREA, LABEL, BUTTON
	%list		= UL, OL, DIR, MENU
	%head.misc	= SCRIPT, STYLE, META, LINK, OBJECT
	%pre.exclusion	= IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP,
			  FONT, BASEFONT
	%preformatted	= PRE
	%block		= P, DL, DIV, CENTER, NOSCRIPT, NOFRAMES,
			  BLOCKQUOTE, FORM, ISINDEX, HR, TABLE, FIELDSET,
			  ADDRESS, %heading, %list, %preformatted
	%inline		= #PCDATA, %fontstyle, %phrase, %special, %formctrl
	%flow		= %block, %inline
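
Since the classes refer to each other, it can help to expand them
into plain sets of element names. The sketch below does that for the
classes needed to resolve %flow, using the summary above as its data;
the dictionary layout and the function name are just one possible way
of writing it.

```python
# Only the classes needed to resolve %block, %inline and %flow.
CLASSES = {
    "%heading": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "%fontstyle": ["TT", "I", "B", "U", "S", "STRIKE", "BIG", "SMALL"],
    "%phrase": ["EM", "STRONG", "DFN", "CODE", "SAMP", "KBD", "VAR",
                "CITE", "ABBR", "ACRONYM"],
    "%special": ["A", "IMG", "APPLET", "OBJECT", "FONT", "BASEFONT", "BR",
                 "SCRIPT", "MAP", "Q", "SUB", "SUP", "SPAN", "BDO", "IFRAME"],
    "%formctrl": ["INPUT", "SELECT", "TEXTAREA", "LABEL", "BUTTON"],
    "%list": ["UL", "OL", "DIR", "MENU"],
    "%preformatted": ["PRE"],
    "%block": ["P", "DL", "DIV", "CENTER", "NOSCRIPT", "NOFRAMES",
               "BLOCKQUOTE", "FORM", "ISINDEX", "HR", "TABLE", "FIELDSET",
               "ADDRESS", "%heading", "%list", "%preformatted"],
    "%inline": ["#PCDATA", "%fontstyle", "%phrase", "%special", "%formctrl"],
    "%flow": ["%block", "%inline"],
}


def expand(name, classes=CLASSES):
    """Recursively replace %class references with their member elements."""
    members = set()
    for item in classes[name]:
        if item.startswith("%"):
            members |= expand(item, classes)
        else:
            members.add(item)
    return members


print(sorted(expand("%block")))
print("DIV" in expand("%inline"), "SPAN" in expand("%inline"))
```

With this, "is X a block or an inline element?" becomes a simple set
membership test.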

The immediate children permitted for each element are:
	
	A:		%inline
	ABBR:		%inline
	ACRONYM:	%inline
	ADDRESS:	%inline, P
	APPLET:		%flow, PARAM
	B:		%inline
	BDO:		%inline
	BIG:		%inline
	BLOCKQUOTE:	%flow
	BODY:		%flow, INS, DEL
	BUTTON:		%flow
	CAPTION:	%inline
	CENTER:		%flow
	CITE:		%inline
	CODE:		%inline
	COLGROUP:	COL
	DD:		%flow
	DEL:		%flow
	DFN:		%inline
	DIR:		LI
	DIV:		%flow
	DL:		DT, DD
	DT:		%inline
	EM:		%inline
	FIELDSET:	%flow, LEGEND
	FONT:		%inline
	FORM:		%flow
	FRAMESET:	FRAMESET, FRAME, NOFRAMES
	H1:		%inline
	H2:		%inline
	H3:		%inline
	H4:		%inline
	H5:		%inline
	H6:		%inline
	HEAD:		%head.content, %head.misc
	HTML:		%html.content
	I:		%inline
	IFRAME:		%flow
	INS:		%flow
	KBD:		%inline
	LABEL:		%inline
	LEGEND:		%inline
	LI:		%flow
	MAP:		%block, AREA
	MENU:		LI
	NOFRAMES:	%flow
	NOSCRIPT:	%flow
	OBJECT:		%flow, PARAM
	OL:		LI
	OPTGROUP:	OPTION
	OPTION:		#PCDATA
	P:		%inline
	PRE:		%inline
	Q:		%inline
	S:		%inline
	SAMP:		%inline
	SCRIPT:		%Script
	SELECT:		OPTGROUP, OPTION
	SMALL:		%inline
	SPAN:		%inline
	STRIKE:		%inline
	STRONG:		%inline
	STYLE:		%StyleSheet
	SUB:		%inline
	SUP:		%inline
	TABLE:		CAPTION, COL, COLGROUP, THEAD, TFOOT, TBODY
	TBODY:		TR
	TD:		%flow
	TEXTAREA:	#PCDATA
	TFOOT:		TR
	TH:		%flow
	THEAD:		TR
	TITLE:		#PCDATA
	TR:		TH, TD
	TT:		%inline
	U:		%inline
	UL:		LI
	VAR:		%inline

Some elements don't allow certain elements as descendants:

	A:		A
	BUTTON:		%formctrl, A, FORM, ISINDEX, FIELDSET, IFRAME
	DIR:		%block
	FORM:		FORM
	LABEL:		LABEL
	MENU:		%block
	PRE:		%pre.exclusion
	TITLE:		%head.misc
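
Note that these exclusions apply at any depth, not just to immediate
children. A minimal sketch of such a check is given below, assuming
you already track the list of currently open elements (outermost
first) while reading the file. Only a few rows of the table are
included, and the names are made up for illustration.

```python
# A subset of the exclusion table above, in lower case.
EXCLUSIONS = {
    "a": {"a"},
    "form": {"form"},
    "label": {"label"},
    "pre": {"img", "object", "applet", "big", "small", "sub", "sup",
            "font", "basefont"},
}


def excluding_ancestor(open_elements, tag):
    """Return the innermost open element that forbids `tag`, or None.

    `open_elements` lists the currently open elements, outermost first.
    """
    for ancestor in reversed(open_elements):
        if tag in EXCLUSIONS.get(ancestor, ()):
            return ancestor
    return None


# <img> is excluded inside <pre> even when nested inside a <b>.
print(excluding_ancestor(["body", "pre", "b"], "img"))
# An <a> directly inside a <p> is fine.
print(excluding_ancestor(["body", "p"], "a"))
```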

Notes:

1. The children of DIR/MENU are LI, which is a block element, but
those LI can't contain block elements. UL/OL don't have this
restriction.

2. DT cannot contain block elements, but DD can. This means that you
can't use <div class="code"><pre> in a DT; use <span class="code"><tt>
instead. DIV and PRE are block elements; SPAN and TT are inline.

3. TABLE cannot have TR as a child. But TBODY can have TR, and TBODY
allows both the start and end tags to be omitted, so
<table><tr>....</tr></table> is really just a shorthand for
<table><tbody><tr>....</tr></tbody></table>.

4. P cannot contain blocks. So <p>...<div> is actually shorthand for
<p>...</p><div>. But <p>...<div>...</div>...</p> is an error, as the
</p> doesn't match any open element (the <div> implicitly closed the
original <p>, and P doesn't allow the start tag to be omitted).

5. HTML, HEAD, BODY, and TBODY allow the start tag to be omitted. With
the exception of TBODY, this feature shouldn't be used (it's a
nuisance to implement if the number of valid child tags is large).
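
The behaviour described in note 4 can be imitated with a small stack
of open elements. The sketch below is a simplification built on
Python's html.parser: it only knows the one rule "a block element
closes an open P", ignores every other optional-tag rule in the DTD,
and uses made-up names. It reports end tags which don't match any
open element.

```python
from html.parser import HTMLParser

BLOCK = {
    "p", "dl", "div", "center", "noscript", "noframes", "blockquote",
    "form", "isindex", "hr", "table", "fieldset", "address",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "dir", "menu", "pre",
}
# Elements declared EMPTY never have content or an end tag.
EMPTY = {"br", "hr", "img", "input", "meta", "link", "base", "col",
         "area", "param", "basefont", "isindex", "frame"}


class ImpliedPCloser(HTMLParser):
    """Track open elements, closing an open P when a block element starts."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []  # list of (line number, unmatched end tag)

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK and "p" in self.stack:
            # P cannot contain blocks, so its omitted end tag is implied here.
            del self.stack[self.stack.index("p"):]
        if tag not in EMPTY:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close the innermost matching element and anything inside it.
            index = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[index:]
        else:
            self.errors.append((self.getpos()[0], tag))


def stray_end_tags(html):
    """Return (line, tag) for each end tag with no matching open element."""
    parser = ImpliedPCloser()
    parser.feed(html)
    return parser.errors
```

Feeding it <p>...<div>...</div>...</p> yields one stray </p>, while
<p>...<div>...</div> on its own yields nothing, which matches the
explanation in note 4.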

-- 
Glynn Clements <glynn at gclements.plus.com>

