Discussions for Future Directions for HTML

These "discussions," published mostly over the course of 1992, while the second version of HTML (and the first to be documented) was in common use, show the directions HTML would take before it finally matured into HTML 1 in mid-1993. Except for the "HTML (extractions)" file, which I include only as an introduction and table of contents, I again include for each file its raw text, a pointer to where it currently is on the web (in the W3C historical archives), and the file as it should have displayed. In the case of error files, I have attempted to guess at how the errors were meant to be handled if encountered. To keep this file compliant with HTML 4.01 Strict, I have not reproduced the actual errors here.


HTML (extractions)

The WWW system uses marked-up text to represent a hypertext document for transmision over the network. The hypertext mark-up language is an SGML format. WWW parsers should ignore tags which they do not understand, and ignore attributes which they do not understand of tags which they do understand.

The following does not form part of the specifciation.

Future directions
Changes suggested for HTML improvements
HTTP2 definition
May 92 -- provisional
Letter Re: Still no DTD, huh?
 
Letter Re: Re: status. Re: X11 BROWSER for WWW
 
DTD
The SGML document type definition for HTML.

See also

New spec
As edited by Dan Connolly, convex. Comments to www-talk@info.cern.ch please.

Last-Modified: Tue, 23 Nov 1999 10:13:15 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/MarkUp.html


www-talk from September to October 1991: Re: status. Re: X11 BROWSER for WWW

Date: Tue, 29 Oct 91 10:03:11 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9110290903.AA07413@ nxoc01.cern.ch >
To: connolly@pixel.convex.com, www-talk
Subject: Re: status. Re: X11 BROWSER for WWW 

Dan,

> I've made some tangible progress on the X11 browser, so I though 
> I'd let you know.
> ...
> This code is not in any shape to distribute, or even show anybody.
> But it works, and it's pretty speedy. That's enough to encourage me  
> to polish it off.

Sounds like great progress! The TCL sounds interesting -- where did  
you get it? 


> [If you wan't my stuff, you'll have to be C++ capable. I can't
> think in C any more. :-]

Don't worry - we can handle C++, although for the line mode browser  
we wanted portability into places where C++ could not reach. That's  
why the common code (in WWW/Implementation) is all in C. Believe me,  
after writing the NeXT browser in Objective-C it was a wrench to  
conclude that it would have to be deobjectified.

> If you could round up some info on exactly what I can expect to see  
> in an HTML file, and some idea of how you want it formatted [I have  
> the HTML doc and the LineMode browser, but if you've got time to
> give me a little more info...] I'll be ready to tackle that pretty  
> soon.

You ask for info on exactly what you can expect to find in an HTML  
file, but you've read the two HTML files about HTML.  What is missing  
from there?

Here is some discussion about the tags -- where it's not in  
http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html I have updated  
that document now.

Most of the tags are just style tags: this goes for the headings H1  
to H6, the lists UL and OL with list elements LI, the glossary DL  
with elements DT and DD.

<TITLE> ..<TITLE> is designed to be used for putting in the top  
banner of a window, or using as the window  name. It also is what you  
would use in a history list. It shouldn't be displayed in the text  
itself, as usually there is a <H1> heading atteh top of the text  
anyway. A difference is that thet title is designed to make sense out  
of context, whereas the heading is within context. For example,
a title might be "Formatting Characters for Printf -- C reference  
manual" whereas the heading may just be "Formatting characters".

The base address tag is not used, nor is highlighting HP1 etc.

Anchors are used!  The REL attribute is NOT used.

<ISINDEX> is sent by servers to indicate that they will accept a  
search given this document name plus keywords. It turns on a search  
panel when the document is the main window.  An even better  
implementation would have a keyword field at the bottom of the text  
window if the document is a searchable index.  That would make the  
document more self-contained as an item in the user's eyes, and  
reduce screen clutter.

<NEXTID> can be ignored by browsers, only needed for editors.

<XMP> and <LISTING> are used to indicate inserted literal text.
To make life easier for those writing documents (and because we don't  
have entities in the code yet) they are special in that EVERYTHING is  
litteral text until the closing tag - so one can use XMP for giving
examples of HTML for example.  (We really need an escaping method -  
the next parser will have simpl entities like "&lt." for "<".)
Within XMP or LISTING, newlines are significant (and mean "new  
line"!)

<PLAINTEXT> is used to indicate that the rest of the file is in fact
just ASCII. It turns off SGML parsing completely. It's a fudge for
the moment, until we have the document format negociation.
______________________________________

        Structure of documents:

In writing a new generic parser, I wondered whether your text object  
will store the nested structure of a document. At the moment, the  
document is a linear sequence of styles: you can't have lists within  
lists, etc. Ideally, it would be able to handle this - although its  
more difficult for a human writer to handle when formatting the  
document. I would in fact prefer, instead of <H1>, <H2> etc for  
headings [those come from the AAP DTD] to have a nestable  
<SECTION>..</SECTION> element, and a generic <H>..</H> which at any  
level within the sections would produce the required level of  
heading.

For a browser, it is quite satisfactory to flatten the structure back  
into a sequence of styles, but for an editor it isn't. Are you going  
to go for editing capability?

Tim

PS: Shall I put you on the www-talk list?

In this case, I have included only the raw text of the e-mail itself, omitting the mailing-list archive's HTML 4.01 threading material at the top and bottom. I have also corrected the link to point to the file actually intended, though that file has since been updated and does not show what one would have seen by following the link in October of 1991. This e-mail shows that a working draft describing HTML was already in progress at that point, one that eventually grew into the oldest description now available, dated November 13, 1992. Here one sees Mr. Berners-Lee already speculating about a <SECTION> tag and even a simple <H> tag meant to derive its level from its nesting within sections, instead of from a number as has always been used for HTML headings. More unusual is the mention of the <OL> tag, which had been eliminated the previous January and which he only allowed back in under pressure from Dan Connolly in 1993. Perhaps he was concerned with compatibility with those ancient files that had it, or else it was a friendly nod to Dan (to whom he was writing), who may have always resented its removal from HTML. More unexpected still, he mentions a REL attribute (presumably of <A>), if only to say it is not used, instead of the TYPE attribute discussed later, anticipating the fact that TYPE would one day be renamed REL. It is also interesting to note that <ISINDEX> was already a going concern at this point (though only as something inserted by a smart server), while <MENU> and <DIR> are not mentioned; apparently they had not yet been invented. See also what he thinks of <XMP>, <LISTING>, and <PLAINTEXT>, and his desire even then to "fix" them with "an escaping method." Finally, he mentions there being only "two HTML files about HTML," so at this point the larger suite of files dated 13 November 1992 was still in a much more preliminary form.
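Tim's remark that a browser can simply "flatten the structure back into a sequence of styles" can be made concrete. The following is a minimal Python sketch, not anything from the WWW code base, assuming a hypothetical markup in which a generic <H> inside n open <SECTION> elements stands for <Hn>, clamped to the range 1 through 6. The function name flatten_sections is likewise illustrative.

```python
import re

# Matches only the hypothetical generic tags <SECTION>, </SECTION>, <H>, </H>
# (case-insensitive); ordinary tags such as <H1> or <P> pass through untouched.
TOKEN = re.compile(r"<(/?)(SECTION|H)>", re.IGNORECASE)


def flatten_sections(text: str) -> str:
    """Rewrite nested <SECTION>/<H> markup as flat <H1>..<H6> headings.

    Hypothetical scheme: an <H> inside n open SECTION elements becomes <Hn>,
    clamped to 1..6. The SECTION tags themselves are dropped, which is the
    "sequence of styles" a browser needs; an editor would want to keep them.
    """
    depth = 0          # number of currently open SECTION elements
    open_levels = []   # heading level chosen for each open <H>
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])      # copy the text before the tag
        closing = m.group(1) == "/"
        name = m.group(2).upper()
        if name == "SECTION":
            depth += -1 if closing else 1
        elif not closing:
            level = min(max(depth, 1), 6)
            open_levels.append(level)
            out.append(f"<H{level}>")
        else:
            # Close with the same level the matching <H> was given
            # (assumes <H> and </H> are balanced).
            out.append(f"</H{open_levels.pop()}>")
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    doc = "<SECTION><H>Intro</H>text<SECTION><H>Detail</H></SECTION></SECTION>"
    print(flatten_sections(doc))  # <H1>Intro</H1>text<H2>Detail</H2>
```

Dropping the SECTION tags is exactly what makes this acceptable for a browser and unacceptable for an editor, which is the distinction Tim draws when he asks whether Dan intends to support editing.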

Last-Modified: Wed, 12 May 2004 00:03:58 GMT

http://lists.w3.org/Archives/Public/www-talk/1991SepOct/0003.html


Letter_1 -- /Architecture

<TITLE>Letter_1 -- /Architecture</TITLE>
<NEXTID 1>
<XMP>

Date: Thu, 4 Jun 92 00:59:21 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Sender: jfg@dxcern.cern.ch
To: barker@www1.cern.ch
Subject: forwarded message from Tim Berners-Lee

------- Start of forwarded message -------
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA27986; Wed, 3 Jun 92 16:56:29 +0200
Received: by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA08770; Wed, 3 Jun 92 16:55:12 MET DST
Message-Id: <9206031455.AA08770@ nxoc01.cern.ch >
From: timbl@nxoc01.cern.ch (Tim Berners-Lee)
To: connolly@pixel.convex.com
Cc: timbl@nxoc01.cern.ch, wei@xcf.berkeley.edu, www-bug@nxoc01.cern.ch
Subject: Re: still no DTD, huh?
Date: Wed, 3 Jun 92 16:55:12 MET DST

Dan, taking your points in order before they pop off the screen.
I agree, attribute values ought to be quoted unless they contain
only sgml-nice characters. The www browers accept quotes or non-quoted
values. It is a bug in the NeXT editor that it exploits this feature.
B
When we fix the NeXT editor then we will put the quotes in. All
other p browsers use the SGML.c parser in the W3 dist which accept
quotes.

Yes, NEXTID will have to go. NEXTID will be anattibute of the
documenmt. We proposed
sorry propose 3 dcotypes,  HTDOC, HTERR and HTFWD to be described in
the DTD. These will be such that any extra tags they define, and
structure, will be safeley ignored by old parsers.

3. Minimisation.  This is copied from the BOOKMAKER style stuff.
Basically, we use <P> as a paragraph separater rather than a
paragraph begin or end.  It can be regarded as a minimized
paragraph element though. Its just that we actually parse it
as an empty elemnt with no end tag. That's still valid SGML
and you could write it in the DTD that way.
<LI> always has an opener and never a closer. The same applies
to <DD> and <DT>.  Note that we have though made sure that the browser
will ignore closers to these, so we could edfine teh DTD with them in
and optional.

4. YEs, sections appeal to me too. Especially when making 
big HTML files out of lots of little ones. The effect of
<SECTION> .. </SECTION> would be to demote all headings
by one inside the section.  I would be inclined then to
have simpky a <HEADING> tag which would be equivalent to H0
and map onto H1 within a section, or Hn within n sections.
The SGML parser can't generate this stuff, but the editors could
derive it from the style information. We would have to introduce <SECTION>
early on to get a transistion period. Then in HTML3 we would declare
H2 etc obsolete.

Pei Wei is maybe working on a DTD too and Carl Barker at CERN
is defininbg new features of HTML needed by new features in
the protocol (things like <BODY NOTATION=postscript> and suchlike).
Some of htis is defined in a few "technical notes" linked to
a listof technical notes linked to the W3 project page, if you want to 
see and comment.

(Carl: you could take this message in text form and link it in too)

Tim
________ Dan's message:
>From connolly@pixel.convex.com Wed Jun  3 04:23:34 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by  nxoc01.cern.ch  (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
	id AA05562; Wed, 3 Jun 92 04:23:28 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
	id AA27281; Wed, 3 Jun 92 04:21:34 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
	id AA25114; Tue, 2 Jun 92 21:21:17 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
	id AA23193; Tue, 2 Jun 92 21:21:15 -0500
Message-Id: <9206030221.AA23193@pixel.convex.com>
To: timbl@nxoc01.cern.ch
Subject: still no DTD, huh?
Date: Tue, 02 Jun 92 21:21:14 CDT
From: Dan Connolly <connolly@pixel.convex.com>
Status: R


by the way... replying to an address you sent me
doesn't work...

- ------- Forwarded Message

   ----- Transcript of session follows -----
>>> RCPT To:<timbl@dxmint.cern.ch>
<<< 550 <timbl@dxmint.cern.ch>... Addressee unknown
550 timbl@dxmint.cern.ch... User unknown

   ----- Unsent message follows -----
Date: Tue, 26 May 92 17:06:43 +0200
From: connolly (Dan Connolly)
Message-Id: <9205261506.AA25934@connie.de.convex.com>
To: timbl@dxmint.cern.ch
Subject: still no DTD, huh?
Cc: connolly@convex.com

I just browsed the web, hoping to find a DTD for HTML.
No such luck.
One nifty part of the Chameleon project is an X windows
grammar editor for developing context free grammars.
It's a little clunky, but in addition to outputting
editable Chameleon grammar files, it can write
YACC specifications or !SGML DTD's! Finally! a simple
DTD editor!

Unfortunately, it doesn't support attributes, and
I don't think the DTD's it creates have minimization,
but it could certainly save a lot of time in
creating a DTD!

I'll see if I can prototype something when I get back.

More later.

Dan

- ------- End of Forwarded Message

Well, I've been attempting to prototype something with
Devegram, the Integrated Chameleon Architecture's (ICA's)
grammar editor.

I messed around a while and had it write out an SGML
DTD to play with. Unfortunately Devegram doesn't support
many features of an SGML DTD which would be most
convenient to describe HTML. So I've abandoned Devegram
in favor of a text editor. But it did help with
the initial prototype.

Now for the REAL problems: HTML in its present form
is very difficult to describe in SGML. I'm not experienced
enough to say for sure, but I think it's impossible.
The problems are mostly small and lexical in nature, but
I'd say it's VERY important to make these changes NOW in
order to be able to use SGML processing engines in WWW
clients in the future.

An SGML document consists of 3 parts: the declaration,
the prologue, and the instance. The declaration lays
the groundwork -- defines the encoding and interpretation
of the character set(s), sets processing limits and bounds,
and other lexical stuff. Applications generally use the
default SGML declaration given in the standard. Each
SGML parser has a declaration that declares its feature
list and limits. If HTML cannot be described with
the default SGML declaration, this will severely limit
the usable parsers. (one exception is the NAMELEN limit:
many parsers have a value higher than 8)

The prologue (sometimes called the DTD, though there may
be more than one DOCTYPE in the prologue)
gives the structure of the document -- the
basic grammar and entities and such. This varies from
one application to another, but generally one SGML
declaration and prologue is used throughout an application.
For example, CALS specifies an SGML declaration and some
DTD's. The AAP also has a DTD.

The third part is the document instance. This is the part
that varies from one document to another within an
application domain.


I'm trying to use the default SGML declaration and design
a DTD such that all HTML files are instances of that DTD.

- --- 1--- The first problem I've come accross is that HTML attribute
values are not quoted. That is:

<A NAME=2 HREF=http://crnvmc.cern.ch./WHO>

yields

sgmls: SGML error at ../../../WWW/WWW/LineMode/Defaults/default.html, line 8 at 
":":
       Incorrect character in markup; markup terminated

I don't know what the exact syntax of an SGML attribute is,
but it's not the same as HTML's "everything up to the
next space or >" syntax.


- --- 2 --- Next, all attributes have names. So I can't figure
out a way to parse
<NEXTID 10>
I could do
<NEXTID n=10>

- --- 3 --- The biggest problem is the somewhat random use of
minimization. I can't seem to make SGML sense of it.
More later. I don't have as much time as I thought to
explain this.

- --- 4 --- I'd also like to be able to add a little
more structure than just a "big list of tags and
text" to the documents like this:

<HTML>
<TITLE>foo</TITLE>
   <SECTION>
	<H1> header </H1>
	paragraph associated with above header
	<SUBSECTION>
	<H2> header </H2>
	stuff under H2
	</SUBSECTION>
  </SECTION>
</HTML>

I can _almost_ get the SGML parser to infer the <SECTION>
and </SECTION> tags, but not quite.

More later.

Dan


------- End of forwarded message -------

</XMP>

In this case, since there is nothing but <TITLE>, <NEXTID>, and <XMP> tags, and the rest is all raw text, I have simply shown the raw text. One sees here some speculation about introducing <SECTION> and <SUBSECTION> tags, and mention of adding an N attribute to <NEXTID>. It is reassuring to see that they had the same problems writing an SGML definition as I have had with the first two versions of HTML: namely, unquoted attribute values containing characters that SGML does not allow there (absolute URLs, for example), the use of a bare number as the parameter of <NEXTID>, and the question of whether certain tags (<P> and <LI>) should be separator tags or container tags, which took some doing to settle. At the time, since closing tags had not been used for them, making them container tags did not seem workable, so the challenge was to define them as separator tags; that problem was eventually solved, only for them to become container tags by HTML 2. Nevertheless, the ability to define them as separator tags did serve as a precedent for such unary tags as <BR> and <HR>. The idea of major document subdivisions, illustrated here with the <SECTION> and <SUBSECTION> tags, eventually resurfaces with the introduction of <DIV> in HTML 3 and 3.2, albeit without the peculiar effects on the <Hn> header tags proposed here. From this it is quite clear that attempts to define HTML in terms of SGML constructs and lexical syntax were under way at least as early as May of 1992, along with a clear intention that SGML might form the academic foundation for HTML.
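The attribute problem Dan ran into can be illustrated with a small Python sketch. It assumes the reference concrete syntax of the default SGML declaration, under which an unquoted attribute value must be a name token made only of letters, digits, "." and "-". Anything else, such as the ":" and "/" of an absolute URL, forces the value into quotes. The helpers attr_spec and fix_nextid are hypothetical, and the N name follows the eventual spelling of the <NEXTID> attribute mentioned above.

```python
import re

# Reference concrete syntax (assumed): an unquoted value must be a name token,
# i.e. only letters, digits, "." and "-".
NAME_TOKEN = re.compile(r"[A-Za-z0-9.\-]+")

# The old bare-number form, e.g. <NEXTID 10>.
BARE_NEXTID = re.compile(r"<NEXTID\s+(\d+)\s*>", re.IGNORECASE)


def attr_spec(name: str, value: str) -> str:
    """Return NAME=value, quoting the value when it is not a name token."""
    if NAME_TOKEN.fullmatch(value):
        return f"{name}={value}"
    # Quoted values are replaceable character data, so escape '&' and '"'.
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


def fix_nextid(text: str) -> str:
    """Give the bare <NEXTID n> number an attribute name: <NEXTID N=n>."""
    return BARE_NEXTID.sub(r"<NEXTID N=\1>", text)


if __name__ == "__main__":
    href = attr_spec("HREF", "http://crnvmc.cern.ch./WHO")
    print(f"<A {attr_spec('NAME', '2')} {href}>")
    print(fix_nextid("<NEXTID 10>"))
```

Run as a script, this prints `<A NAME=2 HREF="http://crnvmc.cern.ch./WHO">` and `<NEXTID N=10>`. The first line is the anchor from Dan's sgmls error report, now with the URL quoted so that the ":" no longer terminates the markup.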

Last-Modified: Thu, 04 Jun 1992 07:10:03 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/Architecture/Letter_1.html


HTML2 -- /MarkUp

Last Modified 10/6/92 by CTB

Updates To HTML

In order to improve the functionality of the World Wide Web, the HyperText Markup Language must be tidied up, to allow it to be processed by generic SGML engines, and not just the WWW one.

The updated HTML will have a greater structure than the original version, including a header section, separate from the body.

This header section will allow the following tags:

<KEYWORDS>...</KEYWORDS>
<TITLE>...</TITLE>
<NEXTID ID="NNN">
<ISINDEX>

In the body section, the following tags will be recognised:

<A NAME="XXX" HREF="XXX" TYPE="XXX">...</A>
<PLAINTEXT>
<LISTING>...</LISTING>
<P>
<H1>...</H1>, <H2>...</H2>, <H3>...</H3>, <H4>...</H4>, <H5>...<H5>, <H6>...<H6>
<ADDRESS>...</ADDRESS>
<DL><DT>...<DD>...</DL>
<UL><LI>...</UL>



______________________________________________________ CTB

<TITLE>HTML2 -- /MarkUp</TITLE>
<NEXTID 1>
<ADDRESS>Last Modified 10/6/92 by CTB
</ADDRESS>
<H1>Updates To HTML</H1>In order to improve the functionality of the World Wide Web, the HyperText
Markup Language must be tidied up, to allow it to be processed by
generic SGML engines, and not just the WWW one.  <P>
The updated HTML will have a greater structure than the original version,
including a header section, separate from the body.<P>
This header section will allow the following tags:
<XMP><KEYWORDS>...</KEYWORDS>
<TITLE>...</TITLE>
<NEXTID ID="NNN">
<ISINDEX>

</XMP>In the body section, the following tags will be recognised:
<XMP><A NAME="XXX" HREF="XXX" TYPE="XXX">...</A>
<PLAINTEXT>
<LISTING>...</LISTING>
<P>
<H1>...</H1>, <H2>...</H2>, <H3>...</H3>, <H4>...</H4>, <H5>...<H5>, <H6>...<H6>
<ADDRESS>...</ADDRESS>
<DL><DT>...<DD>...</DL>
<UL><LI>...</UL>



</XMP>______________________________________________________<A NAME=0 HREF=http://info.cern.ch/hypertext/WWW/People.html#11> CTB</A></A>

In the proposal suggested here, the dependence upon SGML is even more explicit, and the HTML tags listed pretty much comprise what was available at that time, plus a few ideas not yet realized. Here one finds the first and only mention of a proposed <KEYWORDS> tag for the header section. This function would eventually resurface as content of the <META> tag introduced in HTML 2. The proposal also illustrates two attributes that did not yet exist: TYPE for <A> and ID for <NEXTID>. Even months later, in November, the TYPE attribute would still be under discussion as to what sorts of values it might take, so it was evidently still only a vague proposal at this point. The ID attribute would be used in some of Dan Connolly's early drafts of a DTD, but N would be its name when it was finally introduced. <XMP> is not listed, since it was used to display the tags and its own closing tag would have ended the example prematurely, but plainly it too was intended. Also missing are <MENU>, <DIR>, and the <HPn> tags. Apparently these only emerged as this phase of HTML was drawing to a close.

Last-Modified: Wed, 10 Jun 1992 12:39:26 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/HTML2.html


The following is the oldest surviving DTD, prepared by Dan Connolly, and it shows some of the directions being taken. This file was last modified in August; by November a later version of the DTD presumably incorporated the <TYPEWRITER> tag, about which he wrote at length (and which he even illustrated with a few working examples) at that time.


<!-- html.dtd - document type declaration subset for
                HyperText Markup Language as defined
		by the World Wide Web project.

 $Id: html.dtd,v 1.1 92/08/19 18:37:58 connolly Exp $

	15 Jul 92 by connolly@convex.com
	 6 Aug 92 revision: match HTML.c better
	18 Aug 92 revision: FrameMaker integration

	See also: http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
	          http://info.cern.ch/hypertext/WWW/MarkUp/HTML2.html
  -->

<!--      Character entities       -->
<!-- I wonder if we could just use numeric character entities, as
     long as we're just referencing ASCII characters.
     That is, write &#68; in stead of &lt; -->

<!ENTITY lt "<">
<!ENTITY gt ">">
<!ENTITY amp "&">
<!ENTITY bullet "&#183" -- @@@ NeXT only -->

<!-- parameter entities (DTD macros) -->

<!ENTITY % a.a "NAME CDATA #IMPLIED
	 TYPE CDATA #IMPLIED
	 HREF CDATA #IMPLIED">
<!ENTITY % a.list "COMPACT CDATA #IMPLIED">

<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >
<!ENTITY % list "UL|OL|DIR|MENU">
<!ENTITY % pass "P|#PCDATA" -- aka pass_character -->
<!ENTITY % raw "XMP|LISTING">
<!-- PlainText is more than 8 characters, and changing the
     NAMELEN capacity involves using an SGML declaration different
     from the default, which is a hassle.

     Besides: the semantics of PlainText can't
     be captured by real SGML anyway.

     If we were willing to muck with NAMELEN, we could use
     the <PLAINTEXT> tag to mark the _end_ of the SGML document,
     and treat the rest of the data in the stream using normal
     plain text conventions.
 -->

<!--     Document structure       -->

<!ELEMENT HTML	O O  ((TITLE? & NEXTID? & ISINDEX?), DOCUMENT)>

<!ENTITY % body "%heading|%list|DL|%pass|%raw|address">
<!ELEMENT DOCUMENT O O ((%heading), (%body)+) +(A)>
<!-- The DOCUMENT element is necessary to avoid mixed content
     in the HTML element. Mixed content and optional elements
     don't mix very well.

     BUT it introduces minimization into the HTML format. Hmm...
 -->


<!ELEMENT TITLE	- -  (#PCDATA)>

<!ELEMENT ISINDEX - O EMPTY >
<!ELEMENT NEXTID - O EMPTY >
<!ATTLIST NEXTID ID NUMBER #REQUIRED>
<!-- as noted in Tags.html, the conventional <NEXTID 10> is
     illegal. Use <NEXTID ID=10> to comply with this DTD. -->

<!ELEMENT ADDRESS - O (%pass)+>

<!ELEMENT (%heading)	- -  (%pass)+
	--Tags.html says titles should fit on one line, but
	the browser handles paragraph breaks inside headings
	gracefully. -->

<!ELEMENT (%list) - -  ((LI|%pass)+)>
<!ATTLIST (%list) %a.list>

<!ELEMENT DL	- -  (DT|DD|%pass)+>
<!ATTLIST DL %a.list>

<!ELEMENT (LI|DT|DD)	- O  EMPTY>

<!ELEMENT A	- -  (%body)+>
<!ATTLIST A %a.a; >

<!ELEMENT P	- O  EMPTY>

<!ELEMENT (%raw) - -  CDATA>
<!-- BUG:
tags.html says that you can put anything but </XMP> in the
text of an XMP element. SGML says that ETAGO, "</" ends a CDATA
section.
-->


This is the very oldest surviving HTML DTD. As its comments reveal, he had produced earlier versions of the DTD as early as mid-July. In this DTD one sees several of the planned improvements to HTML, such as the TYPE attribute of <A>, the ID attribute of <NEXTID>, the character entities for "<", ">", and "&", and even one (never to be seen again) for a list bullet, "&#183;" (shows as ·), a reference to the ISO-8859-1 code point, though the DTD's own comment marks it "NeXT only." Even this early he is pushing for the readmission of <OL>, nearly a year before its eventual return. Elements for <MENU>, <DIR>, and <ISINDEX> are also present, as if they had always been there. Note also the comment about a bug: under strict SGML an <XMP> section would end at the first "</" (that is, at any end tag), whereas HTML user agents do (and should) end it only at its own closing tag. Notice also that instead of a <BODY> element for the body, it introduces a different element named <DOCUMENT>.
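The bug noted in the DTD's closing comment comes down to two different rules for where literal content stops. Here is a minimal Python sketch contrasting them. It assumes, as a simplification of the SGML rule, that CDATA content ends at the first "</" followed by a letter, while the HTML convention ends <XMP> only at "</XMP>". Both function names are illustrative.

```python
import re


def sgml_cdata_end(text: str) -> int:
    """Where CDATA content stops under the strict SGML reading: at the first
    "</" followed by a letter, i.e. at the start of ANY end tag."""
    m = re.search(r"</[A-Za-z]", text)
    return m.start() if m else len(text)


def xmp_end(text: str) -> int:
    """Where XMP content stops under the HTML convention: only at its own
    closing tag </XMP> (case-insensitive)."""
    m = re.search(r"</XMP\s*>", text, re.IGNORECASE)
    return m.start() if m else len(text)


if __name__ == "__main__":
    content = "Use <B>bold</B> here</XMP>rest"
    print(repr(content[:sgml_cdata_end(content)]))  # 'Use <B>bold'
    print(repr(content[:xmp_end(content)]))         # 'Use <B>bold</B> here'
```

Under the SGML rule the example is cut off at "</B>", which is why quoting HTML examples inside <XMP> could not survive a conforming SGML parser without the "&lt;/" escape.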

Last-Modified: Thu, 20 Aug 1992 00:28:16 GMT

http://www.w3.org/History/1992/WWW/Frame/fminit2.0/html.dtd


HyperText Markup: Recommended Usage

Recommended HTML Usage

These constructs should work even on pretty broken implementations.

Text Elements

Most text elements consist of a start tag, some content, and an end tag. A start tag is an identifier surrouded by angle brackets. An end tag is an open angle bracket, a slash, an identifier, and a close bracket.

An identifier should be a letter followed by up to 7 letters or numbers.

No spaces are allowed between the tag open bracket and the identifier. Space is allowed between the identifier and the close bracket.

Some elements are "empty" and consist of only a start tag.

Paragraphs are separated by the "P" element.

Six levels of headings are supported:

Level three heading

Level four heading

five
six

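Unordered lists:

  * This is the first item of an unordered list.
  * This is the second item. It's kinda long, and should wrap around on most screens.

  * This is the third item. It's only one paragraph, but it's got a paragraph tag at the end.

  * This is the fourth and final item.

Ordered lists: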

  1. This is the first item of an unordered list.
  2. This is the second item. It's kinda long, and should wrap around on most screens.
  3. This is the third item -- you know, the one with the P element.

  4. This is the fourth and final item.
term
definition
another term
and its definition

The address element indicates the author or source of the document.

DWC

connolly@convex.com

Normal Text: PCDATA

Normal text is represented in HTML as parsed character data, #PCDATA. The characters '<', '>', and '&' should be represented as "&lt;", "&gt;", and "&amp;" respectively, lest they be interpreted as markup. Lines should not exceed 72 characters. Line breaks have no significance except to separate words.

Literal Text: RCDATA

Sections of literal text are represented in HTML as replaceable character data. Line breaks are significant, and characters are rendered in a fixed-width font to preserve horizontal formatting.

This is literal text. THIS word
should line up under  THIS word.

There should be exactly three blank lines between here



and here.

The '&' character should be represented as "&amp;". The character sequence "</" must be represented as "&lt;/". The character sequence "]]>" must represented as "]]&gt;".

SGML tags look like <start> and &lt;/end>.
The marked section close delimiter looks like ]]&gt;.
But ]] is just two close square brackets, and
> is just a greater-than sign.

Document Description Elements

The TITLE element names the document. The content of the TITLE element is just character data, CDATA. It should be less than 72 characters, and it should contain no linebreaks, '<', '>', or '&' characters.

The ISINDEX tag appears at most one time, and it precedes all tags but TITLE and NEXTID.

Elements with Attributes

Some elements have associated named attributes. The values of the attributes of an element are specified in its start tag.

Attribute values are represented as RCDATA surrounded by double quotes. The character '"' must be represented as "&quot;" in an attribute value literal. The NEXTID tag appears at most one time, after the title and before the text elements.

<!-- test.html $Id$ -->
<TITLE>HyperText Markup: Recommended Usage</TITLE>

<H1>Recommended HTML Usage</H1>

These constructs should work even on pretty broken implementations.

<H2>Text Elements</H2>

Most text elements consist of a start tag, some content,
<!-- comment foo -->
and an end tag. A start tag is an identifier surrouded by angle
<? processing instruction >
brackets. An end tag is an open angle bracket, a slash, an
identifier, and a close bracket.  <P>

An identifier should be a letter followed by up to 7 letters
or numbers.  <P>

No spaces are allowed between the tag open bracket
and the identifier. Space is allowed between the identifier
and the close bracket.  <P>

Some elements are "empty" and consist of only a start tag.  <P>

Paragraphs are separated by the "P" element.  <P>

Six levels of headings are supported:  <P>

<H3>Level three heading</H3>
<H4>Level four heading</H4>
<H5>five</H5>
<H6>six</H6>

Unordered lists:  <P>

<UL>
<LI> This is the first item of an unordered list.
<LI> This is the second item. It's kinda long, and should wrap around
on most screens.  <P>
<LI> This is the third item. It's only one paragraph, but it's got
a paragraph tag at the end.<P>
<LI> This is the fourth and final item.
</UL>

Ordered lists:  <P>

<oL>
<LI> This is the first item of an unordered list.
<LI> This is the second item. It's kinda long, and should wrap around
on most screens.
<LI> This is the third item -- you know, the one with the P element.  <P>
<LI> This is the fourth and final item.
</oL>

<DL>
<DT> term
<DD> definition
<DT> another term
<dd> and its definition
</DL>

The address element indicates the author or source of the document.
<ADDRESS> DWC   <P>
connolly@convex.com
</ADDRESS>

<H2>Normal Text: PCDATA</H2>

Normal text is represented in HTML as parsed character data, #PCDATA.
The characters '&lt;', '&gt;', and '&amp;' should be represented as
"&amp;lt;", "&amp;gt;", and "&amp;amp;" respectively, lest they
be interpreted as markup. Lines should not
exceed 72 characters. Line breaks have no significance except to
separate words.  <P>

<H2>Literal Text: RCDATA</H2>

Sections of literal text are represented in HTML as replaceable
character data. Line breaks are significant, and characters are
rendered in a fixed-width font to preserve horizontal formatting.  <P>

<XMP>
This is literal text. THIS word
should line up under  THIS word.

There should be exactly three blank lines between here



and here.

</XMP>

The '&amp;' character should be represented as "&amp;amp;".
The character sequence "&lt;/" must be represented as "&amp;lt;/".
The character sequence "]]&gt;" must represented as "]]&amp;gt;".

<XMP>
SGML tags look like <start> and &lt;/end>.
The marked section close delimiter looks like ]]&gt;.
But ]] is just two close square brackets, and
> is just a greater-than sign.
</XMP>

<H2>Document Description Elements</H2>

The TITLE element names the document.
The content of the TITLE element is just character data, CDATA.
It should be less than 72 characters, and it should contain
no linebreaks, '&lt;', '&gt;', or '&amp;' characters.  <P>

The ISINDEX tag appears at most one time, and it precedes all tags
but TITLE and NEXTID.<P>

<H2>Elements with Attributes</H2>

Some elements have associated named attributes. The values of
the attributes of an element are specified in its start tag.<P>


Attribute values are represented as RCDATA surrounded by double
quotes.  The character '"' must be represented as "&amp;quot;" in an
attribute value literal.

The NEXTID tag appears at most one time, after the title and before
the text elements.<P>

Dan Connolly hand-wrote this file while Tim Berners-Lee was updating his NeXT HTML Editor. Notice it does not use his <TYPEWRITER> tag but does exercise the character entities. It also appears to be the first use of <OL> since the tag was done away with in very early 1991.
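The escaping rules set out in the test file can be summarized in a short Python sketch. The function names are hypothetical. The rules are the ones the file states: in normal text (#PCDATA), '<', '>', and '&' become entity references, while in literal text (RCDATA) only '&', the sequence "</", and the sequence "]]>" need escaping.

```python
def escape_pcdata(text: str) -> str:
    """Escape normal text: '<', '>' and '&' become entity references."""
    # Replace '&' first so the '&' of the new references is not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_rcdata(text: str) -> str:
    """Escape literal text: only '&', "</" and "]]>" need attention."""
    return (
        text.replace("&", "&amp;")
        .replace("</", "&lt;/")
        .replace("]]>", "]]&gt;")
    )


if __name__ == "__main__":
    print(escape_pcdata("if a < b && b > c"))
    print(escape_rcdata("SGML tags look like <start> and </end>."))
```

The second call prints `SGML tags look like <start> and &lt;/end>.`, the same line the test file shows inside its <XMP> example, with the start tag left alone because only "</" is significant in literal text.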

Last-Modified: Tue, 24 Nov 1992 14:48:03 GMT

http://www.w3.org/History/1992/WWW/SGMLStream/src/test.html


The following group of files was prepared by Dan Connolly on November 30, 1992. Not only do they capture the transition HTML was undergoing at that point, they also show clear evidence of having been generated by some HTML editor other than the NeXT one. By this point Tim Berners-Lee was already using the next version of the NeXT editor, yet these files still contain much in the way of SGML-friendly enhancements.


Hypertext Markup Language

HyperText Markup Language

A Language for Transmission of Global Hyperdocuments.

Abstract

The World Wide Web project involves the processing of structured hypertext documents by diverse systems around the globe. The hypertext documents are represented as marked up text.

Specification

The HyperText Markup Language is defined in terms of the ISO 8879:1986, Standard Generalized Markup Language (SGML). The SGML declaration and document type definition specify the syntax and structure of HTML.

Implementors' Guide

This is intended as an introduction to the language and a guide to implementors. It does not comprise an integral part of the HTML specification.

Introduction

Text and Markup is an introduction to SGML text and markup as it applies to HTML. It should prepare you to read the DTD.

HTML by Example

The following sections describe the HyperText Markup language by example. They are organized in order of complexity, both for the human reader and the SGML processing application.

Recommended
Examples of how to write HTML that won't stress the processing software. Some things can't be done this way.
Complete
Examples of all the constructs necessary to produce HTML documents.
Tolerated
Examples of illegal constructs that are supported for historical reasons.
Deprecated
Some quirks; these are legal SGML, but they are likely to break existing implementations (including the sample).
Errors
These are just plain broken. Implementors should use these to bullet-proof their code.

A Partial Implementation

The libHTML software distribution provides the primitive SGML reading functions that you can use to build a conforming implementation.

This software is written in ANSI C (with some accomodataions for K&R compilers). It supports the lexical constructs demonstrated in HTML Extremes.

<TITLE>Hypertext Markup Language</TITLE>

<H1>HyperText Markup Language</H1>

<H2>A Language for Transmission of Global Hyperdocuments.</H2>

<H3>Abstract</H3>

The World Wide Web project involves the processing of structured
hypertext documents by diverse systems around the globe. The hypertext
documents are represented as marked up text.<P>

<H2>Specification</H2>

The HyperText Markup Language is defined in terms of the ISO
8879:1986, Standard Generalized Markup Language (SGML). The <A NAME=id8
HREF="html.dtd">SGML declaration and document type definition </A>
specify the syntax and structure of HTML.
<P>

<H2>Implementors' Guide</H2>

This is intended as an introduction to the language and a guide to
implementors. It does not comprise an integral part of the HTML
specification.
<P>

<H3>Introduction</H3>

<A HREF="Text.html">Text and Markup</A> is an introduction to SGML
text and markup as it applies to HTML. It should prepare you to read
<A NAME=id10 HREF="html.dtd" content-type="text/plain">the DTD</A>.
<P>

<H3>HTML by Example</H3>

The following sections describe the HyperText Markup language by
example. They are organized in order of complexity, both for the human
reader and the SGML processing application.
<P>

<DL>
<DT><A NAME=id2 HREF="recommended.html">Recommended</A>
<DD>Examples of how to write HTML that won't stress
the processing software. Some things can't be done
this way.

<DT><A NAME=id3 HREF="complete.html">Complete</A>
<DD>Examples of all the constructs necessary to
produce HTML documents.

<DT><A NAME=id4 HREF="tolerated.html">Tolerated</A>
<DD>Examples of illegal constructs that are supported
for historical reasons.

<DT><A NAME=id6 HREF="deprecated.html">Deprecated</A>
<DD>Some quirks; these are legal SGML,
but they are likely to break existing implementations (including
the sample).

<DT><A NAME=id7 HREF="errors.html">Errors</A>

<DD>These are just plain broken. Implementors should use
these to bullet-proof their code.

</DL>

<H2>A Partial Implementation</H2>

The <A NAME=id11 HREF="libHTML.tar.Z"
content-type="application/octet-stream">libHTML software
distribution</A> provides the primitive SGML reading functions that
you can use to build a conforming implementation.<P>

This software is written in ANSI C (with some accomodataions for
K&amp;R compilers). It supports the lexical constructs demonstrated in
<A NAME=id12 HREF="supported.html">HTML Extremes</A>.

This file has no <NEXTID>, but it does use an attribute that is not described anywhere: content-type. When the TYPE attribute resurfaces in HTML 4 and 4.01, it is used in exactly the same manner as content-type is used here. Notice also that every <A> NAME attribute here begins with the letters "id", so the values are no longer plain numbers. In SGML the NAME attribute of A is an ID, and an ID must be a true name beginning with a letter. A bare number, such as the NeXT editor had been generating during the previous phase of HTML, is therefore not acceptable. Even so, the next version of the NeXT editor would continue to generate NAME values that were merely numbers, as it had before.

Last-Modified: Mon, 30 Nov 1992 11:47:32 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/MarkUp.html


HTML Guide: Text and Markup

Text and Markup

This part of the HTML reference is an explanation of SGML syntax as it applies to HTML. For lexical issues, the purpose is to take the standard and reduce it from the abstract system that is SGML to a concrete language, HTML. For structural issues, the purpose is to give you enough background to read the DTD.

Structured Text

An HTML document is a hierarchy of elements. Each element has a name, some attributes, and some content. Most elements are represented in the document as a start tag, which gives the name and attributes, followed by the content, followed by the end tag. For example:

<HTML>
 <TITLE>
  A sample HTML document
 </TITLE>
 <H1>
  An Example of Structure
 </H1>
 Here's a typical paragraph.
 <P>
 <UL>
  <LI>
  Item one has an
  <A NAME=anchor>
   anchor
  </A>
  <LI>
  Here's item two.
 </UL>
</HTML>

Some elements (e.g. P, LI) are "empty." They have no content. They show up as just a start tag.

For the rest of the elements, the content is a sequence of data characters and nested elements. The content must match the element's model group from its declaration in the DTD.

Using the example from above, the content of the UL element is the sequence "LI, #PCDATA, A, LI, #PCDATA". This matches the model group from the UL element declaration: "(#PCDATA|LI|A)+".

Parsing Content Into Data and Markup

An HTML document is like a text file, except that some of the characters are interpreted as markup, rather than document content. The following table lists the special character sequences that separate data from markup in an HTML document.

SGML delimiters

CRO
Character Reference Open: "&#", when followed by a letter or a digit, signals a character reference. SGML idioms include things like "&#168;" and "&#SPACE;". It is not used in HTML.
ERO
Entity Reference Open: "&", when followed by a letter, signals an entity reference.
ETAGO
End Tag Open: "</", when followed by a letter, signals an end tag.
MDO
Markup Declaration Open: "<!", when followed by a letter or "--" or "[", signals one of several SGML markup declarations. The only purpose it serves in HTML is to introduce comments.
MSC
Marked Section Close: "]]", when followed by ">" signals the end of a marked section. While marked sections are not used by HTML, this sequence of characters is recognized and reported as an error by conforming SGML parsers.
PIO
Processing Instruction Open: "<?" signals a processing instruction. It is not used in HTML.
STAGO
Start Tag Open: "<", when followed by a letter, signals a start tag.
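The "when followed by" conditions in the table above amount to a one- or two-character lookahead. The following Python sketch reports which delimiter, if any, is recognized at a given position. The function name delimiter_at is hypothetical and is not part of libHTML. The sketch assumes ASCII letters and digits and performs no further parsing.

```python
import string

LETTERS = set(string.ascii_letters)
LETTERS_AND_DIGITS = LETTERS | set(string.digits)


def delimiter_at(text: str, i: int):
    """Name the SGML delimiter recognized at text[i], or return None.

    Each test mirrors a "when followed by" condition from the table;
    no other parsing is attempted.
    """
    rest = text[i:]

    def followed_by(offset, allowed):
        # True if the character at rest[offset] exists and is in `allowed`.
        return len(rest) > offset and rest[offset] in allowed

    if rest.startswith("&#") and followed_by(2, LETTERS_AND_DIGITS):
        return "CRO"
    if rest.startswith("&") and followed_by(1, LETTERS):
        return "ERO"
    if rest.startswith("</") and followed_by(2, LETTERS):
        return "ETAGO"
    if rest.startswith("<!") and (followed_by(2, LETTERS)
                                  or rest[2:4] == "--"
                                  or rest[2:3] == "["):
        return "MDO"
    if rest.startswith("]]>"):
        return "MSC"
    if rest.startswith("<?"):
        return "PIO"
    if rest.startswith("<") and followed_by(1, LETTERS):
        return "STAGO"
    return None
```

For example, delimiter_at("a < b", 2) returns None. A '<' followed by a space is ordinary data, not the start of a tag.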

Normal Text: Parsed Character Data

In the DTD, the symbol PCDATA stands for parsed character data, the normal text characters in an HTML document.

The text consists of a stream of lines. The division into lines has no significance apart from indicating a word end.

All of the SGML delimiters listed in the table of delimitersare recognized in PCDATA.

Raw Text: Character Data

In the DTD, the symbol CDATA stands for character data, the text without markup in an SGML document. Only the end tag open delimiters is recognized in CDATA.
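Only ETAGO is recognized in CDATA, so the content of a CDATA element runs up to the first "</" that is followed by a letter. A minimal Python sketch of that rule follows. The name cdata_end is hypothetical, and the sketch assumes ASCII letters.

```python
import string

LETTERS = set(string.ascii_letters)


def cdata_end(text: str) -> int:
    """Return the index of the first ETAGO ("</" + letter), or len(text).

    Nothing else is markup in CDATA, so this is where the content stops.
    """
    start = 0
    while True:
        i = text.find("</", start)
        if i == -1 or i + 2 >= len(text):
            return len(text)      # no ETAGO: everything is data
        if text[i + 2] in LETTERS:
            return i              # "</" plus a letter ends the content
        start = i + 1             # "</" not followed by a letter is data
```

The same rule explains why a TITLE may contain '<' so long as it is not followed by '/' and a letter.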

Tags

The characters in an SGML document are organized into a heirarchy of elements by the use of tags. Tags are set off from the data characters by angle brackets: '<' and '>'.

Names

The element name immediately follows "<". Names consist of a letter followed by up to 33 letters, digits, periods, or hyphens. Names are not case sensitive.
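The naming rule above can be expressed as a regular expression. In this Python sketch the helper names are hypothetical, and the 34-character limit is taken from the paragraph above.

```python
import re

# A letter followed by up to 33 letters, digits, periods, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,33}")


def is_sgml_name(candidate: str) -> bool:
    """True if the whole string has the shape of a name."""
    return NAME_PATTERN.fullmatch(candidate) is not None


def same_name(a: str, b: str) -> bool:
    """Names are not case sensitive, so compare them case-insensitively."""
    return a.upper() == b.upper()
```

Here is_sgml_name("id8") is true and is_sgml_name("8") is false. This is why a bare number is not a valid value for an attribute, such as NAME, that must be a name.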

Attributes

Following the element name, whitespace and attributes are allowed. An attribute consists of a name, an equal sign, and a value. Spaces are allowed around the equal sign.

The value is either a token or a literal. A token is up to 34 letters, digits, periods, or dashes. Tokens are case sensitive.

A literal is a string surrounded by single quotes or a string surrounded by double quotes. Entity references are processed inside attribute values as inside PCDATA. The length of an attribute value (after entity processing) is limited to 1024 characters.
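Because entity references are processed inside literals, a program that writes attribute values must escape '&'. It must also escape the quote character that delimits the literal. This Python sketch always uses double quotes and writes '"' as "&quot;". The function name attribute_literal is hypothetical.

```python
def attribute_literal(value: str) -> str:
    """Write `value` as a double-quoted attribute value literal."""
    # The 1024-character limit applies after entity processing,
    # i.e. to the value itself rather than to its escaped form.
    if len(value) > 1024:
        raise ValueError("attribute values are limited to 1024 characters")
    # Escape '&' first so the '&' in "&quot;" is not escaped again.
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return '"' + escaped + '"'
```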

Each attribute has a type, which puts constraints on the values it can have. For example, the NAME attribute of the A element is an ID. An ID is a name that must be unique among all IDs in the document.

Entities

In order to include characters that would otherwise be parsed as markup, you can use entity references refer to some of characters.

An entity reference is an ampersand, followed by a name, followed by a semicolon. No spaces are allowed within an entity reference. For example:

This is how you include a &lt;tag&gt; as data.
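Producing these entity references amounts to a simple string replacement. In the following Python sketch, escape_pcdata is a hypothetical name. Strictly speaking, '>' only needs escaping in special cases, because a tag is opened only by '<' followed by a letter. Escaping it anyway is harmless.

```python
def escape_pcdata(text: str) -> str:
    """Escape text so it is read as data rather than markup."""
    # Replace '&' first so the ampersands introduced below are not escaped twice.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```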

Comments

Comment declarations can be used include information aimed at persons and tools that read the document in source form. This information will be ignored when the document is processed by an SGML parser.

Comments begin with the character sequence "<!--" and end with "--", which must be followed by '>'. (Technically, whitespace is allowed between the closing "--" and '>'.) They are only allowed in PCDATA.
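The comment rule above maps onto a non-greedy regular expression. This Python sketch follows only the simplified rule given here. In full SGML, a comment declaration may hold several "--...--" comments, which the sketch does not handle. It also does not check that the comment appears in PCDATA. The name strip_comments is hypothetical.

```python
import re

# "<!--", the shortest run of text up to "--", optional whitespace, then ">".
COMMENT_PATTERN = re.compile(r"<!--.*?--\s*>", re.DOTALL)


def strip_comments(pcdata: str) -> str:
    """Remove comment declarations from a run of PCDATA."""
    return COMMENT_PATTERN.sub("", pcdata)
```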

<TITLE>HTML Guide: Text and Markup</TITLE>

<H1>Text and Markup</H1>

This part of <A NAME=id3 HREF="MarkUp.html">the HTML reference</A> is
an explanation of SGML syntax as it applies to HTML. For lexical
issues, the purpose is to take the standard and reduce it from the
abstract system that is SGML to a concrete language, HTML. For
structural issues, the purpose is to give you enough background to
read <A NAME=id1 HREF="html.dtd">the DTD</A>.
<P>

<H2>Structured Text</H2>

An HTML document is a hierarchy of elements. Each element has a name,
some attributes, and some content. Most elements are represented in
the document as a start tag, which gives the name and attributes,
followed by the content, followed by the end tag. For example:
<P>

<TYPEWRITER>
&lt;HTML>
 &lt;TITLE>
  A sample HTML document
 &lt;/TITLE>
 &lt;H1>
  An Example of Structure
 &lt;/H1>
 Here's a typical paragraph.
 &lt;P>
 &lt;UL>
  &lt;LI>
  Item one has an
  &lt;A NAME=anchor>
   anchor
  &lt;/A>
  &lt;LI>
  Here's item two.
 &lt;/UL>
&lt;/HTML>
</TYPEWRITER>

Some elements (e.g. P, LI) are "empty." They have no content. They
show up as just a start tag.
<P>

For the rest of the elements, the content is a sequence of data
characters and nested elements. The content must match the element's
model group from its declaration in <A NAME=id17 HREF="html.dtd">the
DTD</A>.<P>

Using the example from above, the content of the UL element is the
sequence "LI, #PCDATA, A, LI, #PCDATA". This matches the model group
from the UL element declaration: "(#PCDATA|LI|A)+".

<H2>Parsing Content Into Data and Markup</H2>

An HTML document is like a text file, except that some of the
characters are interpreted as markup, rather than document content.

The following table lists the special character sequences that
separate data from markup in an HTML document.

<H3><A NAME=delimiters>SGML delimiters</A></H3>

<DL>

<DT>CRO<DD>Character Reference Open: "&amp;#", when followed by a
letter or a digit, signals a character reference. SGML idioms include
things like "&amp;#168;" and "&amp;#SPACE;". It is not used in HTML.


<DT>ERO<DD>Entity Reference Open: "&amp;", when followed by a letter,
signals an <A NAME=id2 HREF="#Entities">entity reference</A>.

<DT><A NAME=ETAGO>ETAGO</A><DD>End Tag Open: "&lt;/", when followed by
a letter, signals an <A HREF="#Tags">end tag.
</A>

<DT>MDO<DD>Markup Declaration Open: "&lt;!", when followed by a
letter or "--" or "[", signals one of several SGML markup
declarations.  The only purpose it serves in HTML is to introduce <A
NAME=id11 HREF="#Comments">comments</A>.

<DT>MSC<DD>Marked Section Close: "]]", when followed by ">" signals
the end of a marked section. While marked sections are not used
by HTML, this sequence of characters is recognized and reported as an
error by conforming SGML parsers.

<DT>PIO<DD>Processing Instruction Open: "&lt;?" signals a processing instruction. It is not used
in HTML.

<DT>STAGO<DD>Start Tag Open: "&lt;", when followed by a letter,
signals a <A HREF="#Tags">start tag</A>.

</DL>

<H3><A NAME=PCDATA>Normal Text: Parsed Character Data</A></H3>

In <A NAME=id9 HREF="html.dtd">the DTD</A>, the symbol PCDATA stands
for parsed character data, the normal text characters in an HTML
document.
<P>

The text consists of a stream of lines. The division into lines has no
significance apart from indicating a word end.<P>

All of the SGML delimiters listed in <A NAME=id16
HREF="#delimiters">the table of delimiters</A>are recognized in PCDATA.
<P>

<H3><A NAME=CDATA>Raw Text: Character Data</A></H3>

In <A NAME=id15 HREF="html.dtd">the DTD</A>, the symbol CDATA stands
for character data, the text without markup in an SGML document. Only
the end tag open <A NAME=id14 HREF="#delimiters">delimiters</A> is
recognized in CDATA.
<P>


<H2><A NAME=Tags>Tags</A></H2>

The characters in an SGML document are organized into a heirarchy of
elements by the use of tags. Tags are set off from the data characters
by angle brackets: '&lt;' and '&gt;'.<P>

<H3>Names</H3>

The element name immediately follows "&lt;". Names consist of a letter
followed by up to 33 letters, digits, periods, or hyphens. Names are
not case sensitive.<P>

<H3>Attributes</H3>

Following the element name, whitespace and attributes are allowed. An
attribute consists of a name, an equal sign, and a value. Spaces are
allowed around the equal sign.<P>

The value is either a token or a literal. A token is up to 34 letters,
digits, periods, or dashes. Tokens are case sensitive.<P>

A literal is a string surrounded by single quotes or a string
surrounded by double quotes. Entity references are processed inside
attribute values as inside PCDATA. The length of an attribute value
(after entity processing) is limited to 1024 characters.<P>

Each attribute has a type, which puts constraints on the values it can
have. For example, the NAME attribute of the A element is an ID. An ID
is a name that must be unique among all IDs in the document.

<H2><A NAME=Entities>Entities</A></H2>

In order to include characters that would otherwise be parsed as
markup, you can use entity references refer to some of
characters.<P>

An entity reference is an ampersand, followed by a name, followed by a
semicolon. No spaces are allowed within an entity reference. For
example:<P>

<XMP>
This is how you include a &amp;lt;tag&amp;gt; as data.
</XMP>

<H2><A NAME=Comments>Comments</A></H2>

Comment declarations can be used include information aimed at persons
and tools that read the document in source form. This information will
be ignored when the document is processed by an SGML parser.<P>

Comments begin with the character sequence "&lt;!--" and end with
"--", which must be followed by '&gt;'. (Technically, whitespace is
allowed between the closing "--" and '&gt;'.) They are only allowed in
PCDATA.

This file actually uses a <TYPEWRITER> tag, not merely as a demonstration of the tag but for real content. This tag is clearly the predecessor of the <PRE> tag. <PRE> is native to the version of HTML that Tim Berners-Lee had begun using only a few days before Dan Connolly posted these HTML directions discussions.

In this particular instance of the <TYPEWRITER> tag, only the opening "<" characters were replaced with "&lt;". The closing ">" characters of the enclosed content were left as is. This is allowable because a tag is signaled by the opening bracket followed by a letter; see the explanation of the "STAGO" SGML delimiter within this file. Most browsers can be expected to recognize that a ">" which does not end a tag is just the character itself, and to display it as is. Still, it is always good practice to replace ">" with "&gt;" wherever the bracket in a <PRE> section is meant simply to be displayed. For this display, <TYPEWRITER> has been replaced with <PRE> and the closing brackets modified.

Note also some of the other SGML constructs listed here, such as Markup Declaration Open ("MDO"), Marked Section Close ("MSC"), and Processing Instruction Open ("PIO"). Some of these SGML constructions are occasionally seen in HTML files. Even at this early period, however, their use (apart from comment delimiters) is deprecated.

Last-Modified: Mon, 30 Nov 1992 12:05:57 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/Text.html


HTML Guide: Recommended Usage

Recommended HTML Usage

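This part of the HTML Reference shows recommended usage. These constructs are recommended because

  * They conform to the SGML definition of HTML
  * They are straightforward to implement
  * They work on most existing browsers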

This section contains many suggestions, rules of thumb, and the like. Where the suggestions are not equivalent to the DTD, the words "should," "may," etc. are linked to futher explanation.

Structure of an HTML document

An HTML document should start with a TITLE element.

If the document is searchable, an ISINDEX element should come next.

After any TITLE and ISINDEX elements comes the BODY, which should start with an H1 element, followed by other elements character data.

See also: tolerated structural errors, severe structural errors.

Header Elements

TITLE

The TITLE element should identify the document in a fairly wide context. Its content should fit on one line: it should be less than 72 characters with no linebreaks. It should not contain any '<' characters.

The title may be used to identify the node in a history list, to label the window displaying the node, etc. It is not normally displayed in the text of a document itself. Contrast titles with headings .
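The TITLE recommendations above are simple enough to check mechanically. The following Python sketch reports each one a title breaks. The function name and the wording of its messages are hypothetical.

```python
from typing import List


def title_problems(title: str) -> List[str]:
    """List the ways a TITLE deviates from the recommendations above."""
    problems = []
    if len(title) >= 72:                     # should be less than 72 characters
        problems.append("72 characters or longer")
    if "\n" in title or "\r" in title:       # should fit on one line
        problems.append("contains a line break")
    if "<" in title:                         # should not contain '<'
        problems.append("contains '<'")
    return problems
```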

ISINDEX

The presence of the ISINDEX element indicates the document is searchable.

Body Elements

Within the content of these elements, the characters '<', '>', and '&' signal markup in many cases. They should be written as "&lt;", "&gt;", and "&amp;" respectively, to prevent this.

Anchors

A span of text can be marked as an anchor. Anchors can be used as the source of a hypertext link:

Choose this to view a neighbor document.

... or as the destination:

Fred Flinstone

See also: tolerated errors in anchors, severe errors in anchors.

Headings

Headings are used to break the body into sections and subsections. Several levels of headings are defined:

Level four headings are for sub-sub-sub headings

Paragraphs

Text that isn't marked up as some other element forms a paragraph.

Normal paragraphs consist of text consisting of words, sentences, and other stuff. Line breaks have no significance except to separate words. This is still the first paragraph of this section.

This is the second paragraph. Paragraphs are separated by P elements. HTML is relatively flat, and paragraph breaks are not allowed inside lists, headers, anchors, etc.

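Lists

  * This is the first item of an unordered list.
  * This is the second item. It's kinda long, and should wrap around on most screens.
  * This is the third item. It's only one paragraph, but it's got a paragraph tag at the end.
  * This is the fourth and final item.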

Glossaries

term
definition
another term
and its definition, which is long enough that it should wrap around on most screens.

Address

The address element indicates the author or source of the document:

DWC connolly@convex.com

TYPEWRITER

The TYPEWRITER element is used for characters that have already been formatted for a typewriter-like device. Markup is recognized in this element just as in the normal body paragraphs. But after processing tags and entity references, the data is displayed as on a typewriter, rather than using typesetting conventions.

Line breaks are significant, and characters are rendered in a fixed-width font to preserve horizontal formatting.

For example, a portion of a man page might look like:

NOTES
     cat is able to correctly access files larger that two giga-
     bytes in size.

SEE ALSO
     cp(1), ex(1), more(1), pr(1), tail(1)

Literal Text Elements

XMP and LISTING

These elements are used when you want to type the characters into the source document and have them show up in the output just like you typed them.

These elements act much like the TYPEWRITER element, but because markup is not recognized in their content, some character sequences can't be represented (SGML end tags, for example.) On the other hand, you don't have to meticulously mark up all the special characters.

You can draw pictures    /\
 in example elements    /  \
    see:                \__/
This is literal text. THIS word
should line up under  THIS word.

There should be exactly three blank lines between here



and here.

These elements are the source of the most errors in HTML implementations. They should be used only for simple examples that don't contiain SGML markup constructs.

<TITLE>HTML Guide: Recommended Usage</TITLE>

<H1>Recommended HTML Usage</H1>

This part of <A HREF="MarkUp.html">the HTML Reference</A> shows
recommended usage. These constructs are recommended because
<UL>
<LI>They conform to the SGML definition of HTML
<LI>They are straightforward to implement
<LI>They work on most existing browsers
</UL>

This section contains many suggestions, rules of thumb, and the like.
Where the suggestions are not equivalent to <A NAME=id1
HREF="html.dtd">the DTD</A>, the words "should," "may," etc. are
linked to futher explanation.

<H2>Structure of an HTML document</H2>

An HTML document <A NAME=id2 HREF="complete.html#structure">should</A>
start with a <A HREF="#TITLE">TITLE element</A>.<P>

If the document is searchable, an <A NAME=id3
HREF="#ISINDEX">ISINDEX</A> element should come next.<P>

After any TITLE and ISINDEX elements comes the BODY, which <A NAME=id4
HREF="tolerated.html#id1">should</A> start with an H1 element,
followed by other elements character data.<P>

See also: <A NAME=id13 HREF="tolerated.html#structure">tolerated
structural errors</A>, <A NAME=id14
HREF="errors.html#structure">severe structural errors</A>.

<H2>Header Elements</H2>

<H3><A NAME=TITLE>TITLE</A></H3>

The TITLE element should identify the document in a fairly wide
context. Its content should fit on one line: it should be less than 72
characters with no linebreaks. It <A NAME=id5
HREF="complete.html#TITLE">should
</A>
not contain any '&lt;' characters.
<P>

The title may be used to identify the node in a history list, to label
the window displaying the node, etc. It is not normally displayed in
the text of a document itself. Contrast titles with <A
HREF="#headings">headings </A>.

<H3><A NAME=ISINDEX>ISINDEX</A></H3>

The presence of the ISINDEX element indicates the document is
searchable.<P>

<H2>Body Elements</H2>

Within the content of these elements, the characters '&lt;', '&gt;',
and '&amp;' signal markup <A NAME=id12 HREF="Text.html#PCDATA">in many
cases</A>. They <A NAME=id6 HREF="supported.html#delimiters">should
</A>
be written as "&amp;lt;", "&amp;gt;", and "&amp;amp;" respectively, to
prevent this.

<H3>Anchors</H3>

A span of text can be marked as an anchor. Anchors can be used as the
source of a hypertext link:<P>

Choose <A HREF="tolerated.html">this</A> to view a neighbor
document.<P>

... or as the destination: <P>

<A NAME="Fred">Fred Flinstone</A><P>

See also: <A NAME=id15 HREF="tolerated.html#A">tolerated errors in
anchors</A>, <A NAME=id16 HREF="errors.html#a">severe errors in
anchors</A>.

<H3><A NAME=headings>Headings</A></H3>

Headings are used to break the body into sections and subsections.
Several levels of headings are defined: <P>

<H4>Level four headings are for sub-sub-sub headings</H4>

<H3>Paragraphs</H3>

Text that isn't marked up as some other element forms a paragraph.<P>

Normal paragraphs consist of text consisting of words, sentences, and
other stuff. Line breaks have no significance except to separate words.
This is still the first
paragraph of this section.
<P>

This is the second paragraph. Paragraphs are separated by P elements.
HTML is relatively flat, and paragraph breaks are not allowed inside
lists, headers, anchors, etc.<P>

<H3>Lists</H3>

<UL>
<LI>This is the first item of an unordered list.

<LI>This is the
second item. It's kinda long, and should wrap around on most screens.

<LI>This is the third item. It's only one paragraph, but it's got a
paragraph tag at the end.

<LI>This is the fourth and final item.
</UL>

<!-- @@ link to unordered lists -->

<H3>Glossaries</H3>

<DL>
<DT>term <DD> definition 

<DT>another term <DD> and its definition, which is long enough that it
should wrap around on most screens.
</DL>

<H3>Address</H3>

The address element indicates the author or source of the document:<P>

<ADDRESS>DWC
connolly@convex.com</ADDRESS>

<H3>TYPEWRITER</H3>

The TYPEWRITER element is used for characters that have already been
formatted for a typewriter-like device. Markup is recognized in this
element just as in the normal body paragraphs. But after processing tags
and entity references, the data is displayed as on a typewriter,
rather than using typesetting conventions.
<P>

Line breaks are significant, and characters are rendered in a
fixed-width font to preserve horizontal formatting.<P>

For example, a portion of a man page might look like:<P>

<TYPEWRITER>
NOTES
     cat is able to correctly access files larger that two giga-
     bytes in size.

SEE ALSO
     <A NAME=id7 HREF="man:/1/cp">cp(1)</A>, <A NAME=id8 HREF="man:/1/ex">ex(1)</A>, <A NAME=id9 HREF="man:/1/more">more(1)</A>, <A NAME=id10 HREF="man:/1/pr">pr(1)</A>, <A NAME=id11 HREF="man:/1/tail">tail(1)</A>
</TYPEWRITER>


<!-- @@ highlighting, character-level elements: bold, italic, etc. -->

<H2>Literal Text Elements</H2>

<H3>XMP and LISTING</H3>

These elements are used when you want to type the characters into the
source document and have them show up in the output just like you
typed them.<P>

These elements act much like the TYPEWRITER element, but because
markup is not recognized in their content, some character sequences
can't be represented (SGML end tags, for example.) On the other
hand, you don't have to meticulously mark up all the special characters.

<XMP>
You can draw pictures    /\
 in example elements    /  \
    see:                \__/
</XMP>

<XMP>This is literal text. THIS word
should line up under  THIS word.

There should be exactly three blank lines between here



and here.
 </XMP>

These elements are the source of the most errors in HTML
implementations. They should be used only for simple examples that
don't contiain SGML markup constructs.

Note here the explanation of <TYPEWRITER>, which illustrates its novel ability to include links within its strictly formatted text. That ability shows its superiority to <XMP> and <LISTING>. These latter two tags are still listed here among the "recommended" tags. Even so, their use for anything more than small, simple samples is already being discouraged. After all, one can show a closing </PRE> tag within a <PRE> section by writing it as "&lt;/PRE>". One cannot, however, show a closing </XMP> tag within an <XMP> section, or a closing </LISTING> tag within a <LISTING> section. Indeed, in some user agents, anything inside an <XMP> or <LISTING> section that looks like any sort of closing tag ("ETAGO," End Tag Open) could terminate that section prematurely.

Last-Modified: Mon, 30 Nov 1992 11:57:09 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/recommended.html


HTML Guide: A Complete MarkUp Set

A Complete Set of Constructs

The recommended usage is incomplete; it only includes those constructs that are easy to implement and explain. This section discusses a few more constructs that allow you to do anything that can legally be done. There are constructs beyond these, but they can all be reduced to constructs shown here.

Document Structure

An HTML document is a header part followed by a BODY element.

The header part consists of the TITLE, ISINDEX, and NEXTID elements which each appear zero or one time in any order. (see ISINDEX test, no title test)

The BODY start and end tags may be omitted. They will be inferred by SGML parsers. "Recommended Usage" is an example of this. This entity is an example of explicitly including the BODY tags.

The PLAINTEXT tag signals the end of the HTML text entity, and the beginning of a non-SGML data entity. (The format of the data is governed by the MIME text/plain content type.)

See Also:

Header Elements

TITLE

The title can have an '<' character, as long as it's not followed by a '/' and a letter. See the section on SGML delimiters in CDATA.

Body Elements

The normal text content of body elements may include several kinds of markup.

A comment that you shouldn't see: For copyrights, RCS keywords, etc.

processing instruction: lkjsdf If you've _got_ to stick TeX macros or something in there, use this. The sample implementation won't even tell you it's there, though.

Entity References

Entity references are recognized in normal body elements (anyplace #PCDATA appears in the DTD) and attribute value literals. See the Entities section of "Text and Markup" for more details. The HTML DTD defines the following entities for characters that might otherwise be parsed as markup:

HTML Entities

Name    Definition
lt      <
gt      >
amp     &
quot    "
apos    '
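Replacing references to these five entities is a single pattern substitution. This Python sketch expands them and leaves any other reference, such as a Latin-1 name, untouched. It requires the closing semicolon described in "Text and Markup". SGML also permits some references without one, but the sketch ignores that case. The names HTML_ENTITIES and expand_entities are hypothetical.

```python
import re

# The five entities from the table above; Latin-1 entities are not covered.
HTML_ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# ERO '&', a name (a letter, then letters, digits, periods, hyphens), then ';'.
ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text: str) -> str:
    """Replace references to the five HTML entities in a single pass."""
    def replace(match):
        # Unknown names are left exactly as written rather than guessed at.
        return HTML_ENTITIES.get(match.group(1), match.group(0))
    return ENTITY_PATTERN.sub(replace, text)
```

Because the substitution is a single pass, "&amp;amp;" becomes "&amp;", not "&".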

ISO Latin-1 Characters

The HTML DTD references the public text "ISO 8879:1986//ENTITIES Added Latin 1//EN" to define entities for latin-1 characters, for example Gödel was a famous mathemetician.

Anchors

Order and Apperance of Attributes

name implied

HREF implied

HREF before name

Quotes In Attribute Values

In order to include quotes in the value of the content-type attribute, use "&quot;" and "&apos;" entity references: link to SGMLS software distribution with fancy content-type attribute

Note: Interpretation of Literals

Section 7.9.3 of the SGML standard states

For the SGML-impared, Ee is Entity End (like EOF); RS is '\n'; RE is '\r'; SEPCHAR is '\t' and SPACE is ' '.

Since to date there are no HTML attributes containing newlines or spaces, that is not much of an issue.

@@But replacement of literals is. For one thing, this creates an interaction between the syntax of URLs and SGML syntax. We could resolve this issue by removing '&' from the URL syntax.

Headings

Six levels of headings are defined:

Level four heading

Another level four heading. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.

Level five heading
Level six heading

Paragraphs

Normal paragraphs consist of text consisting of words, sentences, and other stuff. Line breaks are not significant. This is still the first paragraph of this section.

Here's the second paragraph. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.

A P tag isn't needed between a paragraph and some other element, like a heading.

Ordered lists

These are for things like lists of steps, where the order is significant.

  1. This is the first item of an unordered list.
  2. This is the second item. It's kinda long, and should wrap around on most screens.
  3. This is the third item.
  4. This is the fourth and final item.

Case of names is not significant: different cases

Case of names is not significant: both lower case

TYPEWRITER

Anything you could put on a typewriter (or an ASCII display
device, more precicesly) can be represented in a TYPEWRITER
element:

Tags: <start> </end>
Entity references: &lt; &amp;

Tables made from tabs:

col 1	col 2	col 3	col 4
1		3	4
	2	3	4
1	2	3	4

Plus, you can use hypertext links.

Linebreaks _are_ significant. There should be three blank lines from here



to here.

The ASCII Horizontal Tab (HT) character should be interpreted as the smallest positive nonzero number of spaces which will leave the number of characters so far on the line as a multiple of 8. Its use is not recommended however.
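The tab rule above amounts to this: advance to the next column that is a multiple of 8, always by at least one space. A minimal Python sketch for a single line follows. The function name is my own. Python's built-in str.expandtabs(8) applies the same rule, which the last line checks.

```python
def expand_tabs(line, width=8):
    """Replace each HT in a single line with spaces, per the rule above."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Smallest positive count that makes the column a multiple of width:
            # between 1 and width, never zero.
            n = width - column % width
            out.append(" " * n)
            column += n
        else:
            out.append(ch)
            column += 1
    return "".join(out)


row = "1\t\t3\t4"  # a row from the TYPEWRITER table above
print(repr(expand_tabs(row)))
assert expand_tabs(row) == row.expandtabs(8)
```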

Literal Text Elements

Comment declaration as data follows:
<!-- this would be a comment in PCDATA. It's data in RCDATA. -->

Markup declaration as data follows:
<!this would be an markup delcaration, which would be an
error in PCDATA. It's data in RCDATA.>

Start tag follows:
<start> tags are fine!
& as long as it's not followed by a letter or '#', it's fine!
&# is even ok, unless it's followed by a letter or a number.

Tabs in XMP content:

This is literal text with tabs.		THESE	        words
should line up under			THESE		words.
<TITLE>HTML Guide: A Complete MarkUp Set</TITLE>

<BODY><H1><A NAME=top>A Complete Set of Constructs</A></H1>

The <A NAME=id1 HREF="recommended.html">recommended usage</A> is
incomplete; it only includes those constructs that are easy to
implement and explain. This section discusses a few more constructs
that allow you to do anything that can legally be done. There are <A
NAME=id2 HREF="supported.html">constructs beyond these</A>, but they
can all be reduced to constructs shown here.
<P>

<H2><A NAME=structure>Document Structure</A></H2>

An HTML document is a header part followed by a BODY element.<P>

The header part consists of the TITLE, ISINDEX, and NEXTID elements
which each appear zero or one time in any order. (see <A NAME=id3
HREF="structure1.html">ISINDEX test</A>, <A NAME=id5
HREF="structure2.html">no title test</A>)
<P>

The BODY start and end tags may be omitted. They will be inferred by
SGML parsers. <A NAME=id4 HREF="recommended.html">"Recommended
Usage"</A> is an example of this. This entity is an example of
explicitly including the BODY tags.
<P>

The PLAINTEXT tag signals the end of the HTML text entity, and
the beginning of a non-SGML data entity. (The format of the data
is governed by the MIME text/plain content type.)<P>

See Also:<UL>

<LI><A HREF="structure3.html">plaintext at the beginning of a
document</A>

<LI><A HREF="structure4.html">plaintext at the beginning of
the body</A>

<LI><A HREF="structure5.html">plaintext after the body</A>

<LI><A HREF="tolerated.html#id1">tolerated errors in
structure</A>.

<LI><A HREF="errors.html#structure">severe errors in
structure</A>.
</UL>


<H2>Header Elements</H2>

<H3><A NAME=TITLE>TITLE</A></H3>

The title can have an '&lt;' character, as long as it's not followed
by a '/' and a letter. See <A NAME=id10 HREF="Text.html#CDATA">the
section on SGML delimiters in CDATA</A>.

<H2>Body Elements</H2>

The normal text content of body elements may include several kinds of
markup.<P>

A comment that you shouldn't see: <!-- Your implementation is broken
if you see this.--> For copyrights, RCS keywords, etc.
<P>

processing instruction: <?bold lkjsdf > If you've _got_ to
stick TeX macros or something in there, use this. The sample
implementation won't even tell you it's there, though.<P>

<H3>Entity References</H3>

Entity references are recognized in normal body elements (anyplace
#PCDATA appears in the DTD) and attribute value literals.
See <A NAME=id11 HREF="Text.html#Entities">the Entities section of
"Text and Markup"</A> for more details.

The HTML DTD defines the following entities for characters that might
otherwise be parsed as markup:
<P>

<H4>HTML Entities</H4>

<DL>
<DT>Name
<DD>Definition

<DT>lt<DD>&lt;
<DT>gt<DD>&gt;
<DT>amp<DD>&amp;
<DT>quot"<DD>"
<DT>apos<DD>'
</DL>
 <P>

<H4>ISO Latin-1 Characters</H4>

The HTML DTD references the public text
"ISO 8879:1986//ENTITIES Added Latin 1//EN"
to define entities for latin-1 characters, for example G&ouml;del was a
famous mathemetician.

<H2>Anchors</H2>

<H3>Order and Apperance of Attributes</H3>

<A HREF="#top">name implied</A><P>

<A NAME=xyz>HREF implied</A><P>

<A HREF="#top" NAME=xyz1>HREF before name</A><P>

<H3>Quotes In Attribute Values</H3>

In order to include quotes in the value of the content-type attribute,
use "&amp;quot;" and "&amp;apos;" entity references:
<A NAME=id13 HREF="ftp://ifi.uio.no/pub/SGML/SGMLS/sgmls-0.8.tar"
 content-type="application/x-tar; name=&quot;sgmls-0.8.tar&quot;">link
to SGMLS software distribution with fancy content-type attribute</A>

<H4>Note: Interpretation of Literals</H4>

Section 7.9.3 of the SGML standard states<P>

<UL>
<LI>An attribute value literal is interpreted as an attribute value by
replacing references within it, ignoring Ee and RS, and replacing RE
or SEPCHAR with SPACE.
</UL>

For the SGML-impared, Ee is Entity End (like EOF); RS is '\n'; RE is
'\r'; SEPCHAR is '\t' and SPACE is ' '.<P>

Since to date there are no HTML attributes containing newlines or
spaces, that is not much of an issue.<P>

@@But replacement of literals is. For one thing, this creates an
interaction between the syntax of URLs and SGML syntax. We could
resolve this issue by removing '&amp;' from <A
HREF="http://info.cern.ch/hypertext/WWW/Addressing/BNF.html#xalpha">the
URL syntax</A>
.<P>

<H3>Headings</H3>

Six levels of headings are defined: <P>

<H4>Level four heading</H4>

<h4>Another level four heading. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.</h4>

<H5>Level five heading</H5>

<H6>Level six heading</H6>

<H3>Paragraphs</H3>

Normal paragraphs consist of text consisting of words, sentences, and
other stuff.
Line breaks are not significant.


This is still the first
paragraph of this section.

<P>

Here's the second paragraph. It's long. It's only conventional and suggested that lines be less than 72 characters long. It's certainly not specified, defined, or required.<P>

A P tag isn't needed between a paragraph and some other element, like
a heading.

<H3>Ordered lists</H3>

These are for things like lists of steps, where the order is
significant.

<OL>
<LI>This is the first item of an unordered list. 

<LI>This is the second item. It's kinda long, and should wrap around
on most screens.

<LI>This is the third item.

<LI>This is the fourth and final item.
</OL>

<h3>Case of names is not significant: different cases</H3>
<h3>Case of names is not significant: both lower case</h3>

<H3>TYPEWRITER</H3>

<TYPEWRITER>Anything you could put on a typewriter (or an ASCII display
device, more precicesly) can be represented in a TYPEWRITER
element:

Tags: &lt;start&gt; &lt;/end&gt;
Entity references: &amp;lt; &amp;amp;

Tables made from tabs:

col 1	col 2	col 3	col 4
1		3	4
	2	3	4
1	2	3	4

Plus, you can use <A NAME=id14 HREF="recommended.html">hypertext links.</A>

Linebreaks _are_ significant. There should be three blank lines from here



to here.</TYPEWRITER>

The ASCII Horizontal Tab (HT) character should be interpreted as the
smallest positive nonzero number of spaces which will leave the number
of characters so far on the line as a multiple of 8. Its use is not
recommended however.<P>

<H2>Literal Text Elements</H2>

<XMP>
Comment declaration as data follows:
<!-- this would be a comment in PCDATA. It's data in RCDATA. -->

Markup declaration as data follows:
<!this would be an markup delcaration, which would be an
error in PCDATA. It's data in RCDATA.>

Start tag follows:
<start> tags are fine!
& as long as it's not followed by a letter or '#', it's fine!
&# is even ok, unless it's followed by a letter or a number.
</XMP>

Tabs in XMP content:
<XMP>
This is literal text with tabs.		THESE	        words
should line up under			THESE		words.
</XMP>
</BODY>

Dan Connolly here proposes restoring <OL>, which had not appeared anywhere outside his DTD drafts since the very first prototype HTML (dumped in mid-January 1991). He also advocates the use of entities, especially the ISO-8859-1 character entities. He illustrates them with the "ö" character (the "ö" in "Gödel"), so that other languages at last begin to be taken into account.

The file also showcases what each literal-text element is specifically good for. <TYPEWRITER> (in effect <PRE>) can contain links and, through character entities, can display tags. <XMP> shows everything in its content literally, including comments and SGML markup declarations that would otherwise not show at all. Again we have an instance of the content-type attribute of the <A> tag, used the way TYPE would be used starting with HTML 4.

This file also contains a "processing instruction", an SGML construct that can appear within an HTML document. It starts with "<?name ", where name is some instruction (usually formatting). The parser does not render it, but the user agent may detect and respond to it. In this example the name is "bold". I have therefore treated the remaining contents of the processing instruction (up to, but not including, its closing ">") as a bolded phrase, something not otherwise possible in this early version of HTML.
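As a rough illustration of the entity mechanism advocated here, the Python sketch below expands entity references in text. The table is an assumption of the sketch: the five entities from the DTD listing above plus ouml from the Latin-1 set. It follows the SGML rule, mentioned in these files, that "&" is only markup when followed by a letter (or by "#" for a character reference). This sketch leaves character references untouched.

```python
import re

# Assumed table for this sketch: the DTD's markup entities plus one Latin-1 entity.
ENTITIES = {
    "lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'",
    "ouml": "\u00f6",  # o with umlaut, as in G&ouml;del
}

# '&' followed by a name-start letter; the ';' terminator is optional in SGML.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")


def expand_entities(text):
    """Replace known entity references; leave everything else as data."""
    def replace(m):
        # Unknown names are left as-is rather than guessed at.
        return ENTITIES.get(m.group(1), m.group(0))
    return ENTITY_REF.sub(replace, text)


print(expand_entities("G&ouml;del"))  # Gödel
print(expand_entities("a & b, &lt;start&gt;"))  # a & b, <start>
```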

Last-Modified: Mon, 30 Nov 1992 11:59:17 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/complete.html


HTML Guide: Tolerated Errors

Tolerating broken HTML writers

These are illegal according to SGML, but they're so prevalent that they're supported by the sample implementation.

Please stop generating HTML in this style!

Document Structure

The BODY element must start with some element. See: an example document where this rule is broken. Paragraph breaks are not allowed in headers, lists etc. They may be ignored or treated intelligently.

  * a list item

    with more than one paragraph

Muti-paragraph

heading

Unknown Tags

Tags that aren't known to the parser are treated as data by, for example, the MidasWWW-1.0 implementation. They should be ignored. There should be no tags around the word foo: foo.
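The recommended behaviour is to ignore tags the parser does not know while keeping their content. It might look like the Python sketch below. The caller supplies the set of known element names; the set in the example is only an illustrative assumption. A regular expression stands in for a real tokenizer.

```python
import re

TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def drop_unknown_tags(markup, known):
    """Remove start and end tags whose element names are not in `known`; keep their content."""
    return TAG.sub(lambda m: m.group(0) if m.group(1).upper() in known else "", markup)


known = {"A", "P", "H1", "H2", "H3"}  # illustrative subset, not the full DTD
print(drop_unknown_tags("around the word foo: <unknown>foo</unknown>.", known))
# around the word foo: foo.
```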

Body Elements

Note that conforming SGML parsers will treat "&", "<", "</", and "<!" as normal text characters when they are not followed by a letter. HTML producers are discouraged from taking advantage of this feature.

Anchors

numeric IDs: NeXT and html-mode.el

This anchor's name starts with a digit, which is not a name start character.

unquoted attribute literals: NeXT and html-mode.el

This anchor's href contains a '#', which is not a name character. It should lead to the NeXT implementation reference below anyway. This anchor's href contains ':' and '/', which are not a name characters. It should lead to the SLAC MidasWWW doc anyway.

Literal Text Elements

Historical Note

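The original semantics of the XMP and LISTING elements is not representable in SGML. From Tags used in HTML:

  * The text may contain any ISO Latin printable characters, including the tag opener, so long as it does not contain the closing tag in full.

But in section 7.6 of the SGML standard:

  * The content of an element declared to be character data or replaceable character data is terminated only by an etago delimiter-in-context (which need not open a valid end-tag) ... .

The XMP and LISTING elements are deprecated in favor of the TYPEWRITER element.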

Non-standard CDATA parsing: LineMode, MidasWWW, etc.

This example section ends here: 

Even though the above ETAGO begins a markup error, this text is in a normal paragraph in conforming implementations.

Just in case the foo close tag above wasn't recognized:

Known Implementations

The following systems are known to read and/or write HTML. They all have bugs.

Linemode Browser 1.3c
MidasWWW 1.0
The MidasWWW parses HTML into its internal data structures, and then offers the option to extract the data and write it to a file. It doesn't get it right all the time.
NeXT editor
From timbl@info.cern.ch
html-mode.el
from marca@@@
Viola
From Pei Wei @ O'Reilly (@@email address). Any known problems? I hear it's going to use SGMLs.
www_and_frame
@@Go get The latest version -- it should be current with this spec.
perl client
Just heard about it. haven't tried it. I don't think it supports entities.
<TITLE>HTML Guide: Tolerated Errors</TITLE>

<H1>Tolerating broken HTML writers</H1>

These are illegal according to SGML, but they're so prevalent that
they're supported by the sample implementation.<P>

Please stop generating HTML in this style!<P>

<H2><A NAME=id1>Document Structure</A></H2>

The BODY element must start with some element. See: <A
HREF="error_data_starts_body.html">an example document where this rule
is broken</A>.

Paragraph breaks are not allowed in headers, lists etc. They may be
ignored or treated intelligently.

<UL>
<LI>a list item<P>

with more than one paragraph
</UL>
<H3>Muti-paragraph<P>

heading</H3>

<H3>Unknown Tags</H3>

Tags that aren't known to the parser are treated as data by, for
example, the MidasWWW-1.0 implementation. They should be ignored.
There should be no tags around the word foo: <unknown>foo</unknown>.


<H2>Body Elements</H2>

Note that conforming SGML parsers will treat "&amp;", "&lt;", "&lt;/",
and "&lt;!" as normal text characters when they are not followed by a
letter. HTML producers are discouraged from taking advantage of this
feature.<P>

<H3><A NAME=a>Anchors</A></H3>

<H4>numeric IDs: <A
HREF="#NeXT">NeXT</A> and <A HREF="html-mode">html-mode.el</A>
</H4>

<A NAME=10>This</A> anchor's name starts with a digit, which is not a
name start character.<P>

<H4>unquoted attribute literals: <A
HREF="#NeXT">NeXT</A> and <A HREF="html-mode">html-mode.el</A>
</H4>

<A HREF=#NeXT>This anchor</A>'s href contains a '#', which is not a name
character. It should lead to the NeXT implementation reference below
anyway. <A
HREF=http://slacvx.slac.stanford.edu:80/midaswww/v10/overview.html>This
anchor</A>'s href
contains ':' and '/', which are not a name characters. It should lead
to the SLAC MidasWWW doc anyway.<P>

<H2>Literal Text Elements</H2>

<H4>Historical Note</H4>

The original semantics of the XMP and LISTING elements is not
representable in SGML. From <A
HREF="http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html">Tags used in
HTML</A>:
<P>

<UL>
<LI>The text may contain any ISO Latin printable characters, including
the tag opener, so long as it does not contain the closing tag in
full. </UL>

But in section 7.6 of the SGML standard:<P>

<UL>
<LI>The content of an element declared to be character data or
replaceable character data is terminated only by an etago
delimiter-in-context (which need not open a valid end-tag) ... .
</UL>

The XMP and LISTING elements are deprecated in favor of the TYPEWRITER
element.

<H4>Non-standard CDATA parsing: LineMode, MidasWWW, etc.</H4>

<XMP>
This example section ends here: </foo .

Even though the above ETAGO begins a markup error,
this text is in a normal paragraph in conforming implementations.<P>

<XMP>
Just in case the foo close tag above wasn't recognized:
</XMP>


<H2>Known Implementations</H2>

The following systems are known to read and/or write HTML. They all
have bugs.<P>

<DL>
<DT>Linemode Browser 1.3c
<DD>

<DT>MidasWWW 1.0<DD>

The MidasWWW parses HTML into its internal data structures, and
then offers the option to extract the data and write it to a file.

It doesn't get it right all the time.


<DT><A NAME=NeXT>NeXT editor</A>
<DD>From timbl@info.cern.ch

<DT><A NAME=html-mode>html-mode.el</A>
<DD>from marca@@@

<DT>Viola<DD>

From Pei Wei @ O'Reilly (@@email address). Any known problems? I hear
it's going to use <A NAME=id3
HREF="ftp://ftp.ifi.no/pub/text-processing/sgmls-0.8.tar"
 TYPE="application/x-tar">SGMLs</A>.

<DT>www_and_frame<DD>

@@Go get <A NAME=id5
HREF="ftp://info.cern.ch/pub/www/src/www_and_frame-0.3.tar.Z">The
latest version</A> -- it should be current with this spec.

<DT>perl client<DD>

Just heard about it. haven't tried it. I don't think it supports
entities.

</DL>


Despite the recommendation of <XMP> seen in the files above, here its use is explicitly deprecated in favor of <TYPEWRITER> (that is, <PRE>). The file illustrates a prime reason for this. Anything that looks like a closing tag would (and arguably should) end a literal text section, a problem that cannot occur in <TYPEWRITER> (<PRE>). Note also the one place where TYPE is used instead of content-type, for the same purpose, again anticipating HTML 4.
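The difference between the two rules quoted in this file can be made concrete with a small Python sketch with two functions. The first ends XMP content at the first ETAGO ("</" followed by a letter), as section 7.6 of the SGML standard requires. The second ends it only at the complete closing tag, as the original Tags used in HTML described. Both function names are my own, and real tokenizers are more involved.

```python
import re

ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")


def cdata_end_sgml(text):
    """SGML 7.6: CDATA content ends at the first '</' followed by a name start character."""
    m = ETAGO_IN_CONTEXT.search(text)
    return m.start() if m else len(text)


def cdata_end_original(text, element="XMP"):
    """Original XMP rule: content ends only at the complete closing tag."""
    m = re.search(r"</" + re.escape(element) + r"\s*>", text, re.IGNORECASE)
    return m.start() if m else len(text)


sample = "This example section ends here: </foo .\nMore text\n</XMP>"
print(repr(sample[:cdata_end_sgml(sample)]))      # stops at '</foo'
print(repr(sample[:cdata_end_original(sample)]))  # runs on to '</XMP>'
```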

Last-Modified: Mon, 30 Nov 1992 11:59:32 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/tolerated.html


HTML Guide: Obscure Usage

Deprecated Usage

These SGML constructs are too messy to support even in the sample implementation. But they are implemented by, for example, the SGMLs parser by James Clark. It is in direct conflict with the SGML standard not to support these, but tough cookies.

newline foo. marked sections ignore:

marked sections cdata: hideous stuff: </HTML id=#foo>

untermiated end tag
The start tag for this DL element is not terminated. By virtue of SHORTTAG YES in the SGML declaration, this is legal.
<TITLE>HTML Guide: Obscure Usage</TITLE>

<H1>Deprecated Usage</H1>

These SGML constructs are too messy to support even
in the sample implementation. But they are implemented
by, for example, the SGMLs parser by James Clark.

It is in direct conflict with the SGML standard not to
support these, but tough cookies.<P>

newline foo.

marked sections ignore: <![IGNORE[ hideous stuff: </HTML id=#foo> ]]><P>

marked sections cdata: <![CDATA[ hideous stuff: </HTML id=#foo> ]]><P>


<dl<dt>untermiated end tag
<DD>The start tag for this DL element is not terminated. By virtue of
SHORTTAG YES in the SGML declaration, this is legal.
</dl>

At this point, however, the only thing "officially" deprecated is the use of non-HTML SGML constructs (other than <!-- Comments -->).
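For readers unfamiliar with marked sections, the two kinds used in this file behave roughly like the Python sketch below:

- An IGNORE section disappears entirely.
- A CDATA section's contents are kept as literal data, escaped here so they display rather than parse.

The sketch handles only these two keywords and assumes sections do not nest. It is not modelled on any particular parser.

```python
import html
import re

MARKED_SECTION = re.compile(r"<!\[\s*(IGNORE|CDATA)\s*\[(.*?)\]\]>", re.IGNORECASE | re.DOTALL)


def resolve_marked_sections(text):
    """Drop IGNORE sections; keep CDATA section contents as escaped data."""
    def replace(m):
        if m.group(1).upper() == "IGNORE":
            return ""
        # CDATA: nothing inside is markup, so escape it for display as HTML text.
        return html.escape(m.group(2), quote=False)
    return MARKED_SECTION.sub(replace, text)


print(resolve_marked_sections("ignore: <![IGNORE[ hideous stuff: </HTML id=#foo> ]]>."))
print(resolve_marked_sections("cdata: <![CDATA[ hideous stuff: </HTML id=#foo> ]]>."))
```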

Last-Modified: Mon, 30 Nov 1992 11:59:57 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/deprecated.html


HTML Guide: Error Tests

Illegal constructs

These are just plain broken. They're not legal SGML, and I don't know of any implementations that support them.

Document structure

Here's an anchor with a

paragraph break in it.

broken

headers

busted headers

Body Elements

Anchors

Sample anchor ID already in use I think this is a tag, since SHORTTAG YES is in the SGML decl: <>

<TITLE>HTML Guide: Error Tests</TITLE>

<H1>Illegal constructs</H1>

These are just plain broken. They're not legal SGML, and
I don't know of any implementations that support them.

<H2><A NAME=structure>Document structure</A></H2>

Here's <A NAME=id6 HREF="recommended.html">an anchor with a <P>

paragraph break</A> in it.

<H3>broken <H4>headers</H3>
busted headers</H4>

<H2>Body Elements</H2>

<H3>Anchors</H3>

<A NAME=xyz>Sample anchor</A>
<A NAME=xyz HREF="#xyz">ID already in use</A>

I think this is a tag, since SHORTTAG YES is in the SGML decl: <>
<P>

<gggggreeeeeeeeeeeeeeeeaaatbiglongnameofatagthatsreallyjustjunk>
</foo junk>

This file only demonstrates the degenerate sorts of things one might do when setting out to test the limits of user agents; it is purely a code-bulletproofing tool.
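Among those errors, the "ID already in use" anchor is one a tool can catch mechanically. The rough Python sketch below reports anchor NAME values used more than once. It uses a regular expression rather than a real SGML parser, so it will miss or misread unusual markup.

```python
import re
from collections import Counter

# NAME attribute inside an <A ...> start tag; the value may be quoted or bare.
NAME_ATTR = re.compile(r"""<A\b[^>]*?\bNAME\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def duplicate_anchor_names(document):
    """Return, sorted, the anchor NAME values that occur more than once."""
    names = [m.group(1).strip("\"'") for m in NAME_ATTR.finditer(document)]
    return sorted(name for name, count in Counter(names).items() if count > 1)


sample = '<A NAME=xyz>Sample anchor</A>\n<A NAME=xyz HREF="#xyz">ID already in use</A>'
print(duplicate_anchor_names(sample))  # ['xyz']
```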

Last-Modified: Mon, 30 Nov 1992 12:00:20 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/errors.html


HTML Guide: MarkUp Supported by the Library

HTML Extremes

These are a little tricky, and might break some quick-and-dirty implementations. But they are parsed correctly by implementations based on libHTML.

These constructs are not recommended.

Document Structure

The tags for this element have spaces in them.

Another H4 Just in case it missed the close tag with spaces

Header Elements

Body Elements

Delimiter Recognition

Character reference (not used in HTML): ' ' and È. And character from data: &, and from markup: &.

And-hash from data: &# and from markup &#.

Less-thans as data: < <1 <-)

Less-than-slash as data: </ </1 </-)

greater-than (pretty much always data): > abc> 0>

comment: The sample implementation groks.

comment w/space between -- and >:

marked section close without mdc: ]]. processing instruction: broken impl The sample implementation treats it as a processing instrcution, so you don't see it.

Anchors

spaces around '='

single quoted value

character references and entity references in attribute value literal

<TITLE>HTML Guide: MarkUp Supported by the Library</TITLE>

<H1><A NAME=top>HTML Extremes</A></H1>

These are a little tricky, and might break some quick-and-dirty
implementations. But they are parsed correctly by implementations
based on <A NAME=id1 HREF="libHTML.tar.Z"
content-type="application/octet-stream">libHTML</A>.<P>

These constructs are not recommended.<P>

<H2>Document Structure</H2>

<H4    
   >The tags for this element have spaces in them.</H4			>
<h4>Another H4 Just in case it missed the close tag with spaces</H4>

<H2>Header Elements</H2>

<H2>Body Elements</H2>

<H3><A NAME=delimiters>Delimiter Recognition</A></H3>

Character reference (not used in HTML): '&#SPACE;' and &#200;.

And character from data: &, and from markup: &amp;.<P>
And-hash from data: &# and from markup &amp;#.<P>

Less-thans as data: < <1 <-)<P>

Less-than-slash as data: </ </1 </-)<P>

greater-than (pretty much always data): > abc> 0><P>

comment: <!-- implementation is broken if this shows up--> The sample
implementation groks.<P>

comment w/space between -- and &gt;: <!-- implementation is broken if 
this shows up -- 	><P> 

marked section close without mdc: ]].

processing instruction: <?bold broken impl > The sample implementation
treats it as a processing instrcution, so you don't see it.<P>

<H2>Anchors</H2>

<A HREF = "#top" NAME = xyz2>spaces around '='</A><P>

<A HREF='spec.html'>single quoted value</A><P>

<A HREF="system:cat&#SPACE;&gt;&#SPACE;file">character references
and entity references in attribute value literal</A><P>

</BODY>

Like the error tests, this file only demonstrates degenerate constructs for probing the limits of user agents, as a code-bulletproofing aid. For the last time, there is one more instance of content-type, this time on the libHTML link. This sample also includes yet another processing instruction. It again starts with <?bold , so again I have bolded its contents.
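My handling of these processing instructions can be sketched in Python. A "bold" instruction becomes a B element around its contents (up to, but not including, the closing ">"). Any other processing instruction is dropped, as the sample implementation does. This reflects only my editorial rendering, not anything defined by HTML.

```python
import re

# '<?' + instruction name + optional contents, ended by the first '>'.
PROCESSING_INSTRUCTION = re.compile(r"<\?([^\s>]+)\s*([^>]*)>")


def render_processing_instructions(text):
    """Render <?bold ...> as bold; drop all other processing instructions."""
    def replace(m):
        name, contents = m.group(1), m.group(2).strip()
        if name.lower() == "bold":
            return "<B>" + contents + "</B>"
        return ""
    return PROCESSING_INSTRUCTION.sub(replace, text)


print(render_processing_instructions("processing instruction: <?bold broken impl > The sample"))
# processing instruction: <B>broken impl</B> The sample
```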

Last-Modified: Mon, 30 Nov 1992 12:09:35 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/supported.html


HTML tests: ISINDEX


You can search this index. Type the keyword(s) you want to search for:


searchable document

This test is part of the complete HTML usage reference. It has to be a separate document because it's an example of document structure.

It contains a NEXTID element mostly for grins.

<TITLE>HTML tests: ISINDEX</TITLE>
<NEXTID ID=id1>
<ISINDEX>
<H1>searchable document</H1>

This test is part of <A NAME=id1 HREF="complete.html">the complete
HTML usage</A> reference. It has to be a separate document because
it's an example of document structure.<P>

It contains a NEXTID element mostly for grins.<P>

This is the oldest surviving example of <ISINDEX>, though regrettably only in a non-functional demonstration file. Nobody knows, even roughly, how many functional <ISINDEX> instances existed on the entire web at this point; probably not many, I would expect.

Note that the attribute of <NEXTID> here is not N but ID. Its value is text that also starts with "id", consistent with the NAME values on nearly all the <A> tags in these files by Dan Connolly. This <NEXTID> was plainly entered by hand, not generated by any version of the NeXT Editor, and was included, as he puts it, "mostly for grins." Even so, it is incorrect. Its value should be one higher than the highest NAME value in the text (here id2 rather than id1). This is clearly an accident, since he illustrates it correctly in another file.
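The rule applied to <NEXTID> here is that its value should be one higher than the highest idN used as an anchor NAME. It is easy to check mechanically. The small Python sketch below assumes Connolly's "id" plus number naming and ignores any NAME values that don't follow it.

```python
import re

# NAME values of the form idN, quoted or not.
ID_NAME = re.compile(r"""\bNAME\s*=\s*["']?id(\d+)""", re.IGNORECASE)


def expected_nextid(document):
    """Return the NEXTID value implied by the highest idN anchor NAME."""
    numbers = [int(n) for n in ID_NAME.findall(document)]
    return "id%d" % (max(numbers, default=0) + 1)


structure1 = '<NEXTID ID=id1>\n<A NAME=id1 HREF="complete.html">the complete HTML usage</A>'
print(expected_nextid(structure1))  # id2, not the id1 the file actually gives
```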

Last-Modified: Mon, 30 Nov 1992 04:11:35 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/structure1.html


document sans TITLE

This test is part of the complete HTML usage reference. It has to be a separate document because it's an example of document structure.

<NEXTID ID=id1>
<H1>document sans TITLE</H1>

This test is part of <A NAME=id1 HREF="complete.html">the complete
HTML usage</A> reference. It has to be a separate document because
it's an example of document structure.<P>


This file, too, is only a degenerate test case for probing the limits of user agents. Again, the <NEXTID> value is wrong. It was included only so that there would be something of a "HEADER" section. Of the other two header tags, <ISINDEX> would have had effects not wanted here, and <TITLE> was exactly what this test had to leave out.
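These test files exercise one structure rule: TITLE, ISINDEX, and NEXTID each appear zero or one time, in any order, before the body. The rough Python sketch below checks it. It works on start tags only, skips over TITLE contents (which may contain "<"), and stops at the first tag that is not a header element. The function names are my own.

```python
import re

HEADER_ELEMENTS = {"TITLE", "ISINDEX", "NEXTID"}
TITLE_CONTENT = re.compile(r"<TITLE\s*>.*?</TITLE\s*>", re.IGNORECASE | re.DOTALL)
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")


def header_part(document):
    """List the header elements that precede the first body element, in order."""
    # Collapse TITLE to a bare start tag so a '<' in the title is not read as a tag.
    document = TITLE_CONTENT.sub("<TITLE>", document)
    found = []
    for m in START_TAG.finditer(document):
        name = m.group(1).upper()
        if name not in HEADER_ELEMENTS:
            break
        found.append(name)
    return found


def header_is_valid(found):
    """Each header element may appear at most once."""
    return len(found) == len(set(found))


structure2 = '<NEXTID ID=id1>\n<H1>document sans TITLE</H1>'
print(header_part(structure2))  # ['NEXTID']: legal, just no TITLE
```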

Last-Modified: Mon, 30 Nov 1992 04:20:27 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/structure2.html


You'll get an error if you feed this stuff to the SGML parser.
It's not sgml. It's text/plain data.
<PLAINTEXT>
You'll get an error if you feed this stuff to the SGML parser.
It's not sgml. It's text/plain data.

This is yet another degenerate file, made only to test the limits of user agents. This time there is only the <PLAINTEXT> tag, and everything after it is raw text. One might as well have used a plain ASCII text file with a .txt suffix.

Last-Modified: Mon, 30 Nov 1992 05:03:44 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/structure3.html


HTML Tests: PLAINTEXT

You'll get an error if you feed this stuff to the SGML parser.
It's not sgml. It's text/plain data.
<TITLE>HTML Tests: PLAINTEXT</TITLE>
<PLAINTEXT>
You'll get an error if you feed this stuff to the SGML parser.
It's not sgml. It's text/plain data.

Here is a file much like the previous one: only a <TITLE> and then the <PLAINTEXT> portion, with no body content. Again, this is an extreme and degenerate case, for code bulletproofing only.

Last-Modified: Mon, 30 Nov 1992 05:04:19 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/structure4.html


HTML Tests: PLAINTEXT

Plaintext Tests

The SGML part of this document terminates at the PLAINTEXT tag. The rest is text/plain data. Don't feed it to the SGML parser.

You'll get an error if you feed this stuff to the SGML parser.
It's not sgml. It's text/plain data.
<TITLE>HTML Tests: PLAINTEXT</TITLE>
<H1>Plaintext Tests</H1>

The SGML part of this document terminates at the PLAINTEXT
tag. The rest is text/plain data. Don't feed it to the SGML parser.

<PLAINTEXT>
You'll get an error if you feed this stuff to the SGML parser.
It's not sgml. It's text/plain data.

This is a demonstration of a nominal <PLAINTEXT> file. It has a <TITLE> for the HEADER portion, then a BODY portion that starts with an <H1> heading and continues with some standard BODY text. Finally, a <PLAINTEXT> section runs to the end of the file.
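Everything after the <PLAINTEXT> tag is text/plain data that must not reach the SGML parser, so a reader has to split the document there first. The minimal Python sketch below does this; the function name is my own. It returns the SGML part and the plain-text part, or None for the latter when there is no PLAINTEXT tag.

```python
import re

PLAINTEXT_TAG = re.compile(r"<PLAINTEXT\s*>", re.IGNORECASE)


def split_plaintext(document):
    """Split at the first <PLAINTEXT> tag into (SGML part, text/plain part or None)."""
    m = PLAINTEXT_TAG.search(document)
    if m is None:
        return document, None
    return document[:m.start()], document[m.end():]


doc = "<TITLE>HTML Tests: PLAINTEXT</TITLE>\n<PLAINTEXT>\nIt's not sgml. <It's> text/plain data.\n"
sgml_part, plain_part = split_plaintext(doc)
print(repr(sgml_part))   # only this part goes to the SGML parser
print(repr(plain_part))  # shown as-is
```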

Last-Modified: Mon, 30 Nov 1992 05:05:10 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/structure5.html


HyperText Markup: Errors

It is an error to start the body of an HTML document with data.

Error: Data not allowed at this point in BODY element

This test is part of the complete HTML usage reference. It has to be a separate document because it's an example of document structure.

<TITLE>HyperText Markup: Errors</TITLE>
<NEXTID ID=id2>
It is an error to start the body of an HTML document with data.

<H1>Error: Data not allowed at this point in BODY element</H1>
This test is part of <A NAME=id1 HREF="complete.html">the complete
HTML usage</A> reference. It has to be a separate document because
it's an example of document structure.<P>


Under the HTML of this era, one of the main cues for the transition from HEADER information to BODY information was the appearance of the distinctively BODY tag, <H1>. Once explicit <HEAD> (or <HEADER>) and <BODY> tags came into existence, the division between HEADER and BODY became explicit. It also became more flexible, since it no longer required a heading (<H1>). In this file the <NEXTID> value is one higher than the highest id value in any NAME in the text, so at last there is a correct example.
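The cue described here can be sketched in Python. Skip any leading TITLE, NEXTID, and ISINDEX elements and whitespace. If what follows is character data rather than a tag, the body starts with data, which is the error this file demonstrates. This is a rough regular-expression approximation, not a validating parser, and the function name is my own.

```python
import re

# One leading header element, possibly preceded by whitespace.
HEADER_ELEMENT = re.compile(
    r"\s*(?:<TITLE\s*>.*?</TITLE\s*>|<NEXTID\b[^>]*>|<ISINDEX\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def body_starts_with_data(document):
    """True if non-blank text, not a tag, comes right after the header elements."""
    pos = 0
    while True:
        m = HEADER_ELEMENT.match(document, pos)
        if m is None:
            break
        pos = m.end()
    rest = document[pos:].lstrip()
    return bool(rest) and not rest.startswith("<")


bad = ("<TITLE>HyperText Markup: Errors</TITLE>\n<NEXTID ID=id2>\n"
       "It is an error to start the body of an HTML document with data.\n")
good = "<TITLE>x</TITLE>\n<H1>searchable document</H1>"
print(body_starts_with_data(bad), body_starts_with_data(good))  # True False
```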

Last-Modified: Mon, 30 Nov 1992 04:28:17 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Connolly/error_data_starts_body.html


The following two files were prepared by Tim Berners-Lee using his revised NeXT HTML Editor. Notice the use of quotation marks for most parameters and the attribute N of the <NEXTID> tag, and even more important, the use of the <HEADER> tag (soon replaced with <HEAD>).


Future plans for HTML

HTML directions

The HTML language has been in use in the field since 1990, and several suggestions have been made for improvements. See working notes . A new DTD will be the result.

Bad HTML

Much of the HTML actually around has been generated by the NeXTStep editor, which has in fact generated bad HTML. This should not confuse the specification. Some bugs in that output include non-matching open and close tags, and a NEXTID tag which is not SGML. Also, attribute values are not quoted even when they contain characters which require them to be quoted in SGML.

A perl script was written by Dan Connolly to clean up bad HTML.

Also, see Dan's HTML spec (draft) which contains a sort of test suite.

New features

Please mail me mentioning this list if you think of features I have missed out.

Header

A wrapper element for all the document-wide information such as title, document-wide links, etc. Advantage: You know when you have got to the end of it, and can open a window with the required attributes. This is easier than checking for a printable character.

Disadvantage: If mandatory, the size of the minimum document is increased.

A "Body" tag might be useful in the same light, for the rest.

Link

A document-wide link, as distinct from a localized anchor. Mainly useful in conjunction with interesting link types such as related-index, related-glossary, parent, author, print-with, copy-with, etc.

An empty element.

Atributes are as for the anchor element.

Dates

A tag giving the dates a document was created, modified and expired is going to be essential for caching systems.

The expiry date-time will allow long cache times for documents such as RFCs, and short or zero caching times for varing data.

<DATE CREATED="920630123067" EXPIRES="920706000000">

(Is there an SGML standard for datetimes? Which standard to use standard? HyTime?)

Highlighting

The HPx elements are not implemented. Some bold/italic/fixed width highlighting is useful, with equivalent representations on single font devices. Three possibilities are

Numbered HPn tags
These are rather meaningless. In practice, everyone has to remember which is bold and which is italic.
Logical tags.
Dan: "I'd prefer <em>, <tt>, <cite>, ala TeX. Or we could go with the O'Reilly/Hal DocBook tags: <Emphasis>, <OopsChar>, <wordasword>,<CiteBook>,<Subscript>, <Superscript>". A problem is there are never enough of them, so people reuse them on the understanding that they will be bold, etc.
Physical tags:
<Bold>, <italic> etc as in MIME. There would have to be an understanding that equivalent representations could be substituted where bold and italic are not available.

Base address

savedas
Could be a name for the tag to give the address with which the document was saved, so that relative links could be resolved even when a document is found out of context (like mailed).

Fixed width text with anchors etc

The XMP and LISTING elements have proved essential for putting on line text already formatted assuming a fixed-width character set. Many people have asked for a version which, instead of being oblivious to any embedded elements, added elements, ang and anchors withing the text. Line end would have to be mareked as such (with P) so that marked-up a line could be represented on many lines: the markup could make it too long to send as it was, and very inconvenient.

Note that an editor could always save in this element something which was originally loaded as a raw text section: indeed, the raw text is really only a (very useful!) way of importing text which could also go though a filter to make it valid marked up SGML.

Fixed width indented

Very often one wants to quote a command in fixed width font, but indented as a quotation, say 40 characters wide rather than 80. Perhaps the width required should be a parameter to the fixed width with anchors element. (Smacks of low-level format!)

Ordered list

Perhaps the OL tag ought to go back in, to distinguish the ordered list from the unordered one. Dan Conolly implements it.

Link types

There is a list of link types . We should formalize these, and then people actually could implement them. This corresponds to giving values to the TYPE attribute . This attribute cohis attribute coEL for RELATIONSHIP to avoid confusion between the type of link and the type of object to which it points.

Entities

A full set of entities for specical charecters should be defined, picked out of a suitable standard table. This should allow for accented characeters and bullets as a minimum. Representation using regular USASCII stand-ins (such as oe for o umlaut) should be allowed where the full character sets are not available. Editors must preserve entities even when the display has defaulted to a stand-in character combination.

Comments

The ability to hide information in an SGML document is useful. The COMMENT entity was introduced for this purpose in the line mode browser as an experiment. It should go in as standard in future. If it can contain anything then it can be used for commenting things out.

Tim BL
<HEADER>
<TITLE>Future plans for HTML</TITLE>
<NEXTID N="42">
</HEADER>
<BODY>
<H1>HTML directions</H1>The <A
NAME=2 HREF="MarkUp.html">HTML language</A> has been in use
in the field since 1990, and several
suggestions have been made for improvements.
See <A
NAME=1 HREF="../WorkingNotes/Overview.html">working notes</A> . A new <A
NAME=37 HREF="SGML.html#5">DTD</A> will
be the result.
<H2>Bad HTML</H2>Much of the HTML actually around
has been generated by the NeXTStep
editor, which has in fact generated
bad HTML. This should not confuse
the specification.   Some bugs in
that output include non-matching
open and close tags, and a NEXTID
tag  which is not SGML. Also, attribute
values are not quoted even when they
contain characters which require
them to be quoted in SGML.<P>
A<A
NAME=38 HREF="../Tools/HTMLGeneration/fix-html.pl"> perl script</A> was written by Dan
Connolly to clean up bad HTML.  <P>
Also, see Dan's <A
NAME=z41 HREF="Connolly/MarkUp.html">HTML spec</A> (draft)
which contains a sort of test suite.
<H2>New features</H2>Please mail me mentioning this list
if you think of features I have missed
out.
<H2>Header</H2>A wrapper element for all the document-wide
information such as title, document-wide
links, etc. Advantage: You know when
you have got to the end of it, and
can open a window with the required
attributes. This is easier than checking
for a printable character.<P>
Disadvantage: If mandatory, the size
of the minimum document is increased.<P>
A "Body" tag might be useful in the
same light, for the rest.
<H2>Link</H2>A document-wide link, as distinct
from a localized anchor. Mainly useful
in conjunction with interesting link
types such as related-index, related-glossary,
parent, author,  print-with, copy-with,
etc.<P>
An empty element.<P>
Atributes are as for the anchor element.
<H2>Dates</H2>A tag giving the dates a document
was created, modified and expired
is going to be essential for caching
systems.<P>
The expiry date-time will allow long
cache times for documents such as
RFCs, and short or zero caching times
for varing data.<P>
&lt;DATE CREATED="920630123067" EXPIRES="920706000000"><P>
(Is there an SGML standard for datetimes?
Which standard to use standard? HyTime?)
<H2>Highlighting</H2>The HPx elements are not implemented.
Some bold/italic/fixed width highlighting
is useful, with equivalent representations
on single font devices. Three possibilities
are
<DL>
<DT>Numbered HPn tags
<DD> These are rather
meaningless. In practice, everyone
has to remember which is bold and
which is italic.
<DT>Logical tags.
<DD> Dan: "I'd prefer &lt;em>,
&lt;tt>, &lt;cite>, ala TeX. Or we could
go with the O'Reilly/Hal DocBook
tags: &lt;Emphasis>, &lt;OopsChar>, &lt;wordasword>,&lt;CiteBook>,&lt;Subscript>,
&lt;Superscript>". A problem is there
are never enough of them, so people
reuse them on the understanding that
they will be bold, etc.
<DT>Physical tags:
<DD> &lt;Bold>, &lt;italic> etc
as in MIME. There would have to be
an understanding that equivalent
representations could be substituted
where bold and italic are not available.
</DL>

<H2>Base address</H2>
<DL>
<DT>savedas 
<DD>Could be a name for the tag
to give the address with which the
document was saved, so that relative
links could be resolved even when
a document is found out of context
(like mailed).
</DL>

<H2>Fixed width text with anchors etc</H2>The XMP and LISTING elements have
proved essential for putting on line
text already formatted assuming a
fixed-width character set. Many people
have asked for a version which, instead
of being oblivious to any embedded
elements, added elements, ang and
anchors withing the text. Line end
would have to be mareked as such
(with P) so that marked-up a line
could be represented on many lines:
the markup could make it too long
to send as it was, and very inconvenient.<P>
Note that an editor could always
save in this element something which
was originally loaded as a raw text
section: indeed, the raw text is
really only a (very useful!) way
of importing text which could also
go though a filter to make it valid
marked up SGML.
<H2>Fixed width indented</H2>Very often one wants to quote a command
in fixed width font, but indented
as a quotation, say 40 characters
wide rather than 80.  Perhaps the
width required should be a parameter
to the fixed width with anchors element.
(Smacks of low-level format!) 
<H2>Ordered list</H2>Perhaps the OL tag ought to go back
in, to distinguish the ordered list
from the unordered one. Dan Conolly
implements it.
<H2>Link types</H2>There is a <A
NAME=39 HREF="../DesignIssues/LinkTypes.html">list of link types</A> . We
should formalize these, and then
people actually could implement them.
This corresponds to giving values
to the <A
NAME=40 HREF="Tags.html#21">TYPE attribute</A> .  This attribute
cohis attribute coEL for RELATIONSHIP
to avoid confusion between the type
of link and the type of object to
which it points.
<H2>Entities</H2>A full set of entities for specical
charecters should be defined, picked
out of a suitable standard table.
This should allow for accented characeters
and bullets as a minimum.  Representation
using regular USASCII stand-ins (such
as oe for o umlaut) should be allowed
where the full character sets are
not available.  Editors must preserve
entities even when the display has
defaulted to a stand-in character
combination.
<H2>Comments</H2>The ability to hide information in
an SGML document is useful. The COMMENT
entity was introduced for this purpose
in the line mode browser as an experiment.
It should go in as standard in future.
If it can contain anything then it
can be used for commenting things
out.
<ADDRESS><A
NAME=0 HREF="http://info.cern.ch./hypertext/TBL_Disclaimer.html">Tim BL</A></A>
</ADDRESS></BODY>

This file shows many other new directions that HTML is taking. Tim Berners-Lee here accuses the NeXT editor of generating bad HTML, and he is correct. Witness the excess trailing </A> tags seen all over the place, the lack of quotes around HREF values that include characters that trip up SGML parsing, the use of a mere number as the "attribute" of <NEXTID>, and so forth. He then talks about a <HEADER> tag, which this file itself already has, together with the possibility of a <BODY> tag, which it also has. He next talks about <LINK>, which would come about with the DTD that Dan Connolly would generate in January 1993. He also proposes a <DATE> tag, whose information would in time appear as a possible set of parameters to a new tag introduced in HTML, <META>. At last the highlighted phrase tags begin to receive some attention. Rather than simply using <HPn>, he considers fleshing them out into distinct text formatting tags such as <EM> and <CITE>, as well as some "physical tags" which would eventually lead to <B> and <I>. He talks about including a base address (<BASE>), and a new kind of <XMP> tag which would allow embedded elements such as links; this would eventually merge with Dan's <TYPEWRITER> tag to become <PRE>. He talks about restoring <OL>, populating the TYPE attribute of <A> with possible values, and finally entities (leading to ISO-8859-1?) and comments, which were already coming about straight from SGML. All in all, this is one of the most prophetic files in this entire suite!
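To make the NeXT editor's faults concrete, here is a minimal Python sketch of the kind of cleanup Dan Connolly's perl script performed. It is not that script, whose source is not reproduced here. The function names are hypothetical, and the sketch assumes simple tags with no ">" inside quoted attribute values. It quotes bare attribute values and drops </A> close tags that have no open anchor to close.

```python
import re

# A start or end tag such as <A NAME=38 HREF=../x.html> or </A>.
# [^>]* also spans newlines, so tags split across lines are handled.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

# NAME=value, where value is double-quoted, single-quoted, or bare.
# Quoted values are consumed whole, so an "=" inside them is never rematched.
ATTR_RE = re.compile(r"""([A-Za-z][A-Za-z0-9.-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""")


def quote_attrs(attrs: str) -> str:
    """Wrap every unquoted attribute value in double quotes."""
    def fix(m: re.Match) -> str:
        name, value = m.group(1), m.group(2)
        if value[0] in "\"'":
            return f"{name}={value}"  # already quoted: leave it alone
        return f'{name}="{value}"'
    return ATTR_RE.sub(fix, attrs)


def fix_html(text: str) -> str:
    """Quote bare attribute values and drop stray </A> close tags."""
    open_anchors = 0

    def fix_tag(m: re.Match) -> str:
        nonlocal open_anchors
        slash, name, attrs = m.group(1), m.group(2), m.group(3)
        if name.upper() == "A":
            if slash:
                if open_anchors == 0:
                    return ""  # stray </A>: no anchor is open, so drop it
                open_anchors -= 1
            else:
                open_anchors += 1
        return f"<{slash}{name}{quote_attrs(attrs)}>"

    return TAG_RE.sub(fix_tag, text)
```

Run against the closing lines of the file above, it turns `<A NAME=0 HREF="...">Tim BL</A></A>` into a single anchor with `NAME="0"` quoted. It leaves <NEXTID> and comments untouched, since those need decisions a regex cannot make.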

Last-Modified: Wed, 02 Dec 1992 18:35:56 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Future.html


HyperText Design Issues: Link types

Link Types

See discussion of whether links should be typed .

Descriptive (normal) link types are mainly for the benefit of users and tracing, and graphics representation algorithms. Some link types for example express relationships between the things described by two nodes.

A Is part of B / B includes A

A Made B / B is made by A

A Uses B / B is used by A

A refers to B / B is referred to by A

Magic link types

These have a significance known to the system, and may be treated in special ways. Many of these relate whole nodes, rather than particular anchors within them. (See also multiended links and predicate logic) Suggestions:

UseIndex

The destination is the related index for a search by a user reading this document who asks for an index search function.

A document may have any number of index links, causing several indexes top be searched in a client-defined manner.

UseGlossary

The destination of the link is an index which should be used to resiolve glossary queries in the document. (Typically, a double-clik on a word which is not within an anchor).

A document may have any number of glossary links.

Annotation

The information in the destination node is additional to that in the source node, and may be viewed at the same time. It may be filtered out (as a function of author?).

Annotation is used by one person to write the equivalent of "margin notes" or other criticism on another's document, for example.

Tracing may ignore annotations when generating trees or sequences.

Embedded information

If this link is followed, the node at the end of it is embedded into the display of the source node. This is supported by Guide, but not many other systems. It is used, in effect, by those systems (VAX/notes under Decwindows, Microsoft Word) which allow "Outlining" -- expanding a tree bit by bit.

The browser has a more difficult job to do if this is supported.

person described by node A is author of node B

This information can be used for protection, and informing authors of interest, for sending mail to authors, etc.

person described by node A is interested in node B

This information can be used for informing readers of changes.

Node A is in fact a previous version of node B

Node A is in fact a set of differences between B and its previous

version. This information will probably not be stored as nodes, but be generated from regular diff files. or some other delta method.

<HEADER>
<TITLE>HyperText Design Issues: Link types</TITLE>
<NEXTID N="5">
</HEADER>
<BODY>
<H1>Link Types</H1>See <A
NAME=3 HREF="Topology.html#4">discussion of whether links should
be typed</A> .<P>
Descriptive (normal) link types are
mainly for the benefit of users and
tracing, and graphics representation
algorithms. Some link types for example
express relationships between the
things described by two nodes.<P>
A Is part of B  / B includes A<P>
A Made B / B is made by A<P>
A Uses B  / B is used by A<P>
A refers to B / B is referred to
by A
<H2>Magic link types</H2>These have a significance known to
the system, and may be treated in
special ways.  Many of these relate
whole nodes, rather than particular
anchors within them.  (See also <A
NAME=4 HREF="Topology.html#12">multiended
links</A> and predicate logic) Suggestions:
<H2>UseIndex</H3>The destination is the related index
for a search by a user reading this
document who asks for an index search
function.<P>
A document may have any number of
index links, causing several indexes
top be searched in a client-defined
manner.
<H2>UseGlossary</H3>The destination of the link is an
index which should be used to resiolve
glossary queries in the document.
(Typically, a double-clik on a word
which is not within an anchor).<P>
A document may have any number of
glossary links.
<H2>Annotation</H3>The information in the destination
node is additional to that in the
source node, and may be viewed at
the same time. It may be filtered
out (as a function of author?).<P>
Annotation is used by one person
to write the equivalent of "margin
notes" or other criticism on another's
document, for example.<P>
<A
NAME=2 HREF="TracingLinks.html">Tracing</A> may ignore annotations when
generating trees or sequences.
<H2>Embedded information</H3>If this link is followed, the node
at the end of it is embedded into
the display of the source node. This
is supported by Guide, but not many
other systems.  It is used, in effect,
by those systems (VAX/notes under
Decwindows, Microsoft Word) which
allow "Outlining" -- expanding a
tree bit by bit.<P>
The browser has a more difficult
job to do if this is supported.
<H2>person described by node A is author
of node B</H2>This information can be used for
protection, and informing authors
of interest, for sending mail to
authors, etc.
<H2>person described by node A is interested
in node B</H2>This information can be used for
informing readers of changes.
<H2><A
NAME=1>Node A is in fact a previous version
of node B</A></H2>
<H2>Node A is in fact a set of differences
between B and its previous</H3>version. This information will probably
not be stored as nodes, but be generated
from regular diff files. or some
other delta method.</BODY>

This file describes the values that might be entered into the TYPE attribute of <A> as obvious enhancements to the system. Judging by how vague the descriptions are here, TYPE has evidently not yet been put to actual use, hence my omission of TYPE from my listing of tags and attributes for the first documented version of HTML. In time these relations would lay an early basis for the REL attribute and its counterpart, REV, of <A> and <LINK>.
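To illustrate how these paired relations foreshadow REL and REV, here is a hypothetical Python sketch. It records each descriptive relationship from the list above as a forward/reverse pair, so that a link stated from A's side can be restated from B's side. The short identifiers ("is-part-of", "made", and so on) are my own labels, not values defined in the file.

```python
# Forward name -> reverse name for the descriptive link types in the file.
# These identifiers are illustrative labels only, not a registered vocabulary.
INVERSES = {
    "is-part-of": "includes",
    "made": "is-made-by",
    "uses": "is-used-by",
    "refers-to": "is-referred-to-by",
}
# Make the mapping symmetric so either name can be inverted.
INVERSES.update({rev: fwd for fwd, rev in list(INVERSES.items())})


def reverse_view(source: str, rel: str, target: str) -> tuple[str, str, str]:
    """Restate the typed link source --rel--> target from the target's side."""
    return (target, INVERSES[rel], source)


def as_link_element(rel: str, href: str, reverse: bool = False) -> str:
    """Render a document-wide link in the later REL/REV style of <LINK>.

    REL names the relation from this document to HREF; REV names the
    relation from HREF back to this document.
    """
    attr = "REV" if reverse else "REL"
    return f'<LINK {attr}="{rel}" HREF="{href}">'
```

REV saves a document from needing a second name for every relation. The later idiom `<LINK REV="made" HREF="mailto:...">` states "A Made B" from inside document B, exactly the reverse reading that `reverse_view` spells out by hand.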

Last-Modified: Thu, 03 Dec 1992 10:11:53 GMT

http://www.w3.org/History/19921103-hypertext/hypertext/WWW/DesignIssues/LinkTypes.html


HTML Guide: Markup Supported by libHTML

HTML Extremes

These are a little tricky, and might break some quick-and-dirty implementations. But they are parsed correctly by implementations based on libHTML-930106.tar.Z, available from the WWW code archives. These constructs are not recommended.

Document Structure

The tags for this element have spaces in them.

Another H4 Just in case it missed the close tag with spaces

Header Elements

Body Elements

Delimiter Recognition

Character reference: ' ' and È. And character from data: &, and from markup: &.

And-hash from data: &# and from markup &#.

Less-thans as data: < <1 <-)

Less-than-slash as data:

greater-than (pretty much always data): > abc> 0>

comment: The sample implementation groks.

comment w/space between -- and >: