The Lost Attributes of A and LINK: URN and METHODS

The <A> element (or "tag") has got to be the most famous element in all of HTML. Not only does its name place it at the beginning of any alphabetically organized reference of the HTML elements (so anyone reading straight through one will hit this element first), but far more importantly, <A> is the one element which has done the most by far to put HTML on the map, and with it the use of the internet itself as something for everyone instead of just rocket scientists, defense contractors, and university professors. More than any other element, <A> is the heart of HTML, since it transformed the whole nature of human use of the internet. Originally intended to provide ready access from an article to the other articles cited in its footnotes, as in academic and scientific research papers, <A> is the whole basis for what we now call "net-surfing." It is the "link," something one places the mouse pointer over and clicks on in order to call up whatever it points to.

Obviously, such a massively useful HTML element couldn't possibly be showing any signs of obsolescence, so one would hardly expect anything attached to it to go the way of such obsolete elements as <PLAINTEXT>, <NEXTID>, <XMP>, and <LISTING>, let alone to do so far more quietly and with such utter finality. Most of those other obsolete elements continue to be implemented in most user agents (except <NEXTID>), but the attributes "URN" and "METHODS" of the <A> element, listed in HTML 2 (and still present in the working drafts of HTML 3.0), disappeared in HTML 3.2, never to be seen or heard from again.

This file illustrates the use of the URN and METHODS attributes of <A> and <LINK> as defined in HTML version 2. These attributes were designed to provide more finesse in using the HREF attribute, i.e. the linking feature of HTML. They were never actually deprecated, but as no user agent seems to have utilized them, their deletion happened quite painlessly. They are defined only in HTML 2, but are current and acceptable in all four forms of HTML 2. The four basic varieties of HTML 2 which can be validated are:

HTML Version 2 Level 2: This is the default and permits all HTML Level 2 functions, elements, and attributes.
HTML Version 2 Strict Level 2: This excludes the older deprecated elements and also forbids such constructs as nesting a header (<H*> element) within a link (<A> element), or having a forms <INPUT> element which is not within a block level element such as <P>.
HTML Version 2 Level 1: This is like the Level 2 default but it excludes all the forms elements, i.e. <FORM>, <INPUT>, <TEXTAREA>, <SELECT>, and <OPTION>.
HTML Version 2 Strict Level 1: This is like regular Level 1 but it also excludes the older deprecated elements and also forbids such constructs as nesting a header (<H*> element) within a link (<A> element).

Let's Start With URN

URN stands for Uniform Resource Name, at least as it finally evolved before being abandoned. There were some internal discussions early on in which the "N" stood for Number instead of Name. This was meant to be a more permanent locating device than the more conventional URL (Uniform Resource Locator). Another expression commonly seen in the early web standards documentation is URI, which stands for Uniform Resource Identifier. What is the difference between these? URI is the most general of these terms, since it refers to any of the short character strings which provide the information needed by the browser and server to track down the requested web resource. Originally, it was intended that URIs would come in at least two, maybe three or more forms, namely the URL (which we have today), the URN (which we are discussing here), and at least one other form, the URC, or Uniform Resource Characteristic, which seems to have received at least some consideration, though not very much. So a URL, or a URN or URC (were either of the latter types to come into existence), would be among the different types of URI available.

It was hoped that some sort of stable master archive of scientific and scholarly research articles would be established, or if not, then at least there would be a system by which such articles, to be kept always somewhere on the web, would always be available using the same URN. Recall, too, that at the time HTML 2 was formulated the whole nature of mirror sites was not yet settled. Perhaps if some article is not available from this server it might still be available from that one. But a URL simply goes to the specific place where the article was found, and if for any reason it isn't there anymore you get a dead link. Another concern was that if an article was available in multiple places, the system should be smart enough to find the closest place in which the article resided, or perhaps was even cached, and deliver that copy instead of one further away. Saving network bandwidth was also a consideration: if a document's URN value were one you already had on file, the network would be spared having to transfer it to you again (a crude sort of caching), since the document would have one single URN value associated with it no matter where it was hosted. Some mid-1993 discussions seem to suggest this use.

Since a URN was a rather specific kind of service, it had a much more specific form of appearance, while URLs came (at least back then) in so many different forms including http:, ftp:, mailto:, gopher:, wais:, telnet:, news:, and so many others. A URN on the other hand had one start, namely urn: (lower and upper case letters were considered equivalent for the "urn" portion). Some early suggestions included treating it like a book's ISBN, hence the idea of N meaning Number instead of Name, or else like an internet server address (remember those funny numbers with four parts to them separated by dots that look like 207.217.96.28?), or else eventually a name (of sorts). Typical URNs might have looked like this: "urn:ietf:params:language:en-us" or "URN:foo:a123,456". The first part of course is the letters URN (any combination of upper and lower case), followed by a colon. The next portion ("ietf" or "foo" in the above examples) would be called the Namespace Identifier (NID), which might refer to a particular URN server or service. After another colon comes the Namespace Specific String (NSS), which would provide whatever information was needed by the NID-named URN service to know which document to look for. For example, the "params:language:en-us" portion of the ietf URN would provide the ietf server with a parameter "params:" with a name and value pair of "language:en-us" to specify the English language version of their site.
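The three-part shape described above (the "urn" prefix, then the NID, then the NSS) is easy enough to take apart in code. Here is a minimal sketch in Python; the names Urn and parse_urn are my own, made up for illustration, and the function only splits on the first two colons without checking which characters a real URN service would accept.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str  # Namespace Identifier, e.g. "ietf"
    nss: str  # Namespace Specific String, e.g. "params:language:en-us"


def parse_urn(text: str) -> Urn:
    """Split 'urn:<NID>:<NSS>' into its NID and NSS parts."""
    scheme, first_colon, rest = text.partition(":")
    # The "urn" prefix is matched without regard to letter case.
    if scheme.lower() != "urn" or not first_colon:
        raise ValueError(f"not a URN: {text!r}")
    # Everything after the second colon belongs to the NSS, colons included.
    nid, second_colon, nss = rest.partition(":")
    if not second_colon or not nid or not nss:
        raise ValueError(f"URN needs both an NID and an NSS: {text!r}")
    return Urn(nid=nid, nss=nss)
```

Feeding it "urn:ietf:params:language:en-us" yields an NID of "ietf" and an NSS of "params:language:en-us", while an ordinary URL such as "http://www.ietf.org" is rejected with a ValueError.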

At such an early point it may not have been clear whether a user agent would somehow have to handle an URN differently from how it would handle the more conventional and common URL, but one other advantage to having a separate attribute would be to serve as a backup. Indeed, there is no reason that a user agent could not have simply looked at the opening four characters of its URI to determine if it were a URN or not, and proceed from there, but furnishing a backup link was a great idea unfortunately not picked up on by any user agent designer. I think a real potential was overlooked here.

The eventual goal would have been that the writer of an article or web resource provides the standard URL to the article in the HREF attribute, and if his article or resource is important enough to be worth permanent archival using the URN service, he would also place the URN in the URN attribute. That way, if someone reading his article and wishing to follow one of the footnoted leads to some other article or resource is unable to find the article at the place specified by the URL, the user agent could simply try the URN next so as to produce the article.

A typical use of the URN would have looked like this:

<P>See the <A HREF="http://www.ietf.org" URN="urn:ietf:params:language:en-us">IETF Site</A> for details.</P>

And this results in:

See the IETF Site for details.

For further examples and explanations of the URN attribute, see here and here.

So, in the event that http://www.ietf.org cannot be located (due to the server being down, etc.), the URN, urn:ietf:params:language:en-us, would be used instead. The user agent, getting a 400-something or 500-something error code from the HTTP server (e.g. Error 404 File Not Found), would automatically go back again using the URN and presumably this time find it.

Given how simple it is to determine if a URI is an URN or not (just look at the first four characters for some variation of "urn:" or "URN:" or "uRn:" etc.), if there were anything different a user agent must do in handling an URN, it could be done automatically, allowing both HREF and URN to function as generic URI address locations, prime and backup. So even had the URN service never come on line (as in fact it never did), the URN attribute could still have been useful as a backup link, even if it only had a conventional URL in it (which the user agent would readily detect). A scholarly article could point to some footnoted reference with its last known URL in the HREF field, and a pointer to a local site page pointed to by the URN field, which might feature an abstract of the article, a list of mirror locations it might also be at, or even a search phrase that a search engine would reasonably find only in the desired article, so as to provide assistance in finding the desired article or resource in the event the URL is no longer valid.
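The prime-and-backup behavior just described can be sketched in a few lines of Python. This is only an illustration of the idea, not anything a real user agent ever did: is_urn and follow_link are names I made up, and the two fetcher arguments (fetch_url and resolve_urn) are stand-ins which the caller must supply, each assumed to return the document text or None on failure.

```python
from typing import Callable, Optional


def is_urn(uri: str) -> bool:
    """True if the URI starts with 'urn:' in any mix of letter case."""
    return uri[:4].lower() == "urn:"


def follow_link(
    href: Optional[str],
    urn: Optional[str],
    fetch_url: Callable[[str], Optional[str]],
    resolve_urn: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the HREF value first, then the URN value; return the first hit."""
    for uri in (href, urn):
        if not uri:
            continue  # attribute absent or empty
        # Either attribute may hold either kind of URI, so pick the
        # fetcher by looking at the prefix rather than the attribute name.
        fetcher = resolve_urn if is_urn(uri) else fetch_url
        document = fetcher(uri)
        if document is not None:
            return document
    return None  # both the prime and the backup failed
```

Because the choice of fetcher depends only on the prefix, a plain URL placed in the URN attribute gets treated as an ordinary backup link with no special handling.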

Even in this more primitive sense, I know of no user agent which ever employed the URN attribute despite its evident potential usefulness in this secondary manner. In case any user agent out there just might implement it, you can test it with the following links:

<A URN="urn:ietf:params:language:en-us">Valid urn: example</A>

Valid urn: example

Or try its use as a redundant link, thus:

<A URN="http://www.ietf.org">backup url: example</A>

backup url: example

Or try one with an invalid HREF but a valid URN in the URN field:

<A HREF="http://www.the-pope.com/invalid.html" URN="urn:ietf:params:language:en-us">backup url: example 2</A>

backup url: example 2

This should at least make it look like a link, but unless your user agent can detect the URN attribute (and what is in it) you will get an "HTTP 404" ("Not Found") error. What a missed opportunity!

And Now for METHODS

HTTP (the Hyper Text Transfer Protocol) includes a number of "methods" by which a page or resource (server, or file or file space on a server, for example) can be accessed. RFC 2616 lists eight methods of access. Some, such as GET and POST, are well known to those who have worked with HTML forms. However, in addition to those, there are HEAD, OPTIONS, PUT, DELETE, TRACE, and CONNECT. When applied to an <A> or <LINK> element, the proper default is "GET." This causes the user agent to transmit the URI (contents of the HREF string), and the server which has the requested file or program returns the file or program output along with a header consisting of several lines specifying its length, its character set, its type (text, HTML, graphics, or whatever), and so forth.

Had the METHODS attribute been implemented in any user agent, one would be able to request something different. For example, suppose one only wanted to verify that a file is present, but not to download its contents. Then a METHODS of "HEAD" would tell the remote resource to return only the header information but not the contents of the file itself. POST, by its very nature, can only go with some way of entering data to send along with the URI data request, e.g. form entries. PUT similarly would not apply to <A> or <LINK>; it is the HTTP method for storing a file on a server at the given URI, much as an FTP program uploads one. DELETE likewise asks the server to remove a resource, much as an FTP program deletes files no longer wanted on the server. Obviously, these methods also have no sensible application to <A> or <LINK>. I suppose that TRACE and OPTIONS just might have been possible from <A> and <LINK>, but as these are also not implemented anywhere we won't be seeing any way to make a link get echoed from the remote server (TRACE) nor learn which options are available (OPTIONS).
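Though no user agent ever honored METHODS="HEAD", the check it describes is easy to perform by hand. The following is a rough sketch using Python's standard urllib; the function names build_head_request and link_is_alive and the ten-second timeout are my own choices, and the opener parameter exists only so that something other than the real network can be substituted.

```python
import urllib.error
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Build a request that asks for the headers only, not the body."""
    return urllib.request.Request(url, method="HEAD")


def link_is_alive(url: str, opener=urllib.request.urlopen,
                  timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL succeeds."""
    try:
        with opener(build_head_request(url), timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a kind of URLError) for 4xx and 5xx
        # replies, and URLError or OSError when the server is unreachable.
        return False
```

Calling link_is_alive("http://www.ietf.org") would send a single HEAD request and report whether the server answered favorably, without the page itself ever being downloaded.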

HTML 2 allows for links to feature more than one method, hence the name of the attribute being "METHODS" instead of "METHOD" as it is in form elements. The methods (if more than one) are supposed to be separated by white space within the attribute field, so for example if a GET on the file is not possible, then perhaps a HEAD will do, and if failing that, an OPTIONS list returned to see just what exactly is possible:

<A HREF="http://www.the-pope.com/hpn.html" METHODS="GET HEAD OPTIONS">METHODS example</A>

METHODS example

This demonstration file also includes in its <HEAD> section the following four <LINK> examples:

<LINK REL=prev HREF="hpn.html" METHODS="GET OPTIONS TRACE">
<LINK REL=next HREF="nextid.html" METHODS="GET OPTIONS TRACE">
<LINK REL=appendix HREF="invalid.html" URN="urn:foo:a123,456" METHODS="DELETE GET HEAD">
<LINK REL=appendix HREF="http://www.ietf.org" URN="urn:ietf:params:language:en-us" METHODS="GET HEAD">
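Even though browsers ignore these attributes, nothing stops a program from reading them back out of the markup. Below is a small sketch using Python's standard html.parser module; the class name LinkAttributeCollector and the helper collect_links are invented for this illustration, and the sketch makes no attempt to judge whether the values it finds are sensible.

```python
from html.parser import HTMLParser


class LinkAttributeCollector(HTMLParser):
    """Collect HREF, URN and METHODS from every <A> and <LINK> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag not in ("a", "link"):
            return
        found = dict(attrs)
        self.links.append({
            "tag": tag,
            "href": found.get("href"),
            "urn": found.get("urn"),
            # METHODS is a white-space separated list of method names.
            "methods": (found.get("methods") or "").split(),
        })


def collect_links(markup: str) -> list:
    """Parse the markup and return one dictionary per <A> or <LINK>."""
    collector = LinkAttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.links
```

Run over the four <LINK> lines above, this would give four entries, the third carrying an HREF of "invalid.html", a URN of "urn:foo:a123,456", and the methods DELETE, GET, and HEAD in that order.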

Recommended Implementation

The URN really is a great idea, and one can only hope that work on it may one day resume, now that the nature of the web is more fully formed and the value and use for such a thing can at last be appreciated. However, since a URN is easily distinguished by its prefix, there is no need for a special link for it; it can be recognized as such whether it comes in the HREF or URN attribute. Since a URN would have a more indirect and complicated path to take, the direct URL would be the first preferred sort of URI to place in the HREF attribute, and a secondary pointer, be it a URN or merely a different URL, could be put in the URN attribute. So a user agent should attempt to fetch the HREF destination, and if that fails, should then simply fetch the URN destination. Web masters would be advised to point the URN to either a real URN (if any ever comes to exist), some more remote mirror site, or even a special page on their own site containing the data (if they have permission and the space to host it), some summary or abstract of the data, or at least some search string distinctive to the article pointed to, and which one could feed to a search engine to see if it exists anywhere else on the web. Perhaps such search strings could serve in a manner similar to the intention of the URC.

METHODS really should be used, and the result displayed, provided that they are valid methods for a link, namely GET, HEAD, TRACE, or OPTIONS. GET would be the default and would simply work as links do presently, but if one enters HEAD, TRACE, or OPTIONS, one gets instead a simple text display of whatever response the user agent gets from the server. When more than one appears (with either spaces or commas as delimiters), the first is attempted, and if not successful the second, and so forth until one succeeds. For example, "GET TRACE" would first attempt to fetch the file like a normal link, but failing that would then attempt a TRACE and then display the results of that. One thing that would be particularly useful would be to specify HEAD, either as a quick link check (a webmaster might simply want a quick way to see if his links are still valid without having to download their whole file text and graphics etc.) or as a way to view the HTTP header information on a file. TRACE and OPTIONS are of less use, but these should also be passed to the web in the link request instead of the usual GET. A <LINK METHODS="HEAD"> command could pass the return status to a scripting language, for example, for the viewer to know in advance if a particular link might be unavailable at the time without having to click it and wait for the user agent to time out and let him go back.
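The try-each-method-in-turn rule recommended above might be sketched as follows in Python. Everything here is hypothetical: parse_methods and request_link are names of my own, the send argument is a stand-in for whatever actually talks to the server (assumed to return the response text, or None on failure), and quietly dropping methods such as DELETE is my reading of "valid methods for a link" rather than anything HTML 2 spells out.

```python
import re
from typing import Callable, List, Optional, Tuple

# The only methods this recommendation considers sensible for a link.
LINK_METHODS = ("GET", "HEAD", "TRACE", "OPTIONS")


def parse_methods(value: str) -> List[str]:
    """Split a METHODS value on spaces or commas, keeping link-safe methods."""
    names = [name.upper() for name in re.split(r"[\s,]+", value) if name]
    allowed = [name for name in names if name in LINK_METHODS]
    return allowed or ["GET"]  # GET is the default when nothing usable remains


def request_link(
    href: str,
    methods: str,
    send: Callable[[str, str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try each method in order; return (method, response) for the first success."""
    for method in parse_methods(methods):
        response = send(method, href)
        if response is not None:
            return method, response
    return None  # every listed method failed
```

With METHODS="GET TRACE", a failed GET leads to a TRACE attempt, and the caller learns which method finally worked so it can decide whether to render a page or show the raw response as text.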

Upgrades and Downgrades

Possible downgrades are:

None - Both of these attributes are ancient and have no specific equivalent in anything previously acceptable to any HTML machine. However, in an earlier draft of HTML, multiple methods listed were to be separated with commas instead of white space.

Possible upgrades are:

Scripting languages - perhaps it might just be possible for a scripting language to be powerful enough to respond gracefully to a resource not found. At this point I have no clear evidence that this is even possible.

This file, "aatrib.html," is HTML 2.0 Strict Level 1 compliant.
