The <A> element (or "tag") has got to be the most famous element in all of HTML. Not only does its name place it at the beginning of any alphabetically organized reference of all the HTML elements (so anyone reading straight through one will hit this element first), but far more importantly, <A> is the one element which by far has done the most to place HTML (and even use of the internet itself, as something for everyone instead of just rocket scientists, defense contractors, and university professors) on the map. More than any other element, <A> is the heart of HTML, since it transformed the whole nature of human use of the internet. Originally intended to provide ready access from an article to other articles cited in footnotes in academic and scientific research papers, <A> is the whole basis for what we now call "net-surfing." It is the "link," something one places the mouse pointer over and clicks on in order to call up whatever it points to.
Obviously, such a massively useful HTML element couldn't possibly be showing any signs of obsolescence, so of course one would hardly expect anything attached to it not only to go the way of such obsolete elements as <PLAINTEXT>, <NEXTID>, <XMP>, and <LISTING>, but to do so far more quietly, and with such utter finality. Most of those other obsolete elements continue to be implemented in most user agents (except <NEXTID>), but the attributes "URN" and "METHODS" of the <A> element, listed in HTML 2 (and still present in the working drafts of HTML 3.0), disappeared in HTML 3.2, never to be seen or heard from again.
This file illustrates the URN and METHODS attributes of <A> and <LINK> as they were intended to be used, as defined in HTML version 2. These attributes were designed to provide more finesse in using the HREF attribute or function, i.e. the linking feature of HTML. These attributes were not actually deprecated, but as no user agent seems to have utilized them, their deletion happened quite painlessly. They are defined only in HTML 2, but are current and acceptable in all four forms of HTML 2. The four basic varieties of HTML 2 which can be validated are:
<H*> element) within a link (<A> element), or having a forms <INPUT> element which is not within a block level element such as <P>
<FORM>, <INPUT>, <TEXTAREA>, <SELECT>, and <OPTION>
<H*> element) within a link (<A> element)
URN stands for Uniform Resource Name, at least as it finally evolved
before being abandoned. There were some internal discussions early on in which the "N" stood for Number instead of Name. This was meant to be a more permanent locating device than the more conventional URL (Uniform Resource Locator). Another expression commonly seen in the early web standards documentation is URI, which stands for Uniform Resource Identifier. What is the difference between these? The URI is the most general of these expressions, since it refers to any one of these short character strings which provide the information needed by the browser and server to track down the requested web resource. Originally, it was intended that URIs would come in at least two, maybe three or more forms, namely the URL (which we have today), the URN (which we are discussing here), and at least one other form, the URC, or Uniform Resource Characteristic, which seems to have received at least some consideration, though not very much. So a URL, or a URN or URC (were either of the latter types to come into existence), would be among the different types of URI available.
It was hoped that some sort of stable master archive of scientific and scholarly research articles would be established, or if not, that there would at least be a system under which such articles, kept always somewhere on the web, would always be available using the same URN. Recall, too, that at the time HTML 2 was formulated the whole nature of mirror sites was not yet settled. Perhaps if some article is not available from this server it might still be available from that one. But a URL simply goes to the specific place where the article was once found, and if for any reason it isn't there anymore you get a dead link. Another hope was that if an article were available in multiple places, the URN service would be smart enough to find the closest place in which the needed article resided, or perhaps was even cached, and deliver that copy instead of a more distant one, so as to save on network bandwidth. Saving bandwidth was a consideration in another way as well: if a document's URN value were one you already had on file, the network would be spared having to transfer it to you again (a crude sort of caching), since the document would have one single URN value associated with it no matter where it is hosted. Some mid-1993 discussions seem to suggest this use.
Since an URN was a rather specific kind of service, it had a much more specific form of appearance, while URLs came (at least back then) in so many different forms, including http:, ftp:, mailto:, gopher:, wais:, telnet:, news:, and so many others. An URN on the other hand had one start, namely urn: (lower and upper case letters were considered equivalent for the "urn" portion). Some early suggestions included treating it like a book's ISBN, hence the idea of N meaning Number instead of Name, or else like an internet server address (remember those funny numbers with four parts to them separated by dots that look like 207.217.96.28?), or else eventually a name (of sorts). Typical URNs might have looked like this: "urn:ietf:params:language:en-us" or "URN:foo:a123,456". The first part of course is the letters URN (any combination of upper and lower case), followed by a colon; the next portion ("ietf" or "foo" in the above examples) would be called the Namespace Identifier (NID), which might refer to a particular URN server or service. After another colon follows the Namespace Specific String (NSS), which would provide whatever information was needed by the NID-named URN service to know which document to look for. For example, the "params:language:en-us" portion of the ietf URN would provide the ietf server with a parameter "params:" with a name and value pair of "language:en-us" to specify the English language version of their site.
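The prefix, NID, and NSS layout described above can be sketched in a few lines of Python. This is only an illustration of the splitting rule as this article describes it, not a full URN validator; the names parse_urn and ParsedUrn are made up for this sketch.

```python
from typing import NamedTuple, Optional


class ParsedUrn(NamedTuple):
    nid: str  # Namespace Identifier, e.g. "ietf"
    nss: str  # Namespace Specific String, e.g. "params:language:en-us"


def parse_urn(uri: str) -> Optional[ParsedUrn]:
    """Split 'urn:<NID>:<NSS>' into its parts, or return None if not a URN."""
    # The "urn" prefix is matched without regard to letter case.
    if uri[:4].lower() != "urn:":
        return None
    # Split only at the first colon after the prefix; the NSS may hold more.
    nid, sep, nss = uri[4:].partition(":")
    if not nid or not sep or not nss:
        return None
    return ParsedUrn(nid, nss)


print(parse_urn("urn:ietf:params:language:en-us"))
print(parse_urn("URN:foo:a123,456"))
print(parse_urn("http://www.ietf.org"))  # not a URN, so None
```

Anything that does not begin with some case variation of "urn:" is simply reported as not being a URN, which leaves the caller free to treat it as an ordinary URL.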
At such an early point it may not have been clear whether a user agent would somehow have to handle an URN differently from how it would handle the more conventional and common URL, but one other advantage to having a separate attribute would be to serve as a backup. Indeed, there is no reason that a user agent could not have simply looked at the opening four characters of its URI to determine if it were a URN or not, and proceed from there, but furnishing a backup link was a great idea unfortunately not picked up on by any user agent designer. I think a real potential was overlooked here.
The eventual goal would have been that the writer of an article or web resource provides the standard URL to the article in the HREF attribute, and if his article or resource is important enough to be worth permanent archival using the URN service, he would also place the URN in the URN attribute. That way, if someone reading his article and wishing to follow one of the footnoted leads to some other article or resource is unable to find the article at the place specified by the URL, the user agent could simply try the URN next so as to produce the article.
A typical use of the URN would have looked like this:
<P>See the <A HREF="http://www.ietf.org" URN="urn:ietf:params:language:en-us">IETF Site</A> for details.</P>
And this results in:
See the IETF Site for details.
For further examples and explanations of the URN attribute, see here and here.
So, in the event that http://www.ietf.org is unable to be located (due to server down, etc.), the URN, urn:ietf:params:language:en-us, would be used instead. The user agent, getting a 400-something or 500-something error code from the HTTP (e.g. Error 404 File Not Found), would automatically go back again using the URN and presumably this time find it.
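Since no user agent ever did this, the following Python sketch is purely hypothetical. It shows the prime-and-backup behavior just described: try the HREF, and on a 400-something or 500-something status try the URN instead. The function follow_link and the stand-in fake_fetch are invented for illustration, and no real network access is involved.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fetcher type: takes a URI and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


def follow_link(href: Optional[str], urn: Optional[str],
                fetch: Fetcher) -> Tuple[Optional[str], str]:
    """Try HREF first; on a 4xx/5xx status (or no HREF) fall back to URN.

    Returns (uri_that_worked, body), or (None, "") if both attempts fail.
    """
    for uri in (href, urn):
        if not uri:
            continue
        status, body = fetch(uri)
        if status < 400:
            return uri, body
    return None, ""


# Stand-in for the network: the HREF is a dead link, the URN resolves.
def fake_fetch(uri: str) -> Tuple[int, str]:
    pages = {"urn:ietf:params:language:en-us": "IETF home page"}
    return (200, pages[uri]) if uri in pages else (404, "")


print(follow_link("http://www.the-pope.com/invalid.html",
                  "urn:ietf:params:language:en-us", fake_fetch))
```

Because the fetcher is passed in, the same logic would work whether the backup is a true URN or merely a second URL.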
Given how simple it is to determine if a URI is an URN or not (just look at the first four characters for some variation of "urn:" or "URN:" or "uRn:" etc.), if there were anything different a user agent must do in handling an URN, it could be done automatically, allowing both HREF and URN to function as generic URI address locations, prime and backup. So even had the URN service never come on line (as in fact it never did), the URN attribute could still have been useful as a backup link, even if it only had a conventional URL in it (which the user agent would readily detect). A scholarly article could point to some footnoted reference with its last known URL in the HREF field, and a pointer to a local site page in the URN field, which might feature an abstract of the article, a list of mirror locations it might also be at, or even a search phrase that a search engine would reasonably find only in the desired article, so as to provide assistance in finding the desired article or resource in the event the URL is no longer valid.
Even in this more primitive sense, I know of no user agent which ever employed the URN attribute despite its evident potential usefulness in this secondary manner. In case any user agent out there just might implement it, you can test it with the following two links:
<A URN="urn:ietf:params:language:en-us">Valid urn: example</A>
Or try its use as a redundant link, thus:
<A URN="http://www.ietf.org">backup url: example</A>
Or try one with an invalid HREF but a valid URN in the URN field:
<A HREF="http://www.the-pope.com/invalid.html" URN="urn:ietf:params:language:en-us">backup url: example 2</A>
This should at least make it look like a link, but unless your user agent can detect the URN attribute (and what is in it) you will get an "HTTP 404" ("Not Found") error. What a missed opportunity!
HTTP (the Hyper Text Transfer Protocol) includes a number of "methods" by which a page or resource (server, or file or file space on a server, for example) can be accessed. RFC 2616 lists some eight methods of access. Some, such as GET and POST, are well known to those who have worked with HTML forms. However, in addition to those, there are HEAD, OPTIONS, PUT, DELETE, TRACE, and CONNECT. When applied to an <A> or <LINK> element, the proper default is "GET." This causes the user agent to transmit the URI (contents of the HREF string), and the server which has the requested file or program returns the file or program output along with a header consisting of several lines specifying its length, character set, whether text, html, graphics, or whatever, and so forth.
Had the METHODS attribute been implemented in any user agent, one would be able to request something different. For example, suppose one only wanted to verify that a file is present, but not to download its contents. Then a METHODS of "HEAD" would tell the remote resource to return only the header information but not the contents of the file itself. POST, by its very nature, can only go with some way of entering data to send along with the URI data request, e.g. form entries. PUT similarly would not apply to <A> or <LINK>; it is the HTTP method for uploading a file onto a server, much as an FTP program does with its own upload command. DELETE is likewise the HTTP method for deleting files no longer wanted on the server. Obviously, these methods also have no possible application to <A> or <LINK>.
I suppose that TRACE and OPTIONS just might have been possible from <A> and <LINK>, but as these are also not implemented anywhere we won't be seeing any way to make a link get echoed from the remote server (TRACE) nor learn which options are available (OPTIONS).
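For the curious, here is a small Python sketch of what choosing a method other than GET amounts to on the client side. It uses the standard library's urllib.request.Request, which accepts a method argument; the helper name build_request is made up here, and the request is only built, never sent, so nothing touches the network.

```python
import urllib.request


def build_request(href: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an HTTP request for a link using the given method."""
    return urllib.request.Request(href, method=method.upper())


# A link with METHODS="HEAD" would ask for the headers only.
head = build_request("http://www.ietf.org", "head")
print(head.get_method(), head.full_url)

# With no method given, the ordinary GET is used, as links do today.
print(build_request("http://www.ietf.org").get_method())
```

Passing such a request to urllib.request.urlopen would actually send it; for a HEAD request the server's reply would carry the header lines but no file contents.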
HTML 2 allows for links to feature more than one method, hence the name of the attribute being "METHODS" instead of "METHOD" as it is in form elements. The methods (if more than one) are supposed to be separated by white space within the attribute field, so for example if a GET on the file is not possible, then perhaps a HEAD will do, and if failing that, an OPTIONS list returned to see just what exactly is possible:
<A HREF="http://www.the-pope.com/hpn.html" METHODS="GET HEAD OPTIONS">METHODS example</A>
This demonstration file also includes in its <HEAD> section the following four <LINK> examples:
<LINK REL=prev HREF="hpn.html" METHODS="GET OPTIONS TRACE">
<LINK REL=next HREF="nextid.html" METHODS="GET OPTIONS TRACE">
<LINK REL=appendix HREF="invalid.html" URN="urn:foo:a123,456" METHODS="DELETE GET HEAD">
<LINK REL=appendix HREF="http://www.ietf.org" URN="urn:ietf:params:language:en-us" METHODS="GET HEAD">
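A user agent (or a curious webmaster's script) wishing to notice these attributes would first have to pick them out of the markup. The following Python sketch does so with the standard library's html.parser module; the class name LinkAttrCollector and the dictionary layout are assumptions made for this illustration only.

```python
from html.parser import HTMLParser


class LinkAttrCollector(HTMLParser):
    """Collect HREF, URN and METHODS from <A> and <LINK> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag not in ("a", "link"):
            return
        found = dict(attrs)
        self.links.append({
            "tag": tag,
            "href": found.get("href"),
            "urn": found.get("urn"),
            # METHODS is a whitespace-separated list in HTML 2.
            "methods": (found.get("methods") or "").split(),
        })


collector = LinkAttrCollector()
collector.feed(
    '<LINK REL=appendix HREF="invalid.html" URN="urn:foo:a123,456" '
    'METHODS="DELETE GET HEAD">'
    '<A HREF="http://www.the-pope.com/hpn.html" METHODS="GET HEAD OPTIONS">'
    'METHODS example</A>'
)
for link in collector.links:
    print(link)
```

An element lacking either attribute simply yields None for the URN or an empty list for the methods.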
The URN really is a great idea, and one can only hope that work on it may one day resume, now that the nature of the web is more fully formed and the value and use for such a thing can at last be appreciated. However, since an URN is easily distinguished by its prefix, there is no need for a special link for it; it can be recognized as such whether it comes in the HREF or URN attribute. Since an URN would have a more indirect and complicated path to take, the direct URL would be the first preferred sort of URI to place in the HREF attribute, and a secondary pointer, be it an URN or merely a different URL, could be put in the URN attribute. So a user agent should attempt to fetch the HREF destination, and if that fails, should then simply fetch the URN destination. Web masters would be advised to point the URN to either a real URN (if any ever comes to exist), some more remote mirror site, or even a special page on their own site containing the data (if they have permission and the space to host it), some summary or abstract of the data, or at least some search string distinctive to the article pointed to, which one could feed to a search engine to see if it exists anywhere else on the web. Perhaps such search strings could serve in a manner similar to the intention of the URC.
METHODS really should be used, and the result displayed, provided that they are valid methods for a link, namely GET, HEAD, TRACE, or OPTIONS. GET would be the default and would simply work as links do presently, but if one enters HEAD, TRACE, or OPTIONS, one gets instead a simple text display of whatever response the user agent gets from the server. When more than one appears (with either spaces or commas as delimiters), the first is attempted, and if not successful the second, and so forth until one succeeds. For example, "GET TRACE" would first attempt to fetch the file like a normal link, but failing that would then attempt a TRACE and then display the results of that. One thing that would be particularly useful would be to specify HEAD as either a quick and hasty link check (a webmaster might simply want a quick way to see if his links are still valid without having to download their whole file text and graphics etc.), or as a way to view the HTTP header information on a file. TRACE and OPTIONS are of less use, but these should also be passed to the web in the link request instead of the usual GET. A <LINK METHODS="HEAD"> command could pass the return status to a scripting language, for example, for the viewer to know in advance if a particular link might be unavailable at the time without having to click it and wait for the user agent to time out and let him go back.
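The try-each-in-turn behavior suggested above might be sketched like this in Python. Everything here is hypothetical: parse_methods, try_methods and the stand-in fake_send are invented names, and falling back to GET when no usable method is listed is an assumption of this sketch, not something HTML 2 specifies.

```python
import re
from typing import Callable, List, Optional, Tuple

# The four methods this article considers sensible for a link.
LINK_METHODS = ("GET", "HEAD", "TRACE", "OPTIONS")


def parse_methods(value: str) -> List[str]:
    """Split on spaces or commas and keep only the link-appropriate methods."""
    names = [m.upper() for m in re.split(r"[\s,]+", value) if m]
    valid = [m for m in names if m in LINK_METHODS]
    # Assumption: with nothing usable listed, behave like an ordinary link.
    return valid or ["GET"]


def try_methods(href: str, methods_attr: str,
                send: Callable[[str, str], int]) -> Optional[Tuple[str, int]]:
    """Attempt each method in order; return the first (method, status) that succeeds."""
    for method in parse_methods(methods_attr):
        status = send(method, href)
        if status < 400:
            return method, status
    return None


# Stand-in for the server: the file is gone, but TRACE still answers.
def fake_send(method: str, href: str) -> int:
    return 200 if method == "TRACE" else 404


print(parse_methods("DELETE, GET,HEAD"))
print(try_methods("hpn.html", "GET TRACE", fake_send))
```

A real user agent would of course display the response text for HEAD, TRACE, or OPTIONS rather than just report the status.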
This file, "aatrib.html," is HTML 2.0 Strict Level 1 compliant.