The <ISINDEX> tag is at once easy and difficult to understand. References often describe it as marking a file as an "indexed" file, whatever that means. Working examples are scarce, even on HTML documentation pages, because a simple HTML file containing this tag is not enough. Such a file shows what sort of widget the <ISINDEX> tag causes a browser to display, but it does not illustrate how the tag actually works. On the other hand, what makes this tag easy to understand is that most guides recommend HTML forms as its replacement, and forms are indeed the successor to this tag.
Forms seem to have made their first appearance sometime in 1993, with HTML+ describing a crude version of forms (with some variations from how forms ended up in later versions of HTML), and were finally added as an option in the second release of HTML 1 ("HTML 1.m") and in HTML 2 "level 2." However, <ISINDEX> shows all signs of being much older. In an email from Tim Berners-Lee dated October 29, 1991, <ISINDEX> was mentioned in passing and described as something already in existence and apparently a going concern. In those early days, it was thought that the server would insert this tag if it somehow recognized the file as being "searchable." The exact mechanisms used back then are not clear, but in effect that is how it can be (and in a very few instances is) implemented today.
Though <ISINDEX> is almost as old as <NEXTID>, <PLAINTEXT>, <XMP>, and <LISTING>, unlike them it continues to be recognized by even the most advanced version of HTML and even the initial version of XHTML. But with the coming of XHTML 1.1 this tag is no longer recognized. The versions of HTML that concern the <ISINDEX> tag are:
<ISINDEX> as a binary switch, to create a widget window if present and do nothing if absent.
<ISINDEX> is quite specifically meant to be confined to the <HEADER> or <HEAD> portion of the document.
<ISINDEX> is the same as it had been and would be in HTML 2 Level 1, either absent or present in the <HEAD>.
<ISINDEX> can also be in the <BODY> any number of times as well as in the <HEAD> once, except if forms are disabled, in which case it would be handled just like in HTML 2 Level 1.
<FORM>, <INPUT>, <TEXTAREA>, <SELECT>, and <OPTION>, and <ISINDEX> is permitted only in the <HEAD> portion of the document.
<H*> element) within a link (<A> element). <ISINDEX> is handled the same as in the regular Level 1.
<ISINDEX> can here occur in either the <HEAD> or the <BODY>.
<H*> element) within a link (<A> element), or having a forms <INPUT> element which is not within a block level element such as <P>. <ISINDEX> is handled the same as in the regular Level 2.
<FONT>, <MAP>, <APPLET>, and <TABLE>. <ISINDEX> continues to be acceptable in both <HEAD> and <BODY> as it is in HTML 2 Level 2.
<ISINDEX>, are included only as "deprecated" tags.
<ISINDEX>, are excluded.
<isindex />.
<isindex />.
<ISINDEX> has vanished altogether, never to be heard from again.
Of the really old tags, the one thing most unusual about <ISINDEX>
is its need for programming of some sort at the server end. The tag in the file displayed by the browser is only part of the picture. For these "working examples" to truly work, some program must be running in support of them on the server end of the connection. In this case, not only do I have such a program running, but I also present here the full text of the program (a Perl script), ready to run. It is, however, beyond the scope of this demonstration to implement a full search engine. Since I am concerned only with the <ISINDEX> tag itself, it is enough to demonstrate that the user can enter something and see it echoed as the "search response." Furthermore, though I present the executing Perl script in full, there will be no attempt here to explain the Perl programming language; there are plenty of standard references for that. However, as a piece of quality control, my executable files can be copied directly off the screen as presented here, which is what I did myself, so what you see here is quite literally and exactly what is running on the server, or included in this file through a Server Side Include command.
An <ISINDEX> example is one place where Server Side Includes can be of interest. While there must be a program to run on the server side, one can still make an ordinary HTML file with <ISINDEX> in it, and then relegate the processing to a supplementary program file. Such files, when called from a Server Side Include file, inherit the environment variables (including the query string, as seen in the example of this file) of the calling file. This file is marked as a Server Side Include file to the server by its extension, .shtml, and the executable Perl script file it uses, named isinsup.pl, is contained in a subdirectory named cgi beneath the current directory. The executable code contained in isinsup.pl reads as follows:
#!/usr/local/bin/perl
print "Content-type: text/html\r\n\r\n";
$sourc = $ENV{'QUERY_STRING'};
$sourc =~ tr/+/ /;
$sourc =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($sourc eq '')
{
    print "Nothing has been entered yet.";
}
else
{
    $sourc =~ s/&/&amp;/g;
    $sourc =~ s/</&lt;/g;
    $sourc =~ s/>/&gt;/g;
    $sourc =~ s/"/&quot;/g;
    $sourc =~ s/ /&nbsp;/g;
    print "The user has entered &quot\;$sourc.&quot\;";
}
If you want to use this, the first line is the one thing that might have to be modified, depending on where the Perl interpreter is located on your server. Other than that, it is a straight cut and paste (make the script file executable, and ensure that your server is enabled for CGI script execution, Server Side Includes, and specifically the exec command), and you can try this at home. Notice how the $sourc = $ENV{'QUERY_STRING'}; line extracts the query string fed to the calling .shtml file. The next two lines pertain to URL encoding (see more of that below) so as to reconstruct in the program the exact phrase the user typed in. Normally, that should be enough for a search engine, but in this example, where we are simply echoing the phrase back, a few additional steps have been added to ensure that the HTML "trigger characters" (e.g. & and <) are harmlessly rendered as HTML entities and are thus readable in the browser. The program is invoked from this .shtml file with the following command:
<P><!--#exec cgi="cgi/isinsup.pl" --></P>
and the result is (try going to the <ISINDEX>
widget window
(usually located at the top of this file) and entering something):
Most of the example files called from this file call the same routine using the same
Server Side Include command as utilized above. The more conventional way an
<ISINDEX>
tag file is handled is by being entirely generated by a
script. The following simple script file, as seen here, generates an HTML file:
#!/usr/local/bin/perl
print "Content-type: text/html\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HTML VERSION=\"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HEAD>\r\n";
print "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html\; charset=utf-8\">\r\n";
print "<TITLE>ISINDEX Example</TITLE>\r\n";
$sourc = $ENV{'QUERY_STRING'};
$sourc =~ tr/+/ /;
$sourc =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($sourc eq '')
{
    print "<ISINDEX>\r\n";
    print "</HEAD>\r\n";
    print "<BODY>\r\n";
    print "<P>ISINDEX Example</P>\r\n";
    print "</BODY>\r\n";
    print "</HTML>\r\n";
}
else
{
    $sourc =~ s/&/&amp;/g;
    $sourc =~ s/</&lt;/g;
    $sourc =~ s/>/&gt;/g;
    $sourc =~ s/"/&quot;/g;
    $sourc =~ s/ /&nbsp;/g;
    print "</HEAD>\r\n";
    print "<BODY>\r\n";
    print "<P>Searched for is &quot\;$sourc&quot\;</P>\r\n";
    print "<P>Click on Back button to return to menu.</P>\r\n";
    print "</BODY>\r\n";
    print "</HTML>\r\n";
}
This script can be executed from here. This one may be easier to try at home since it does not need Server Side Includes (nor its exec command, specifically) to be enabled. When this program is run, it generates an HTML file that looks like this when nothing has been entered:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN">
<HTML VERSION="-//IETF//DTD HTML 2.0 Strict Level 1//EN">
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<TITLE>ISINDEX Example</TITLE>
<ISINDEX>
</HEAD>
<BODY>
<P>ISINDEX Example</P>
</BODY>
</HTML>
Alternatively, if something has been entered, it looks like this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN">
<HTML VERSION="-//IETF//DTD HTML 2.0 Strict Level 1//EN">
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<TITLE>ISINDEX Example</TITLE>
</HEAD>
<BODY>
<P>Searched for is &quot;abc&quot;</P>
<P>Click on Back button to return to menu.</P>
</BODY>
</HTML>
In the above, one sees what the generated file looks like if the user entered "abc," for example.
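The echo branch of both scripts replaces the ampersand before any of the other HTML "trigger characters." The order matters: if & were replaced last, the ampersands belonging to entities produced by the earlier substitutions would themselves be escaped a second time. A minimal sketch of that step as a reusable routine follows; the name escape_html is hypothetical and does not appear in the scripts above, and the space substitution is left out here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the scripts above): escape the
# characters that HTML treats specially so that user input can be
# echoed back safely.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must run first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Prints: &lt;B&gt;Tom &amp; &quot;Jerry&quot;&lt;/B&gt;
print escape_html('<B>Tom & "Jerry"</B>'), "\n";
```

Wrapping the substitutions in a subroutine is only a convenience for reuse; the scripts above apply the same substitutions inline.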
One thing <ISINDEX> is especially useful for is demonstrating how URL encoding works. URL encoding means taking what the user entered and converting it to a different string which can be used as part of a URL. It cannot have any spaces, and many special characters must be converted since they have special meanings in the context of a URL. One thing to see is how spaces are converted into plus (+) signs (making the plus sign itself one of the special characters that must be converted). All punctuation except @, *, -, _, and . must be converted, along with any non-ASCII characters. Each of these is changed into a % followed by two hexadecimal digits (0-9A-F), so, for example, the plus sign itself, if entered, shows up in the URL as %2B. That is why the Perl script (or whatever program one uses) must first convert plus signs into spaces and then %nn sequences into single-byte characters. The best part is that by setting the charset of the file to "utf-8" (which is acceptable in HTML 2 since it is done as a mere value fed to an attribute of the <META> tag and there are no restrictions on its attribute values), such horrible things as "أَبْجَدْ" (Arabic for how one begins to recite the Arabic ABC's) can be entered and will display where echoed (if your browser is able to display them), since the multiple bytes of UTF-8 simply become multiple %nn sequences in the URL. You can copy and paste from the screen here to the <ISINDEX> widget window and see for yourself what it does. If your browser cannot handle Arabic, try this Vietnamese name instead: "Ngô Đình Thục."
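The encoding rules just described can be tried out without a browser. The following is a rough sketch, assuming the input is a string of bytes (UTF-8 text that has not been decoded into Perl characters). The names url_encode and url_decode are hypothetical, and real browsers may differ in exactly which punctuation they leave unconverted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: encode a byte string the way the text describes.
# Letters, digits, and @ * - _ . are left alone, spaces become plus
# signs, and every other byte becomes % followed by two hex digits.
sub url_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\@*_.\- ])/sprintf("%%%02X", ord($1))/eg;
    $bytes =~ tr/ /+/;
    return $bytes;
}

# Hypothetical helper: the reverse, using the same two steps as the
# scripts above (plus signs to spaces first, then %nn to single bytes).
sub url_decode {
    my ($query) = @_;
    $query =~ tr/+/ /;
    $query =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    return $query;
}

print url_encode("a+b c"), "\n";          # prints: a%2Bb+c
print url_encode("Ng\xC3\xB4"), "\n";     # the UTF-8 bytes of "Ngô"; prints: Ng%C3%B4
```

Decoding must turn plus signs into spaces before expanding the %nn sequences; done the other way around, a plus sign the user actually typed (sent as %2B) would wrongly come back as a space.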
Though more than one occurrence of <ISINDEX> in the head of the document is strictly speaking an error, its presence there, whether once or more than once, should result in a single widget in the browser display frame, similar to how it shows on Mosaic, though it may be at the top or the bottom, reversed in direction for right-to-left languages, or even put on a side for vertical languages such as Chinese. Putting <ISINDEX> in the head shows that one does not want it in the main flow of the scrolling text but always present (not moving) on the screen and independent of the scrolling text. Relatively few styles would apply to it, though one could implement color, typeface, and window size style commands, using those styles specified in the last instance within the head of the document (if erroneously more than one occurs there).
If occurring in the body of the document, <ISINDEX> should insert a widget window at the point in the flow of the text at which it occurs, as many times as it is found in the body of the document, and with the style commands implemented as specifically applied to the particular instance. It should by default be implemented consistent with most modern browsers, which have an <HR> line above and below the prompt and widget window. If the text direction DIR is right to left, the prompt string and text window should start from the right-hand side, with the prompt string itself to the right of the widget window. The <ISINDEX> widget window should be able to respond to the pressing of the Enter or Return key on the keyboard, so no "SUBMIT" button would be needed. Finally, any <FORM> widget of type TEXT where the NAME is "ISINDEX" (case insensitive) and which is not connected to a SUBMIT button should not have an "ISINDEX=..." put in front of the entered text (much as Microsoft Internet Explorer and the older (pre-Version 5) Netscape handle it).
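Since browsers differ on that last point, a server script may want to accept the query string in either shape. The following is only a sketch of one way to do so, with a hypothetical routine name (search_terms); it assumes that the only name a form would put in front of the text is "isindex" in some mix of upper and lower case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept both the bare ISINDEX-style query string
# ("abc+def") and a form-style one ("isindex=abc+def"), and return the
# text the user typed.
sub search_terms {
    my ($query) = @_;
    $query = '' unless defined $query;
    $query =~ s/^isindex=//i;     # drop the name prefix if a browser added it
    $query =~ tr/+/ /;
    $query =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    return $query;
}

print search_terms('abc+def'), "\n";            # prints: abc def
print search_terms('ISINDEX=abc+def'), "\n";    # prints: abc def
```

The prefix is removed before decoding on purpose: an equals sign typed by the user arrives as %3D under the encoding rules described earlier, so a literal "isindex=" at the start of the raw query string can only have come from the browser.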
Style sheet commands should apply to the whole widget, where applicable by nature.
Though my examples here have been tailored to the Microsoft model of applying all style
commands to the widget window itself alone (and I am personally more comfortable with
that), there is a good case to be made for applying it to everything else as well, for
example replacing the surrounding <HR>
lines with whatever border
commands are given, and applying text styles to both the prompt string and the text
entered characters in the widget window itself. When used with vertical languages (such
as Chinese) the whole <ISINDEX>
widget should be aligned vertically
as well.
I have created a small cluster of demonstration files to show the various upgrades and
possible replacements for the <ISINDEX>
element contained in this file.
Possible downgrades are:
<HEAD>
and the <BODY>
that would come to be
formalized over the course of 1992, and therefore could be placed anywhere in the file,
originally (though in practice, <HEAD>
type elements generally tended
to be put first in the file as a matter of custom). As originally envisioned,
<ISINDEX>
functioned as a boolean selector - if present, provide the
widget window, if absent, no widget window. Even the widget window itself might be located
outside the scrolling window area, and as it was purely a binary choice of providing it or
not, there was no use (and no validity) for more than one in a file. Such a selector
flag, functioning totally outside the flow of information on the page, was exclusively
confined to the <HEAD>
section, and so continued clear into HTML 2
Level 1. As this file itself and the demonstration test file above are already in pure
HTML 2 Level 1 (Strict, but it would be exactly the same if non-strict), there are no
further downgrades for <ISINDEX> beyond what is demonstrated here.
Possible upgrades are:
<ISINDEX> as widget - In HTML 2 Level 2, not only are Forms introduced, but <ISINDEX> itself functions differently. Where in Level 1 it is only permitted in the <HEAD> area, now it becomes permissible in any location, and furthermore, may appear multiple times. It is still, however, restricted to appearing only once in the <HEAD>, though it may additionally appear any number of times in the <BODY> as well.
<FORM> and <INPUT> - HTML 2 Level 2 far more importantly introduces the Forms entry suite of tags which provides the user with many different means of entering data to be processed by the server. Naturally, one of those Forms tags would approximate the functionality of <ISINDEX> reasonably closely, though additional items are needed to make it look the same.
<ISINDEX> "PROMPT" attribute - HTML 3.2 added a number of attributes to many tags, including a PROMPT attribute for the <ISINDEX> element. This allows the programmer to select some different phrase than "You can search this index. Type the keyword(s) you want to search for:," or whatever your browser generates. This file illustrates the flexibility gained using this feature, at least in display.
<ISINDEX> Stylesheet attributes - HTML 4.01 adds yet many more features to the HTML language, including stylesheet attributes ID, STYLE, and CLASS. In addition, it also adds a TITLE attribute and LANG and DIR attributes. It does not add scripting language access to the <ISINDEX> widget window.
<isindex /> in XHTML - XHTML 1.0 Transitional duplicates the same functionality as contained in HTML 4.01 Transitional, but this represents the furthest modernization of <isindex /> it would take before being eliminated altogether.
<ISINDEX> HEAD Widget Emulation - Forms and frames in HTML 4.01 are used here to simulate what <ISINDEX> would look like when properly implemented as a <HEAD> element.
<ISINDEX>, the most useful and interesting (and only one I demonstrate here) is an ACTION attribute introduced by Netscape, but copied into several other browsers. This attribute at last allows one to put a working <ISINDEX> tag in a regular HTML file. Of course, a program is still required to process the user input.
This file, "isin.shtml," is HTML 2.0 Strict Level 1 compliant, even when something is entered.
The <ISINDEX> small test demonstration file "cgi/test6.pl" is HTML 2.0 Strict Level 1 compliant, even when something is entered.
The HTML 2 Level 2 multiple <ISINDEX> tag demonstration file "isin1.shtml" is HTML 2.0 Strict (Level 2) compliant, even when something is entered.
The HTML 2 Level 2 <FORM> and <INPUT> demonstration file "isin2.shtml" is HTML 2.0 (Level 2) compliant, even when something is entered.
The <ISINDEX> "PROMPT" attribute demonstration file "isin3.shtml" is HTML 3.2 compliant, even when something is entered.
The <ISINDEX> Stylesheet attributes demonstration file "isin4.shtml" is HTML 4.01 Transitional compliant, even when something is entered.
The <isindex /> XHTML 1.0 demonstration file "isin5.shtml" is XHTML 1.0 Transitional compliant, even when something is entered.
The Forms and Frames "<ISINDEX>" Emulation demonstration file "isin6.html" is HTML 4.01 Frameset compliant.
The Forms and Frames Upper portion demonstration file "isinf1.html" is HTML 4.01 Transitional compliant.
The Forms and Frames Lower portion demonstration file "isinf2.html" is HTML 4.01 Transitional compliant.
The Proprietary attribute "ACTION" demonstration file "isin7.html" is not compliant with any kind of HTML.