While my first priority in forming this site has been the propagation of the historic and universal Roman Catholic Faith and Church, for me personally building this site has also been a tremendous and wonderful learning experience in using the resources of the web and the detailed capabilities of Hypertext Transfer Protocol (HTTP), Hypertext Markup Language (HTML) and Perl/Common Gateway Interface (CGI).
There are a number of features which give this site some of its distinct personality, and which are perhaps of some mild interest from the standpoint of technology. As a stylistic choice, I have steadfastly maintained a very simple yet distinctive and deliberately "retro" look throughout the site, all of which should be able to function more or less correctly with even HTML version 2.0, Level 1, and with few if any problems even with HTML version 1.0. This has been done to minimize the download time for all my pages. But do not be fooled. Behind the scenes there are many web-related technologies at work, used quite sparingly so as not to intrude but only to expand specific functionality and ease of use and access.
One point of clarification before proceeding is that not all "pages" contained or hosted within my site space are truly part of "my" site. There are a number of pages and even entire sites which are not mine at all but the work of others who, for whatever reason, are no longer in a position to maintain a web presence. In these, I have cleaned out any banner commands, disabled any CGI directives (such as counters), modified the link structure to match that used on my site (I put all files in one single directory, other than my Perl/CGI files), disabled any links they contain which are not valid, and updated links to pages which have moved. In a couple of cases I have added some Javascript functionality which I will get to later. There are also a few sites which I prepared myself, but which I do not regard as part of my site, properly speaking. In particular, one such site is the "In the Spirit of Chartres" Committee site, which requires HTML version 3.2, or at least version 2.0, Level 2, with extensions commonly seen on primitive browsers, namely sizing attributes on the <IMG> tag. Unless otherwise mentioned, the pages I discuss here in the site technical notes page are to be considered part of my own website.
One feature (pertaining to my site and all the others hosted within my web space) is the complete absence of banner advertisements of any kind. These annoying and frustrating entities are one thing I cannot help feeling that the typical web surfer hates with wild abandon, and even pages coming from others, which might have had them while still in their care, have had all such things systematically removed before being hosted on my site. Another, as I have mentioned above, is the minimal size of nearly all pages, keeping graphics and other time-consuming entities to a bare minimum to speed downloading. Both of these things are possible because all of my HTML, and also my Perl/CGI, has been entered by hand, using only the Notepad or WordPad editors, never any commercial or freeware "HTML Editor" or "Web Page Editor." In a few cases, I have used Microsoft Word to type in a lot of text, used it to generate an HTML file, and then manually tweaked the file to fit my format.
HTML in particular has fascinated me, especially the way it has grown and changed from those heady but early and primitive days when Tim Berners-Lee created his original NeXT Workstation HTML browser and editor. While some of the original tags such as <TITLE> and <LINK>, which now go in the header portion, and the <P>, <A>, <Hn> (where "n" is a digit "1" through "6"), <DL>, <DT>, <DD>, <UL>, <OL>, <LI>, and <ADDRESS> tags, which now go into the body portion, have all held up quite well and see frequent use even today, others, such as <MENU> or <DIR>, are rarely seen, although still "out there," and still others, such as the <HPn> (where "n" is a digit "0" through "3"), <NEXTID>, and <ISINDEX> tags, have fallen into disuse. Indeed, <NEXTID> was already considered to be going obsolete when Tim Berners-Lee drafted his original HTML description in 1992, since it was clearly limited to the NeXT browser only, and the <HPn> tags were replaced early on with the <B>, <U>, and <I> tags.
I am drafting a special online-only work, titled The Lost Tags of HTML. This is meant to be a history of HTML, but from the unique perspective of those early tags that did not last long and are no longer encouraged, accepted, or in some cases even implemented. It is a chance to explain what these tags were, what replaces them, and why they are no longer accepted today.
The <ISINDEX> tag is an example of the "Lost Tags" of HTML that captures my imagination enough to implement it on my site, since it goes back to the very beginning of the HTML design as a way to allow user entry, long before forms and applets and scripting languages made their appearance. This tag is virtually never seen today, and functional instances of it are even rarer, owing to the difficulty of using it. In its original form, it cannot meaningfully work from any mere text file but requires the text to be generated by a CGI program. A simple such program, written in Perl, for demonstration purposes, would be:
#!/usr/local/bin/perl
# Emit the HTTP header, then the start of the HTML document.
print "Content-type: text/html\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HTML VERSION=\"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HEAD>\r\n";
print "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=utf-8\">\r\n";
print "<TITLE>ISINDEX Example</TITLE>\r\n";
# Decode the query string: "+" stands for a space, %XX for a byte value.
$sourc = $ENV{'QUERY_STRING'};
$sourc =~ tr/+/ /;
$sourc =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($sourc eq '') {
    # No query yet: offer the ISINDEX entry field.
    print "<ISINDEX>\r\n";
    print "</HEAD>\r\n";
    print "<BODY>\r\n";
    print "<P>ISINDEX Example</P>\r\n";
    print "</BODY>\r\n";
    print "</HTML>\r\n";
} else {
    # Escape characters that are special in HTML ("&" must be done first).
    $sourc =~ s/&/&amp;/g;
    $sourc =~ s/</&lt;/g;
    $sourc =~ s/>/&gt;/g;
    $sourc =~ s/"/&quot;/g;
    $sourc =~ s/ /&nbsp;/g;
    print "</HEAD>\r\n";
    print "<BODY>\r\n";
    print "<P>Searched for is &quot;$sourc&quot;</P>\r\n";
    print "<P>Click on Back button to return to menu.</P>\r\n";
    print "</BODY>\r\n";
    print "</HTML>\r\n";
}
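The two text transformations in the Perl program above (decoding the query string, then escaping it for display) can also be sketched in Javascript. This is only an illustrative sketch; the function names decodeQuery and escapeHtml and the sample query are my own assumptions and are not part of the site's actual script.

```javascript
// Decode an ISINDEX-style query string: "+" stands for a space and
// %XX for a character code (treated one byte at a time, as in the Perl above).
function decodeQuery(query) {
  return query
    .replace(/\+/g, " ")
    .replace(/%([0-9a-fA-F]{2})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16)));
}

// Escape the characters that are special in HTML. "&" must be handled
// first, or the "&" of the other entities would be escaped a second time.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical query, as a browser might send it after the "?".
const rawQuery = "saints+%26+%3Cmartyrs%3E";
console.log(escapeHtml(decodeQuery(rawQuery)));
// prints: saints &amp; &lt;martyrs&gt;
```

Escaping after decoding matters: if the order were reversed, a reader could smuggle markup into the page by typing it in percent-encoded form.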
My site actually has a functional instance of this tag with the Topical Subject Index, quite possibly the only such on practically the entire web. It is like having a tag all to myself, though of course, anyone could freely decide to use it, if only they are willing to go through creating an indexing program using this. It is interesting to see how this tag reacts on various primitive browsers, even some which understand forms, such as the last version of Mosaic. Where a form command simply puts its widget windows on the page space as it does on all more recent browsers, the <ISINDEX> tag actually places a widget on the outer non-scrolling frame, where one finds such standard browser buttons as the back, forward, refresh, and stop buttons. Of course, any earlier browsers which do not understand forms (but understand only HTML Version 2.0, Level 1 and below) will nevertheless understand and be able to function with my Topical Subject Index page.
One other thing to note is the line terminator used for each line of HTML produced by the generator file, namely "\r\n" instead of merely "\n". UNIX, which the server runs, ends each line of text with a line feed alone, whereas PCs expect both a carriage return and a line feed, as does the HTTP protocol for its header lines. My .html files are prepared on a PC and so carry both characters at the end of each line, and writing "\r\n" from the script makes the generated pages match them. If this is not done, the page generated by the Perl/CGI script, if its source is viewed on a PC, may appear all on one long line and be difficult to read, with weird looking characters displayed where new lines were meant to start.
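To make the difference between the two terminators visible, here is a small Javascript sketch. The helper names toCrlf and showLineEndings and the sample text are assumptions made purely for illustration; nothing like this runs on my site.

```javascript
// Convert any mix of "\r\n", lone "\r", or lone "\n" into "\r\n".
// The alternation tries "\r\n" first, so an existing pair is not doubled.
function toCrlf(text) {
  return text.replace(/\r\n|\r|\n/g, "\r\n");
}

// Replace the invisible control characters with visible escapes.
function showLineEndings(text) {
  return text.replace(/\r/g, "\\r").replace(/\n/g, "\\n");
}

// Hypothetical generated page with inconsistent line endings.
const generated = "<HTML>\n<BODY>\r\n</BODY>\n";
console.log(showLineEndings(generated));
// prints: <HTML>\n<BODY>\r\n</BODY>\n
console.log(showLineEndings(toCrlf(generated)));
// prints: <HTML>\r\n<BODY>\r\n</BODY>\r\n
```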
The one modern "standard" I have largely ignored in the rest of my site is the newer XHTML standard, an XML-based reformulation of HTML, which requires all tag labels and attribute names to be in lower case, where regular HTML, even the most recent version 4.01, always uses upper case in all its examples to draw attention to and clarify the use of the HTML tags. Another thing that distinguishes XHTML and XML is that every opening tag is meant to have a matching closing tag. While I do follow that practice (because it seems reasonable for consistency among browsers), such as by using </LI> to close out a span started with the <LI> tag, I do not bother with the rather absurd and reductionist XHTML/XML treatment of unary tags; e.g., I don't put <isindex /> for <ISINDEX> or <hr /> for <HR>. I don't worry about compatibility problems, since I anticipate quite some passage of time (if ever) before it would be practical or reasonable to have any XHTML/XML-only browsers, either as freeware or as commercially offered, owing to the wide use of more primitive HTML throughout practically the entire net, even as modern browsers continue to support <XMP> tags.
Another interesting primitive web functionality which still gets used, but seems clearly dwarfed by more recent technologies, is Server Side Includes. On my server, any file containing these must use the file extension .shtml rather than .html in order to alert the server that it contains Server Side Includes and that it must implement them, replacing each directive, which otherwise looks like a comment, with the result of its command. This Site Technical notes page, and a simple test page, are examples of my first use of Server Side Includes on my site, as these enabled me to experiment and to make visible a few page counts. So for example, the Server Side Include example:
<!--#include file="file_access_count.txt" -->
is used to provide a page count for a page, in this case, the main page, which has been accessed times.
Although quite a number of such Server Side Include commands have been defined, only the basic few originally specified appear to have been implemented on my server, namely: #include, #config, #fsize, #flastmod, #echo, and #exec, which last is demonstrated at the very bottom of this Site Technical notes page. So for example, the following information about my External Links page file, which is 11,640 bytes long and was last modified on Monday, February 09, 2009 and has been accessed times as of this date, Friday, November 15, 2024, bases itself on the following Server Side Include commands:
<!--#config sizefmt="bytes" -->
<!--#config timefmt="%A, %B %d, %Y" -->
<!--#config errmsg="This instance of Server Side Includes has failed" -->
<!--#fsize file="exlinks.html" -->
<!--#flastmod file="exlinks.html" -->
<!--#include file="file_access_count.txt" -->
<!--#echo var="Date_Local" -->
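As a rough picture of what the server does with such directives, the following Javascript sketch substitutes two of them (#include and #echo) from lookup tables. It is a toy under stated assumptions: the function name expandSsi, the tables, and the count "1234" are all hypothetical, and a real server reads files and variables itself and reports an error for a directive it cannot carry out rather than leaving it in place.

```javascript
// Toy expansion of #include and #echo directives.
// `files` and `vars` are hypothetical stand-ins for the server's
// file system and its built-in variables.
function expandSsi(html, files, vars) {
  const directive = /<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->/g;
  return html.replace(directive, (whole, command, attribute, value) => {
    if (command === "include" && attribute === "file" && files.has(value)) {
      return files.get(value);
    }
    if (command === "echo" && attribute === "var" && vars.has(value)) {
      return vars.get(value);
    }
    return whole; // this toy leaves anything else untouched
  });
}

// Hypothetical data: the count is invented for the demonstration.
const files = new Map([["file_access_count.txt", "1234"]]);
const vars = new Map([["Date_Local", "Friday, November 15, 2024"]]);
const page =
  'Accessed <!--#include file="file_access_count.txt" --> times as of ' +
  '<!--#echo var="Date_Local" -->.';

console.log(expandSsi(page, files, vars));
// prints: Accessed 1234 times as of Friday, November 15, 2024.
```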
Another "nook & cranny" bit of web technology is the use of the <META> tag. Oftentimes this tag is used for such things as enabling or disabling searching by net robots (such as Googlebot), providing ratings for the page, supplying keywords and descriptions that may be helpful for some search engines or the Dublin Core or the like, and finally bringing out a few other common browser features. One of them is the refresh function, demonstrated here.
Several of my pages employ a small and usually invisible amount of Javascript in order to make it easy for the reader to escape a frame. I do not use frames anywhere within my site as I don't particularly care for them myself, although I have seen others use them to good effect. However, since another site could (and one does) point to my site, but insert it into its frame, I wanted to provide the reader with the option to break out of the frame in one simple button click, or at least to be notified that the page is in a frame. So, for example, the Javascript:
<SCRIPT LANGUAGE="JavaScript" TYPE="TEXT/JAVASCRIPT">
<!-- Hide script from old browsers
if (top.location != self.location) {
  document.write('<INPUT TYPE="BUTTON" VALUE="Expand out of frame" onClick="top.location = self.location">')
}
// End hiding script from old browsers -->
</SCRIPT>
enables the reader to take my main page out of a frame with a single mouse click, should they desire it, or to remain in the frame if that is what they prefer. The one compatibility problem is that if a browser supports frames and Javascript, but not HTML version 4, the words "Expand out of frame" simply appear at the top as a reminder that my page is in a frame and need not be viewed that way.
Another Javascript functionality I have (which I only use on pages of external sites I have rehosted) exists so as to put the reader on notice that the site or page in question contains material to which I take exception. This Javascript posts a message so stating in an "alert" window, and then gives the reader the option of simply making it go away (but it will come back every time the page is called up or refreshed), or else to accept a cookie which will tell my script not to bring up this message again. The cookie expires after one year. Here is the Javascript program to do this. It attempts to read the cookie, and if it finds one, goes no further. If not, it brings up the message (message not shown here) and gives the reader the option either to write the cookie or simply to dismiss the message window and continue the download:
<SCRIPT LANGUAGE="JavaScript" TYPE="TEXT/JAVASCRIPT">
<!-- Hide script from old browsers
var msg = "Message goes here";
// Show the notice only when no cookie has been stored yet
if (document.cookie == "") {
  if (confirm(msg)) {
    // Reader accepted: store a cookie that expires one year from now
    var expireDate = new Date();
    expireDate.setFullYear(expireDate.getFullYear() + 1);
    document.cookie = "name=read; expires=" + expireDate.toUTCString();
  }
}
// End hiding script from old browsers -->
</SCRIPT>
It is also of some interest to me that the same function could be performed using CGI, at least theoretically. However, experience has shown that the implementation of cookies in CGI has been at best rather spotty and unreliable. Reading cookies using CGI relies on the parameters sent by the browser over HTTP, which the server makes available as environment variables that any CGI program can read. This test program prints out all the environment variables. If there are any cookies set, there would be a parameter named "HTTP_COOKIE" and its contents would be a list of cookies and their contents shown after the "is set to" phrase.
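The contents of "HTTP_COOKIE" take the same "name=value; name=value" form that Javascript sees in document.cookie, so picking out one cookie is a matter of splitting that string. Here is a minimal Javascript sketch; the function name parseCookies and the second cookie "theme=retro" are assumptions for illustration only.

```javascript
// Split a cookie string of the form "name=value; other=value2"
// into a Map from cookie name to cookie value.
function parseCookies(header) {
  const cookies = new Map();
  for (const part of header.split(";")) {
    const trimmed = part.trim();
    if (trimmed === "") continue; // skip empty pieces
    const eq = trimmed.indexOf("=");
    const name = eq === -1 ? trimmed : trimmed.slice(0, eq);
    const value = eq === -1 ? "" : trimmed.slice(eq + 1);
    cookies.set(name, value);
  }
  return cookies;
}

// Hypothetical value of HTTP_COOKIE as a CGI program might receive it.
const received = parseCookies("name=read; theme=retro");
console.log(received.get("name"));    // prints: read
console.log(received.has("missing")); // prints: false
```

Looking up a cookie by name in this way is a little more careful than testing whether the whole cookie string is empty, since an unrelated cookie would otherwise suppress the message.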
Sending a cookie via CGI is even more interesting, if even less reliable, but to understand it one must first understand the HTTP headers sent at the start of every file obtained over the net using the HTTP protocol. A server decides a file's type based on the file extension, the letters after the last dot in the file's name. That information, along with other protocol messages, including error statuses such as "not found" or "access denied," is sent within what is called the header, a few lines of information sent before the file itself. In CGI, one must explicitly tell the server what kind of file it is, so all CGI programs have an opening line or lines, followed by a blank line (two line terminators in a row) to set the header apart from the rest of the file. In the Perl/CGI example seen earlier on this page, it is the
print "Content-type: text/html\n\n";
line which specifies that the file following is a text file which contains HTML. It is the exact equivalent of sending a file with the .html or .htm extension. The HTTP protocol also provides for other types, such as plain text (the equivalent of a .txt file), cascading style sheets (the equivalent of a .css file), or Graphics Interchange Format (the equivalent of a .gif file), as follows:
print "Content-type: text/plain\n\n";
print "Content-type: text/css\n\n";
print "Content-type: image/gif\n\n";
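The server's choice of type from the file extension can be pictured as a simple table lookup. The Javascript below is only a sketch of that idea; the function name contentTypeHeader, the table, and the fallback type "application/octet-stream" for unknown extensions are my assumptions and do not describe how any particular server is configured.

```javascript
// Extension-to-type table covering only the types mentioned above.
const MIME_TYPES = new Map([
  ["html", "text/html"],
  ["htm", "text/html"],
  ["txt", "text/plain"],
  ["css", "text/css"],
  ["gif", "image/gif"],
]);

// Build the Content-type header line for a file name, using the
// letters after the last dot. Unknown extensions get a generic
// binary type (an assumption of this sketch).
function contentTypeHeader(filename) {
  const dot = filename.lastIndexOf(".");
  const extension = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  const type = MIME_TYPES.get(extension) || "application/octet-stream";
  return "Content-type: " + type + "\n\n";
}

console.log(JSON.stringify(contentTypeHeader("exlinks.html")));
// prints: "Content-type: text/html\n\n"
```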
It is in this header region that a Set-Cookie command could also go, in order to send a cookie using CGI. Unfortunately, only certain versions of Netscape browsers seem to understand and process this HTTP protocol instruction, making it useless for most applications. It is also within this location that something else interesting can be done, and this is thankfully far more universally supported. This is the "Location:" command, which redirects the browser to a specified page. It only takes a single Perl/CGI print command:
print "Location: http://www.the-pope.com/destination_file.html\n\n";
to send the reader to the chosen destination file. It is interesting to see how similar things can be done more than one way. For example, one might have just as well redirected the reader from one .html file to another by using the Javascript program:
<SCRIPT LANGUAGE="JavaScript" TYPE="TEXT/JAVASCRIPT">
<!-- Hide script from old browsers
window.location='http://www.the-pope.com/destination_file.html'
// End hiding script from old browsers -->
</SCRIPT>
but in this case, since some browsers don't implement Javascript (and some people disable their browser's Javascript capability), the Perl/CGI method is clearly the method of preference.
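To show how the Set-Cookie and "Location:" lines sit together in the header region, here is a Javascript sketch that only assembles the text a CGI program would print. The function name redirectHeaders and the cookie object are hypothetical, and this is not a description of any script actually running on my site.

```javascript
// Assemble the header block for a redirect, optionally with a cookie.
// Each header goes on its own line, and a blank line ends the block.
function redirectHeaders(url, cookie) {
  const lines = [];
  if (cookie) {
    lines.push(
      "Set-Cookie: " + cookie.name + "=" + cookie.value +
      "; expires=" + cookie.expires.toUTCString()
    );
  }
  lines.push("Location: " + url);
  return lines.join("\n") + "\n\n";
}

// Hypothetical cookie with a fixed expiry date so the output is repeatable.
const sampleCookie = {
  name: "name",
  value: "read",
  expires: new Date(Date.UTC(2025, 10, 15)), // month 10 is November
};

console.log(
  redirectHeaders("http://www.the-pope.com/destination_file.html", sampleCookie)
);
```

Because the header block ends at the first blank line, any further header would have to be added before the final pair of line terminators, never after.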
Although I have used images most sparingly on my site, I wanted to expose myself to the techniques of imagemaps, and so I devised a small devotional approach which uses the graphic image used on my main page as the graphic basis for an image map. Once again, there are two basic and distinct ways to implement image maps, one using "client-side" techniques, and the other using "server-side" techniques. HTML version 3.2 introduced the <MAP> and <AREA> tags which feature in all basic client-side imagemaps. Prior to that, the only way to do that on the client-side (reader's PC) was with a Java Applet. Unfortunately I have not as yet succeeded in getting any Java Applet to function correctly on my site, so I cannot display this fascinating intermediate technology which was the first to implement client-side imagemaps.
Before that (and going clear back to HTML version 1), all imagemaps were "server-side" imagemaps. This means that each use of such a map, by clicking on a specific location, required that the coordinates of the position clicked on be sent to the server, which then figures out which "region" of the image the mouse click occurred in. For this there was the "ISMAP" attribute of the <IMG> tag. The client-side imagemaps use the "USEMAP" attribute, which was also not officially defined until HTML version 3.2. My Resurrection Picture page combines both technologies so as to function correctly and transparently both with older browsers which don't support any client-side imagemaps and with more modern browsers which do. As test files, I have both a pure server-side version of the same page, so as to demonstrate to those with more modern browsers what server-side imagemaps function like, and a pure client-side version which allows one to see if their browser supports the HTML version 3.2 (or any later) standard. It is this server-side imagemap function which uses the "Location:" command of Perl/CGI described above to forward the reader to the file indicated by the location or area on the image. In the <MAP> structure of HTML 3.2 and later, each <AREA> is simply treated as a link.
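The work the server does for a server-side imagemap can be sketched in a few lines of Javascript: parse the "x,y" pair the browser appends after the "?" when an ISMAP image is clicked, then find the first region containing that point. The region list, file names, and function names below are hypothetical and are not taken from my Resurrection Picture page.

```javascript
// Hypothetical regions: a rectangle (x1, y1, x2, y2) and a
// circle (center x, center y, radius), each with a destination file.
const regions = [
  { shape: "rect", coords: [0, 0, 99, 49], href: "top.html" },
  { shape: "circle", coords: [150, 100, 30], href: "center.html" },
];

// Parse the "x,y" query string sent for a click on an ISMAP image.
function parseIsmapQuery(query) {
  const match = /^(\d+),(\d+)$/.exec(query);
  return match ? { x: Number(match[1]), y: Number(match[2]) } : null;
}

// Return the destination of the first region containing the point,
// or the fallback when the click lands outside every region.
function hitTest(x, y, regionList, fallback) {
  for (const region of regionList) {
    if (region.shape === "rect") {
      const [x1, y1, x2, y2] = region.coords;
      if (x >= x1 && x <= x2 && y >= y1 && y <= y2) return region.href;
    } else if (region.shape === "circle") {
      const [cx, cy, radius] = region.coords;
      if ((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2) return region.href;
    }
  }
  return fallback;
}

const click = parseIsmapQuery("150,120");
console.log(hitTest(click.x, click.y, regions, "default.html"));
// prints: center.html
```

The file name returned is what a server-side script would then place in its "Location:" line, while a client-side <MAP> lets the browser perform the same test without contacting the server.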
Further topics to be added some day are: thegrid.net, CGI scripts no longer available, accessibility aspects, language, and style sheets.
Although nearly all pages on my site have hidden counters on them, I show here the current totals for only some of the main few pages:
By the way, the Server Side Include command:
<!--#exec cgi="cgi/cgi_filename.pl" -->
together with a CGI file shown here
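#!/usr/local/bin/perl
use Fcntl qw(:flock SEEK_SET);
print "Content-type: text/html\n\n";
# Open for reading and writing, and hold one exclusive lock across
# the whole read-increment-write cycle so no update can be lost.
open(CNT, "+<file_access_count.txt") or die "Cannot open counter file: $!";
flock(CNT, LOCK_EX);
$count = <CNT>;
$count = $count + 1;
# Rewind and empty the file before writing the new total.
seek(CNT, 0, SEEK_SET);
truncate(CNT, 0);
print CNT $count;
close(CNT);    # closing the file also releases the lock
print $count;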
which increments a counter and outputs the resulting total as text/html, shows that this .shtml file has been referenced times.