Site Technical notes

While my first priority in forming this site has been the propagation of the historic and universal Roman Catholic Faith and Church, for me personally building this site has also been a tremendous and wonderful learning experience in using the resources of the web and the detailed capabilities of Hyper Text Transfer Protocol (HTTP), Hyper Text Markup Language (HTML) and Perl/Common Gateway Interface (CGI).

There are a number of features which give this site some of its distinct personality, and which are perhaps of some mild interest from the standpoint of its technology. As a stylistic choice, I have steadfastly maintained a very simple yet distinctive and deliberately "retro" look throughout the site, all of which should be able to function more or less correctly with even HTML version 2.0, Level 1, and with few if any problems even with HTML version 1.0. This has been done to minimize the download time for all my pages. But do not be fooled. Behind the scenes there are many web-related technologies at work, used quite sparingly so that they do not intrude but merely expand specific functionality and ease of use and access.

One point of clarification before proceeding is that not all "pages" contained or hosted within my site space are truly part of "my" site. There are a number of pages and even entire sites which are not mine at all but the work of others who, for whatever reason, are no longer in a position to maintain a web presence. In these, I have cleaned out any banner commands, disabled any CGI directives (such as counters), modified the link structure to match that used on my site (I put all files in one single directory, other than my Perl/CGI files), disabled any links they contain which are not valid, or updated links to pages which have moved. In a couple of cases I have added some Javascript functionality which I will get to later. There are also some few sites which I am the preparer of, but which I do not regard as part of my site, properly speaking. In particular, one such site is the "In the Spirit of Chartres" Committee site, which requires HTML version 3.2, or at least version 2.0, Level 2, and with extensions commonly seen on primitive browsers, namely sizing attributes on the <IMG> tag. Unless otherwise mentioned, the pages I discuss here in the site technical notes page are to be considered part of my own website.

One feature (pertaining to my site and all the others hosted within my web space) is the complete absence of banner advertisements of any kind. These annoying and frustrating entities are one thing I cannot help feeling that the typical web surfer hates with wild abandon, and even those pages coming from others, which might have had them while still in their care, have had all such things systematically removed before being hosted on my site. Another, as I have mentioned above, is the minimal size of nearly all pages, keeping graphics and other time-consuming entities to a bare minimum to speed downloading. Both of these things are possible because all of my HTML, and also my Perl/CGI, has been hand entered, using only notepad or wordpad editors, never any commercial or freeware "HTML Editor" or "Web Page Editor." In a few cases, I have used Microsoft Word to type in a lot of text, used it to generate an HTML file, and then manually tweaked the file to fit my format.

Here Lies <TAG> R. I. P.

HTML in particular has fascinated me, especially the way it has grown and changed from those heady but early and primitive days when Tim Berners-Lee created his original NeXT Workstation HTML browser and editor. While some of the original tags such as <TITLE> and <LINK> which now go in the header portion, and the <P>, <A>, <Hn> (where "n" is a digit "1" through "6"), <DL>, <DT>, <DD>, <UL>, <OL>, <LI>, and <ADDRESS> which now go into the body portion, have all held up quite well and see frequent use even today, others, such as <MENU> or <DIR>, are rarely seen, although still "out there," and still others, such as the <HPn> (where "n" is a digit "0" through "3"), <NEXTID>, and <ISINDEX> tags, have fallen into disuse. Indeed, <NEXTID> was already considered to be going obsolete when Tim Berners-Lee drafted his original HTML description in 1992, since it was clearly limited to the NeXT browser only, and the <HPn> tags were replaced early on with the <B>, <U>, and <I> tags.

I am drafting a special online-only work, titled The Lost Tags of HTML. This is meant to be a history of HTML, but from the unique perspective of those early tags that did not last long and are no longer encouraged, accepted, or in some cases even implemented. It is a chance to explain what these tags were, what replaces them, and why they are no longer accepted today.

The <ISINDEX> tag is an example of the "Lost Tags" of HTML that captures my imagination enough to implement it on my site, since it goes back to the very beginning of the HTML design as a way for there to be user entry, long before forms and applets and scripting languages made their appearance. This tag is virtually never seen today, and functional instances of it are even rarer, owing to the difficulty of using it. In its original form, it cannot meaningfully work from any mere text file but requires the text to be generated by a CGI program. A simple such program, written in Perl, for demonstration purposes, would be:

#!/usr/local/bin/perl
print "Content-type: text/html\r\n\r\n";
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HTML VERSION=\"-//IETF//DTD HTML 2.0 Strict Level 1//EN\">\r\n";
print "<HEAD>\r\n";
print "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html\; charset=utf-8\">\r\n";
print "<TITLE>ISINDEX Example</TITLE>\r\n";
$sourc = $ENV{'QUERY_STRING'};
$sourc =~ tr/+/ /;
$sourc =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($sourc eq '')
{
  print "<ISINDEX>\r\n";
  print "</HEAD>\r\n";
  print "<BODY>\r\n";
  print "<P>ISINDEX Example</P>\r\n";
  print "</BODY>\r\n";
  print "</HTML>\r\n";
}
else
{
  $sourc =~ s/&/&amp;/g;
  $sourc =~ s/</&lt;/g;
  $sourc =~ s/>/&gt;/g;
  $sourc =~ s/"/&#34;/g;
  $sourc =~ s/ /&nbsp;/g;
  print "</HEAD>\r\n";
  print "<BODY>\r\n";
  print "<P>Searched for is \&#34\;$sourc\&#34\;</P>\r\n";
  print "<P>Click on Back button to return to menu.</P>\r\n";
  print "</BODY>\r\n";
  print "</HTML>\r\n";
}
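
For those more at home in Javascript than in Perl, the decoding and escaping steps of the program above might be sketched roughly as follows. This is only an illustration and not anything running on my site; the function names decodeQuery and escapeHtml and the sample query string are made up for the purpose.

```javascript
// Illustration only: decodeQuery and escapeHtml are hypothetical names.
// decodeQuery mirrors the tr/+/ / and s/%XX/.../ steps of the Perl program.
function decodeQuery(queryString) {
  // A "+" in the query string stands for a space.
  const spaced = queryString.replace(/\+/g, " ");
  // Each %XX pair becomes the single character with that hexadecimal code.
  return spaced.replace(/%([a-fA-F0-9][a-fA-F0-9])/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}

// escapeHtml mirrors the five substitutions made before the text is echoed back.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")   // must come first, or later entities get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&#34;")
    .replace(/ /g, "&nbsp;");
}

// Made-up query string, as a browser might send it after an <ISINDEX> entry.
const searched = escapeHtml(decodeQuery("Council+of+Trent%3F"));
console.log("<P>Searched for is &#34;" + searched + "&#34;</P>");
```

The ampersand is replaced before anything else for the same reason as in the Perl: done later, it would mangle the entities just produced by the other substitutions.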

My site actually has a functional instance of this tag with the Topical Subject Index, quite possibly the only such on practically the entire web. It is like having a tag all to myself, though of course, anyone could freely decide to use it, if only they are willing to go through creating an indexing program using this. It is interesting to see how this tag reacts on various primitive browsers, even some which understand forms, such as the last version of Mosaic. Where a form command simply puts its widget windows on the page space as it does on all more recent browsers, the <ISINDEX> tag actually places a widget on the outer non-scrolling frame, where one finds such standard browser buttons as the back, forward, refresh, and stop buttons. Of course, any earlier browsers which do not understand forms (but understand only HTML Version 2.0, Level 1 and below) will nevertheless understand and be able to function with my Topical Subject Index page.

One other thing to note is the line terminator used for each line of HTML in the generator file, namely "\r\n" instead of merely "\n". The server is a UNIX machine, where a line of text ordinarily ends with the line feed alone, whereas a PC expects both the carriage return and the line feed at the end of each line (and the HTTP protocol itself calls for both at the end of each header line). Ordinary .html files pass through the software which uploads the pages from the PC to the server, which takes care of any translation of line endings, but the output of a Perl/CGI script goes to the browser with exactly the terminator the script prints. If "\r\n" is not used, the file generated by the Perl/CGI script, if the source is viewed on a PC, will all be on one long line and difficult to read, with weird looking characters displayed where new lines were meant to be started.

The one modern "standard" I have largely ignored in the rest of my site is the new XHTML standard (built upon XML), which requires all tag labels and attribute names to be in lower case, where regular HTML, even the most recent version 4.01, customarily uses upper case for tag names in its examples to draw attention to and clarify the use of the HTML tags. Another thing that distinguishes XHTML and XML is that all tags are meant to be symmetric to, and have, closing tags. While I do follow that standard (because it seems reasonable for consistency among browsers), such as by using </LI> to close out a span started with the <LI> tag, I do not bother with the rather absurd and reductionist XHTML/XML treatment of unary tags, e. g. I don't put <isindex /> for <ISINDEX> or <hr /> for <HR>. I don't worry about compatibility problems since I anticipate quite some passage of time (if ever) before it would ever be practical or reasonable to have any XHTML/XML-only browsers, either as freeware or as commercially offered, owing to the wide use of more primitive HTML practically throughout the entire net, even as modern browsers continue to support <XMP> tags.

Another interesting primitive web functionality which still gets used, but seems clearly dwarfed by more recent technologies, is Server Side Includes. On my server, any files with these must use the file extension .shtml rather than .html in order to alert the server that the file contains Server Side Includes, and that it must implement them, substituting these things which otherwise look like comments with whatever the result of the command is. This Site Technical notes page, and a simple test page, are examples of my first use of Server Side Includes on my site, as these enabled me to experiment and to provide and make visible some few page counts. So for example, the Server Side Include example:

<!--#include file="file_access_count.txt" -->

is used to provide a page count for a page, in this case, the main page, which has been accessed (count not shown here) times.

Although quite a number of such Server Side Include commands have been defined, only the basic few originally specified appear to have been implemented on my server, namely: #include, #config, #fsize, #flastmod, #echo, and #exec, which last is demonstrated at the very bottom of this Site Technical notes page. So for example, the following information about my External Links page file, which is 11,640 bytes long and was last modified on Monday, February 09, 2009 and has been accessed (count not shown here) times as of this date, Friday, November 15, 2024, bases itself on the following Server Side Include commands:

<!--#config sizefmt="bytes" -->
<!--#config timefmt="%A, %B %d, %Y" -->
<!--#config errmsg="This instance of Server Side Includes has failed" -->
<!--#fsize file="exlinks.html" -->
<!--#flastmod file="exlinks.html" -->
<!--#include file="file_access_count.txt" -->
<!--#echo var="Date_Local" -->
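
To make the idea of substitution a little more concrete, here is a toy model in Javascript of what the server does with two of these commands. It is only a sketch under my own assumptions and says nothing of how any real server is written; the function name expandIncludes is hypothetical, "files" and "vars" merely stand in for the server's disk and environment, and the count of 1234 is made up.

```javascript
// Toy model only: expandIncludes is a hypothetical name, and a real server's
// handling of these commands is considerably more involved.
function has(table, key) {
  return Object.prototype.hasOwnProperty.call(table, key);
}

function expandIncludes(page, files, vars) {
  // Matches commands of the form <!--#command attribute="value" -->
  return page.replace(/<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->/g,
    (whole, command, attribute, value) => {
      if (command === "include" && attribute === "file" && has(files, value)) {
        return files[value];
      }
      if (command === "echo" && attribute === "var" && has(vars, value)) {
        return vars[value];
      }
      // Anything this toy does not understand is left alone,
      // so it remains an ordinary HTML comment.
      return whole;
    });
}

const page = 'Accessed <!--#include file="file_access_count.txt" --> times ' +
             'as of <!--#echo var="Date_Local" -->.';
const files = { "file_access_count.txt": "1234" };          // made-up count
const vars = { "Date_Local": "Friday, November 15, 2024" };
console.log(expandIncludes(page, files, vars));
```

The point of leaving unknown commands untouched is the same one made above: to a server (or browser) that does not implement them, these commands simply look like comments.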

Another "nook & cranny" bit of web technology is the use of the <META> tag. Oftentimes this tag is used for such things as enabling or disabling the use of searching by net robots (such as Googlebot), providing ratings to the page, supplying keywords and descriptions that may be helpful for some search engines or the Dublin Core or the like, and finally to bring out a few other common browser features. One of them is the refresh function, demonstrated here.

Several of my pages employ a small and usually invisible amount of Javascript in order to make it easy for the reader to escape a frame. I do not use frames anywhere within my site as I don't particularly care for them myself, although I have seen others use them to good effect. However, since another site could (and one does) point to my site, but insert it into its frame, I wanted to provide the reader with the option to break out of the frame in one simple button click, or at least be notified that the page is in a frame. So, for example, the Javascript:

<SCRIPT LANGUAGE="JavaScript" TYPE="TEXT/JAVASCRIPT">
<!-- Hide script from old browsers
if (top.location != self.location)
{
  document.write('<INPUT TYPE="BUTTON" VALUE="Expand out of frame" ' +
    'onClick="top.location = self.location">')
}
// End hiding script from old browsers -->
</SCRIPT>

enables the reader to expand my main page out of a frame with a single mouse click, should they desire it, or they may remain in the frame if that is what they prefer. The one compatibility problem is that if a browser supports frames and Javascript, but not HTML version 4, the words "Expand out of frame" simply appear at the top as a reminder that my page is in a frame and need not be viewed that way.

Another Javascript functionality I have (which I only use on pages of external sites I have rehosted) exists so as to put the reader on notice that the site or page in question contains material to which I take exception. This Javascript posts a message so stating in an "alert" window, and then gives the reader the option of simply making it go away (but it will come back every time the page is called up or refreshed), or else to accept a cookie which will tell my script not to bring up this message again. The cookie expires after one year. Here is the Javascript program to do this, which attempts to read the cookie, and if it does, goes no further, but if not, then brings up the message (message not shown here) and then gives the reader the option to write the cookie or simply remove the message alert window and continue the download:

<SCRIPT LANGUAGE="JavaScript" TYPE="TEXT/JAVASCRIPT">
<!-- Hide script from old browsers
var msg = "Message goes here";
if (document.cookie == "")
{
  if (confirm(msg))
  {
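    // getYear() returns the year minus 1900 in most browsers, so
    // setYear(getYear()+1) would set an expiry date in the distant past;
    // the "Full" versions work with the complete four-digit year.
    var expireDate = new Date();
    expireDate.setFullYear(expireDate.getFullYear() + 1);
    document.cookie = "name=read; expires=" + expireDate.toUTCString();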
  }
}
// End hiding script from old browsers -->
</SCRIPT>

It is also of some interest to me that the same function could be performed using CGI, at least theoretically. However, experience has shown that the implementation of cookies in CGI has been at best rather spotty and unreliable. The method used in reading cookies using CGI involves the parameters sent by the browser, which are available in the environment variables that can be read by any CGI file using the HTTP protocol. This test program prints out all the environment variables. If there are any cookies set, there would be a parameter named "HTTP_COOKIE" and its contents would be a list of cookies and their contents shown after the "is set to" phrase.
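
As a rough sketch of what reading that parameter involves, the contents of "HTTP_COOKIE" (a list of name=value pairs separated by semicolons) can be pulled apart as follows. This is written in Javascript only for illustration and is not the test program mentioned above; the function name parseCookies is hypothetical, "env" stands in for the CGI environment, and the second cookie in the sample is made up.

```javascript
// Sketch only: parseCookies is a hypothetical name, and "env" stands in for
// the CGI environment (for instance process.env under Node.js).
function parseCookies(env) {
  const cookies = {};
  const header = env.HTTP_COOKIE || "";
  for (const pair of header.split(";")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue;                 // skip empty or malformed pieces
    const name = pair.slice(0, eq).trim();
    cookies[name] = pair.slice(eq + 1).trim();
  }
  return cookies;
}

// "name=read" is the cookie written by the Javascript shown earlier;
// "other=value" is a made-up second cookie.
const found = parseCookies({ HTTP_COOKIE: "name=read; other=value" });
for (const name of Object.keys(found)) {
  console.log(name + " is set to " + found[name]);
}
```

When no cookies have been set, the parameter is simply absent, and the function returns an empty list.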

Sending a cookie via CGI is even more interesting, if even less reliable, but before that can be understood, the presence and nature of HTTP headers at the start of all files obtained over the net using the HTTP protocol is something that must be understood. A server decides a file's type based on the file extension, the letters after the last dot in the file's name. That information and some other things and other possible protocol messages, including error messages such as "file not found" or "access denied," all get sent within what is called the header, a few lines of information which are sent before the file itself. In CGI, one must explicitly tell the server what kind of file it is, so all CGI programs have an opening line or lines, followed by two returns in a row to set it apart from the rest of the file. In the Perl/CGI example seen earlier on this page, it is the

print "Content-type: text/html\r\n\r\n";

line which specifies that the file following is a text file which contains HTML. It is the exact equivalent of sending a file with the .html or .htm extension. There are also other types called for in the HTTP protocol, such as text (the equivalent of a .txt file), cascading style sheets (the equivalent of a .css file), or Graphics Interchange Format (the equivalent of a .gif file), as follows:

print "Content-type: text/plain\n\n";
print "Content-type: text/css\n\n";
print "Content-type: image/gif\n\n";

It is in this region that a Set-Cookie command could also go, in order to send a cookie using CGI. Unfortunately, only certain versions of Netscape browsers seem to understand and process this HTTP protocol instruction, making it useless for most applications. It is also within this location that something else interesting can be done, and this is thankfully far more universally applied. This is the "Location:" command, which redirects the browser to a specified page. It only takes a single Perl/CGI print command:

print "Location: http://www.the-pope.com/destination_file.html\n\n";

to send the reader to the chosen destination file. It is interesting to see how similar things can be done more than one way. For example, one might have just as well redirected the reader from one .html file to another by using the Javascript program:

<SCRIPT LANGUAGE="JavaScript" TYPE="TEXT/JAVASCRIPT">
<!-- Hide script from old browsers
window.location='http://www.the-pope.com/destination_file.html'
// End hiding script from old browsers -->
</SCRIPT>

but in this case, since some browsers don't implement Javascript (and some people disable their browser's Javascript capability), the Perl/CGI method is clearly the method of preference.
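
To show how the pieces of the header region described above fit together, here is a small sketch in Javascript which assembles the opening lines a CGI program would print. It is only an illustration of the layout, not code from my site; the function name buildCgiHeader and its option names are hypothetical.

```javascript
// Sketch only: buildCgiHeader and its option names are hypothetical.
function buildCgiHeader(options) {
  const lines = [];
  if (options.location) {
    // A redirect names the destination instead of describing a file type.
    lines.push("Location: " + options.location);
  } else {
    lines.push("Content-type: " + (options.contentType || "text/html"));
  }
  if (options.cookie) {
    // A cookie, if any, goes in this same region, before the blank line.
    lines.push("Set-Cookie: " + options.cookie);
  }
  // The two returns in a row set the header apart from the rest of the file.
  return lines.join("\n") + "\n\n";
}

// JSON.stringify is used here only to make the line terminators visible.
console.log(JSON.stringify(buildCgiHeader({
  location: "http://www.the-pope.com/destination_file.html"
})));
console.log(JSON.stringify(buildCgiHeader({
  contentType: "text/plain",
  cookie: "name=read"
})));
```

Whatever goes into the header, the one constant is the empty line at the end; everything printed after it is treated as the file itself.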

Although I have used images most sparingly on my site, I wanted to expose myself to the techniques of imagemaps, and so I devised a small devotional approach which uses the graphic image used on my main page as the graphic basis for an image map. Once again, there are two basic and distinct ways to implement image maps, one using "client-side" techniques, and the other using "server-side" techniques. HTML version 3.2 introduced the <MAP> and <AREA> tags which feature in all basic client-side imagemaps. Prior to that, the only way to do that on the client-side (reader's PC) was with a Java Applet. Unfortunately I have not as yet succeeded in getting any Java Applet to function correctly on my site, so I cannot display this fascinating intermediate technology which was the first to implement client-side imagemaps.

Before that (and going clear back to HTML version 1), all imagemaps were "server-side" imagemaps. This means that each use of such a map, by clicking on a specific location, required that the coordinates of the position clicked on be sent to the server, which then figures out what part of the image the mouse click occurred in and from there which "region" it goes in. For this there was the "ISMAP" attribute of the <IMG> tag. The client-side imagemaps use the "USEMAP" attribute, which was also not officially defined until HTML version 3.2. My Resurrection Picture page combines both technologies so as to function correctly and transparently both with older browsers which don't support any client-side imagemaps and with more modern browsers which do. As test files, I have both a pure server-side version of the same page, so as to demonstrate to those with more modern browsers how server-side imagemaps function, and a pure client-side version which allows one to see if their browser supports the HTML version 3.2 (or any later) standard. It is this server-side imagemap function which uses the "Location:" function of the CGI/Perl HTTP described above to forward the reader to the file indicated by the location or area on the image. In the <MAP> structure of HTML 3.2 and later, each <AREA> is simply treated as a link.
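
The work done by the server for a server-side imagemap can be sketched in a few lines. With "ISMAP" the browser appends the click position to the link as "?x,y", and the program on the server need only find which region contains that point. The following Javascript is an illustration only and is not the program behind my Resurrection Picture page; the function name resolveClick, the rectangular regions, and the file names are all made up.

```javascript
// Sketch only: the regions and file names below are made up for illustration.
const regions = [
  { left: 0,   top: 0, right: 99,  bottom: 149, url: "left_side.html" },
  { left: 100, top: 0, right: 199, bottom: 149, url: "right_side.html" }
];
const defaultUrl = "default.html";   // used when the click falls in no region

// The query string holds the click coordinates, for example "120,45".
function resolveClick(queryString) {
  const match = /^(\d+),(\d+)$/.exec(queryString);
  if (!match) return defaultUrl;
  const x = Number(match[1]);
  const y = Number(match[2]);
  for (const region of regions) {
    if (x >= region.left && x <= region.right &&
        y >= region.top && y <= region.bottom) {
      return region.url;
    }
  }
  return defaultUrl;
}

// The CGI program would then answer with a "Location:" line naming the file.
console.log("Location: " + resolveClick("120,45"));
```

A client-side imagemap performs the same lookup, only in the browser, which is why no trip to the server is needed until the chosen link is actually followed.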

Further topics to be added some day are: thegrid.net, CGI scripts no longer available, accessibility aspects, language, and style sheets.

Although nearly all pages on my site have hidden counters on them, I show here the current totals for only some of the main few pages:

  • MAIN PAGE:
  • SITE GUIDE:
  • LIBRARY:
  • TOPICAL SUBJECT INDEX:
  • ISOC SITE:
  • QUESTIONS:
  • WHATS NEW:
  • ASK ME:
  • EXTERNAL LINKS:
  • By the way, the Server Side Include command:

    <!--#exec cgi="cgi/cgi_filename.pl" -->
    

    together with a CGI file shown here

    #!/usr/local/bin/perl
    print "Content-type: text/html\n\n";
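    # Open for both reading and writing so that one lock covers the whole update
    open (CNT, "+<file_access_count.txt") or die "Cannot open counter file: $!";
    flock(CNT, 2);        # 2 = exclusive lock
    $count = <CNT>;
    $count = $count + 1;
    seek(CNT, 0, 0);      # return to the start of the file
    truncate(CNT, 0);     # discard the old count
    print CNT $count;
    close (CNT);          # closing the file also releases the lock
    print $count;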
    

    which increments a counter and outputs the resulting total as text/html, shows that this .shtml file has been referenced (count not shown here) times.

