Project: Programming the Common Gateway Interface

As some of you may know, the documents for this course that are accessible on the World Wide Web are simply text files, written using XEmacs, that include some additional text that is not usually displayed by Netscape Navigator. This additional text is called ``markup,'' and it gives the browser some information about the structure of the document -- where to put paragraph breaks and horizontal rules, which parts of the text should be treated as samples of Scheme source code, and so on.

Writing up the entire document in advance and storing it in a file is fine for many applications, but it means that the document is inert. Whenever the document is accessed, exactly the same text (and the accompanying markup) will be transferred. Sometimes, however, one would like to tailor the text of the document to information that becomes available only when the document is accessed. When a browser accesses a World Wide Web document, it is required to supply quite a bit of information to the server (the computer on which the document is stored). To take advantage of this information, one writes a program that the server can run when it is asked for the document. The program should construct and write out the text of the document, including any markup that it needs; the server collects this output and forwards it to the browser, which displays it. The advantage is that the program can examine information supplied by the browser and write out different text depending on what it learns.

The Common Gateway Interface (CGI) is a set of conventions, supported by software, that facilitate the writing of programs that generate World Wide Web documents. CGI programs can be written in almost any programming language; naturally, though, we'll use Scheme.

Here is what an inert document intended for the World Wide Web looks like when it is stored in a file. Each little piece of markup is enclosed between less-than and greater-than signs; this is a convention of the Hypertext Markup Language (HTML), in which the document is written. (Don't be distracted by the actual text in this example. If you must know, it's a quotation from Kant's Critique of Pure Reason. Feel free to substitute any preferred text of your own.)

<html>
<head>
<title>Sample HTML page</title>
</head>

<body>
<p>
If by merely intelligible objects we mean those
things which are thought through pure categories,
without any schema of sensibility, such objects
are impossible.  For the condition of the
objective employment of all our concepts of
understanding is merely the mode of our sensible
intuition, by which objects are given us; if we
abstract from these objects, the concepts have no
relation to any object.  Even if we were willing
to assume a kind of intuition other than this our
sensible kind, the functions of our thought would
still be without meaning in respect to it.
</p>
</body>
</html>

Follow this link to see how this page looks in your browser.

Here's the explanation for the various pieces of markup: The <html> at the very beginning and the </html> at the very end form a pair of tags; they indicate that any markup placed between them is to be understood as HTML. Another pair of tags, <body> and </body>, enclose the text that is actually to be displayed in the interior of the browser window; <head> and </head> enclose ``header'' information that the browser may or may not choose to display. The paired tags <title> and </title> within the header enclose the title of the document; Netscape Navigator displays this title at the top of the frame of the browser window. Finally, the paired tags <p> and </p> mark the beginning and the end of a paragraph.

One other pair of tags, not used in the preceding document but frequently helpful, is <pre> and </pre>. Any text placed between these tags is considered ``preformatted''; the browser is not supposed to change the spacing of the lines or the positions of the breaks between lines. This is handy for printing the source code of a computer program or other text that is carefully laid out before the browser gets it.

Suppose, now, that we wanted a Scheme program to generate the ``Sample HTML page'' document, so that the server would run the program instead of recovering the document from a file. All we would need to do is ask Scheme to print out the appropriate strings and halt. The program looks like this:

(display "<html>")
(newline)
(display "<head>")
(newline)
(display "<title>Sample HTML page</title>")
(newline)
(display "</head>")
(newline)
(newline)
(display "<body>")
(newline)
(display "<p>")
(newline)
(display "If by merely intelligible objects we mean those")
(newline)
(display "things which are thought through pure categories,")
(newline)
(display "without any schema of sensibility, such objects")
(newline)
(display "are impossible.  For the condition of the")
(newline)
(display "objective employment of all our concepts of")
(newline)
(display "understanding is merely the mode of our sensible")
(newline)
(display "intuition, by which objects are given us; if we")
(newline)
(display "abstract from these objects, the concepts have no")
(newline)
(display "relation to any object.  Even if we were willing")
(newline)
(display "to assume a kind of intuition other than this our")
(newline)
(display "sensible kind, the functions of our thought would")
(newline)
(display "still be without meaning in respect to it.")
(newline)
(display "</p>")
(newline)
(display "</body>")
(newline)
(display "</html>")
(newline)
(exit)
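
Since the program above does nothing but print a fixed sequence of lines, the same output could also be produced from a list of strings. The following is only a sketch of that alternative; the names page-lines and lines->text are made up for this illustration and are not part of SCM or of the course materials.

```scheme
;; The lines of the sample page, one string per line.
;; The empty string stands for the blank line after </head>.
(define page-lines
  '("<html>"
    "<head>"
    "<title>Sample HTML page</title>"
    "</head>"
    ""
    "<body>"
    "<p>"
    "If by merely intelligible objects we mean those"
    "things which are thought through pure categories,"
    "without any schema of sensibility, such objects"
    "are impossible.  For the condition of the"
    "objective employment of all our concepts of"
    "understanding is merely the mode of our sensible"
    "intuition, by which objects are given us; if we"
    "abstract from these objects, the concepts have no"
    "relation to any object.  Even if we were willing"
    "to assume a kind of intuition other than this our"
    "sensible kind, the functions of our thought would"
    "still be without meaning in respect to it."
    "</p>"
    "</body>"
    "</html>"))

;; Join a list of strings into one string,
;; ending each element with a newline character.
(define lines->text
  (lambda (lines)
    (if (null? lines)
        ""
        (string-append (car lines)
                       (string #\newline)
                       (lines->text (cdr lines))))))

(display (lines->text page-lines))
```

A closing (exit), as in the program above, would still follow the call to display when this is run as a script.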

To make this Scheme program executable by the document server, so that the document that it outputs will be available on the World Wide Web, we need to take the following steps:

  1. Before anything in your MathLAN account can be accessed on the Web, you must make your home directory accessible. To do this, open an hpterm window and give the command

    chmod 755 ~
    

    at the prompt. (The symbol ~ stands for your home directory.)

  2. Any materials related to the World Wide Web belong in a subdirectory of your home directory named public_html. If you have no such subdirectory, create one by giving the command

    mkdir ~/public_html
    

    in the hpterm window. This directory, too, must be accessible; give the command

    chmod 755 ~/public_html
    

    to make it so.

  3. CGI programs, in particular, belong in a subdirectory of ~/public_html named cgi-bin. If you have no such subdirectory, create it with the command

    mkdir ~/public_html/cgi-bin
    

    and make it accessible with the command

    chmod 755 ~/public_html/cgi-bin
    
  4. Using XEmacs, open a file for the CGI program. A good name for the file would be ~/public_html/cgi-bin/sample.cgi -- notice that the file name ends in .cgi rather than .ss when we expect to use the Common Gateway Interface.

  5. In XEmacs, type the following line at the beginning of the file:

    #!/usr/local/bin/scm -q
    

    Don't leave any blank space before the # character.

    The effect of this line is to tell the server what software is needed to process the program -- in this case, the SCM Scheme interpreter. (SCM is another implementation of Scheme. We're using it for this project instead of Chez Scheme because, even though it's much slower than Chez Scheme, its interface is better suited to CGI programming, and it provides a non-standard built-in procedure that we'll be needing shortly.)

  6. Still inside XEmacs, add the following lines below the one beginning with the # character:

    (display "Content-type: text/html")
    (newline)
    (newline)
    

    When Scheme writes out this header line and the blank line that follows it, the server will forward them to the browser, and the browser will get ready to display an HTML text document.

  7. Still in XEmacs, copy the Scheme program shown above into the file and save it. Moving back to the hpterm window, make the newly saved file accessible by giving the command

    chmod 755 ~/public_html/cgi-bin/sample.cgi
    
  8. Congratulations -- you've written your first CGI program in Scheme! To see it in action, ask Netscape to load the URL

    http://www.yourserver.edu/~yourname/cgi-bin/sample.cgi
    

Now for something more interesting. The Common Gateway Interface ensures that the information that the browser gives to the server when it requests a document can be recovered by a CGI program. On our system, using SCM, the particular mechanism through which this information is transferred is a non-standard, predefined procedure named getenv, which takes a string as its argument and returns another string as its value. Three of the strings that can be used as arguments are of particular interest for our purposes:

  "REMOTE_HOST" -- the name of the computer from which the browser made its request.

  "HTTP_USER_AGENT" -- the browser's description of itself, typically its name and version.

  "QUERY_STRING" -- whatever follows the question mark in the URL, if anything.

Here's a CGI program, written in Scheme, that uses all three of these:

#!/usr/local/bin/scm -q

(define writeln
  (lambda args
    (for-each display args)
    (newline)))

(writeln "Content-type: text/html")
(writeln)

(writeln "<html>")
(writeln "<head>")
(writeln "<title>I know where you are!</title>")
(writeln "</head>")

(writeln "<body>")
(writeln "<p>")
(writeln "You're at " (getenv "REMOTE_HOST") ", aren't you?")
(writeln "<p>")
(writeln "And you're running " (getenv "HTTP_USER_AGENT")
         " as your browser!")

(let ((query-string (getenv "QUERY_STRING")))
  (if (and query-string
           (not (string=? query-string "")))
    (begin
      (writeln "<p>")
      (writeln "And I say ``" query-string "'' to you, too!"))))

(writeln "</body>")
(writeln "</html>")

(exit)

Follow this link to see the document that this program generates.

The procedure call (getenv "QUERY_STRING") looks at the URL that the browser used to activate the CGI program. If that URL just ends in .cgi, then (getenv "QUERY_STRING") returns #f or, on servers that define the variable even when no query is present, an empty string; this is why the program above tests for both. However, if there is a question mark after the .cgi, and a string of characters after that, then (getenv "QUERY_STRING") returns the characters in this additional string. The user can supply almost any kind of information to the CGI program through the query string, and the CGI program recovers it by decoding and parsing that string. (Follow this link to see what the program shown above generates when given the query string "phooey".)
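
The same caution may be worth applying to the other two variables: if one of them happens not to be set, displaying the result of getenv directly would print #f in the middle of the page. Here is a minimal sketch of one way to guard against that. The name value-or-default is invented for this example, and the fallback text is arbitrary.

```scheme
;; Return VALUE if it is a non-empty string; otherwise return DEFAULT.
;; VALUE may be #f, as when getenv finds no such variable.
(define value-or-default
  (lambda (value default)
    (if (and value (not (string=? value "")))
        value
        default)))

;; Possible use in the program above (assumes writeln and SCM's getenv):
;;
;; (writeln "You're at "
;;          (value-or-default (getenv "REMOTE_HOST") "an unknown host")
;;          ", aren't you?")
```

Keeping the test in a separate procedure means that the decision about what counts as ``missing'' is made in one place rather than repeated at every call to getenv.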

In CGI programming, a query string usually consists of a sequence of equations separated by ampersands, with some attribute on the left-hand side of each equation and the value of that attribute on the right-hand side. For instance, a typical query string might be "year=1997&month=4"; here the attributes are "year" and "month" and their respective values are "1997" and "4". Note that both the attributes and the values are strings.
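
To make this structure concrete, here is a rough sketch of how such a string could be taken apart using only standard Scheme procedures. The names split-on, equation->pair, and query-string->pairs are hypothetical, and the sketch does no decoding of special characters; it illustrates the format and is not a replacement for a library routine.

```scheme
;; Split STR at each occurrence of the character DELIM,
;; returning a list of the pieces (some of which may be empty).
(define split-on
  (lambda (delim str)
    (let loop ((chars (reverse (string->list str)))
               (current '())
               (pieces '()))
      (cond ((null? chars)
             (cons (list->string current) pieces))
            ((char=? (car chars) delim)
             (loop (cdr chars) '() (cons (list->string current) pieces)))
            (else
             (loop (cdr chars) (cons (car chars) current) pieces))))))

;; Turn "attribute=value" into the pair ("attribute" . "value").
;; Only the first = counts as the separator; if there is none,
;; the whole string is the attribute and the value is empty.
(define equation->pair
  (lambda (equation)
    (let ((len (string-length equation)))
      (let loop ((i 0))
        (cond ((= i len)
               (cons equation ""))
              ((char=? (string-ref equation i) #\=)
               (cons (substring equation 0 i)
                     (substring equation (+ i 1) len)))
              (else
               (loop (+ i 1))))))))

;; Break a whole query string into a list of attribute-value pairs.
(define query-string->pairs
  (lambda (query)
    (map equation->pair (split-on #\& query))))

(query-string->pairs "year=1997&month=4")
;; ===> (("year" . "1997") ("month" . "4"))
```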

Because the user often wants to supply attribute values that contain spaces, slashes, question marks, or other special characters that would wreak havoc if attached to URLs, CGI requires that such characters be encoded. The conventional encoding is to replace each space with a plus sign and each special character with a sequence of three characters: a percent sign followed by the two hexadecimal digits of the character's code (a slash, for instance, becomes %2F). The CGI program is expected to decode the strings recovered from the query string. This is usually done with the help of some ``library routine'' -- a procedure that someone else has written. In this lab, you'll find it convenient to use the extract-attributes procedure in cgi-utilities.scm, which takes a query string as argument and returns a list of pairs, with the car in each pair being a fully decoded attribute and the corresponding cdr being its fully decoded value:

(extract-attributes "year=1917&month=3&song=Over+There")
===> (("year" . "1917") ("month" . "3") ("song" . "Over There"))
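
For the curious, the decoding step on its own might look something like the following. This is a sketch written for illustration, not the code in cgi-utilities.scm; the names hex-digit-value and decode-string are invented, and the sketch assumes that every percent sign in its input is followed by two valid hexadecimal digits.

```scheme
;; Convert one hexadecimal digit character to its numeric value.
;; Assumes CH is one of 0-9, a-f, or A-F.
(define hex-digit-value
  (lambda (ch)
    (if (char-numeric? ch)
        (- (char->integer ch) (char->integer #\0))
        (+ 10 (- (char->integer (char-upcase ch))
                 (char->integer #\A))))))

;; Decode a string in which + stands for a space and %XX stands for
;; the character whose code is the hexadecimal number XX.
(define decode-string
  (lambda (str)
    (let loop ((chars (string->list str))
               (decoded '()))
      (cond ((null? chars)
             (list->string (reverse decoded)))
            ((char=? (car chars) #\+)
             (loop (cdr chars) (cons #\space decoded)))
            ;; A percent sign with at least two characters after it.
            ((and (char=? (car chars) #\%)
                  (pair? (cdr chars))
                  (pair? (cddr chars)))
             (loop (cdddr chars)
                   (cons (integer->char
                          (+ (* 16 (hex-digit-value (cadr chars)))
                             (hex-digit-value (caddr chars))))
                         decoded)))
            (else
             (loop (cdr chars) (cons (car chars) decoded)))))))

(decode-string "Over+There")
;; ===> "Over There"
```

Note that decoding has to happen after the query string has been split at the ampersands and equals signs; otherwise an encoded ampersand inside a value would be mistaken for a separator.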

Now we're ready to look at the project. The file display-calendar.ss contains a program that culminates in the definition of a procedure, display-calendar, that takes two arguments, a month number and a year, and displays a calendar for that month:

> (display-calendar 1 2000)
January 2000

 Su  M Tu  W Th  F Sa
                    1
  2  3  4  5  6  7  8
  9 10 11 12 13 14 15
 16 17 18 19 20 21 22
 23 24 25 26 27 28 29
 30 31

The project is to use this program as the basis for a CGI program that generates an HTML document containing the calendar for a specified month. The idea is that when a browser submits a URL such as

http://www.yourserver.edu/~yourname/cgi-bin/display-calendar.cgi?year=1951&month=7

the server should invoke your CGI program, which should generate the HTML for a page that looks like this one.

Your part of the job is to design, write, and test additional Scheme code that can be added at the end of the display-calendar program to recover the year and month from the query string in the URL, print out the necessary markup at the beginning and end of the document, and invoke display-calendar in exactly the right place and with the appropriate arguments. Once this code is added, you can convert the program into a CGI program by adding

#!/usr/local/bin/scm -q

(display "Content-type: text/html")
(newline)
(newline)

at the top, saving the result in ~/public_html/cgi-bin/display-calendar.cgi, and making that file accessible. At that point, your program will be ready to respond to requests from anyone on the Internet who needs to know what day of the week February 23, 2007 falls on.
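
Because those requests can come from anyone, the query string may lack an attribute or carry a value that is not a number. One possible precaution is sketched below; it assumes the list-of-pairs format shown earlier for extract-attributes, and the name attribute->number is made up for this example.

```scheme
;; Look up NAME in a list of (attribute . value) pairs and convert
;; the value to a number.  Returns #f if the attribute is absent
;; or if its value does not read as a number.
(define attribute->number
  (lambda (name attributes)
    (let ((entry (assoc name attributes)))
      (and entry
           (string->number (cdr entry))))))

(attribute->number "month" '(("year" . "1951") ("month" . "7")))
;; ===> 7
```

A program could test the result before going any further and write out an explanatory paragraph instead of a calendar when it gets #f.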

You may find it helpful to look at a couple of hints on testing and debugging this program.


This document is available on the World Wide Web as

http://www.math.grin.edu/courses/Scheme/spring-1997/cgi-project.html

created April 16, 1997
last revised May 28, 1997
John David Stone (stone@math.grin.edu)