Project: Constructing World Wide Web documents

In this project, we'll look at a Scheme program that automatically generates documents that can be made available on the World Wide Web and displayed by web browsers (such as Netscape, Lynx, and Internet Explorer).

Specifically, the program produces several documents, such as a list of some of Stephen Jay Gould's books. Each of the documents provides bibliographical information about one or more books by a specific author.

A World Wide Web document is stored as a text file, and the Scheme program just figures out what should be written to the text file containing each document, opens that file, writes out the appropriate text, and closes the port to the file. There's only one hitch: In addition to the text to be displayed, a World Wide Web document must contain instructions to the browser about how the document is to be structured and presented. These instructions are in the form of markup tags -- short sequences of characters that are interspersed with the text. In preparing a document for display, a browser strips out these markup tags and follows the instructions they contain to present all the parts of the document in an appropriate format and layout.
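
As a rough illustration of the open, write, close sequence just described, here is a minimal sketch. The procedure names write-lines and write-text-file are invented for this example; they do not come from book-pages.ss, which may organize its output quite differently.

```scheme
;; Hypothetical helpers for illustration; they are not taken from
;; book-pages.ss.

;; Write each string in LINES to PORT, one string per line.
(define write-lines
  (lambda (port lines)
    (for-each (lambda (line)
                (display line port)
                (newline port))
              lines)))

;; Open the file named FILENAME, write LINES to it, and close the port.
;; What happens when the file already exists depends on the Scheme
;; implementation: some replace the old file, others signal an error.
(define write-text-file
  (lambda (filename lines)
    (let ((port (open-output-file filename)))
      (write-lines port lines)
      (close-output-port port))))

;; Example call (the file name is only a placeholder):
;;   (write-text-file "sample.html" (list "<p>" "Hello, Web." "</p>"))
```

Separating write-lines from write-text-file means the same writing code can be pointed at any output port, which makes it easy to try out at the Scheme prompt before any file is created.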

The same markup tags are used for most World Wide Web documents; these standard tags constitute the Hypertext Markup Language (HTML). So the Scheme program that generates World Wide Web documents must observe the conventions of HTML and produce files in which markup tags are used consistently and correctly, just as a human author would.
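
One way a program can keep its tags consistent is to generate each opening tag and its closing tag together. The following sketch shows the idea; the name tagged is an assumption made for this example and is not claimed to appear in book-pages.ss.

```scheme
;; tagged is a hypothetical helper for illustration; it is not taken
;; from book-pages.ss.  It wraps TEXT between an opening tag and the
;; matching closing tag, so the two can never get out of step.
;; Note: it does not escape characters such as < and & inside TEXT.
(define tagged
  (lambda (name text)
    (string-append "<" name ">" text "</" name ">")))

;; Tags nest, so calls to tagged nest in the same way.
(define sample-head
  (tagged "head" (tagged "title" "Books by Stephen Jay Gould")))
;; sample-head is "<head><title>Books by Stephen Jay Gould</title></head>"
```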

The file /u2/stone/courses/scheme/html/book-pages.ss contains the Scheme program that generates the bibliographical documents. Comments in that program explain how HTML tags work and introduce all those that are used in the documents. You'll be pleased to know that for once you have a complete and correct working version of the program to start from; the project is to extend this program, adding new features to it.

A World Wide Web site is a directory containing the documents that one wishes to make available. On MathLAN, any user can establish such a site by issuing the following commands in a dtterm window:

mkdir ~/public_html
chmod 755 ~ ~/public_html

The first of these commands creates a directory in which the machine that responds to requests for World Wide Web documents will look to find any documents that you choose to put there. The second ensures that anyone can examine the list of documents available in that directory (and in your home directory).

  1. If you don't already have a World Wide Web site in your account, establish one.

  2. Use the cd ("change directory") command to make ~/public_html your current working directory:

    cd ~/public_html
    
  3. Start Scheme in the dtterm window and run the book-pages.ss program in it, either by naming the file containing that program on the command line or by using the load procedure after Scheme has started. The program shuts down Scheme after it completes its run, so don't be surprised when you get the shell prompt back.

  4. Type ls to list the files in the public_html directory; you'll see that the program generated several of them, all World Wide Web documents. (Initially, these newly created documents are not yet public, even though they are listed in your World Wide Web site; to release the document stored in a file named, say, Gould.html, you would give the command

    chmod 644 Gould.html
    

    in the dtterm window.)

  5. Study the source code for the program and the contents of the text files it created. Figure out the process by which the program created the files.

  6. Grinnell College requires that every World Wide Web document distributed through its servers include the name of the author and the date on which the document was last modified. A simple way to do this is to insert the following material just before the </body> tag near the end of the document:

    <hr>
    <p>
    George Spelvin<br>
    November 7, 1997
    </p>
    

    You would of course substitute your own name and the current date for those shown in the example.

    The <hr> tag draws a horizontal rule across the document. The <p> and </p> tags make the name and the date into a paragraph. The <br> tag directs the browser to start a new line after the name. The result will look like this:

    ------------------------------------------------------------
    George Spelvin
    November 7, 1997

    Adapt the program so that it prints your name and today's date automatically at the bottom of each of the documents it constructs.

    You may want to study the way the browser displays the files produced by your version of the program. To view them through Netscape Navigator, move them into your ~/public_html directory (if they are not already there), then release each one with a chmod 644 command, as shown in step 4. Now they are part of the Web. Move the mouse pointer into your Netscape Navigator window and select Open Location from the File menu. In the pop-up window that appears, type the location of your document:

    http://www.math.grin.edu/~spelvin/Gould.html
    

    (Substitute your username for Spelvin's and the file name of the document you want to view for Gould.html.)

  7. It would be handy to have the program automatically generate an index document, traditionally stored in the file index.html, containing a hypertext link to each of the bibliographical files generated. Here is what the displayed text in the body of such a file might look like, with the necessary HTML tags included:

    <ul>
    <li><a href="Sagan.html">Sagan, Carl</a></li>
    <li><a href="Hofstadter.html">Hofstadter, Douglas R.</a></li>
    <li><a href="Commoner.html">Commoner, Barry</a></li>
    <li><a href="Smullyan.html">Smullyan, Raymond</a></li>
    <li><a href="Gould.html">Gould, Stephen Jay</a></li>
    <li><a href="Gardner.html">Gardner, Martin</a></li>
    </ul>
    

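    And here is how the same text would look in the browser, where each name is a link to the corresponding document:

      * Sagan, Carl
      * Hofstadter, Douglas R.
      * Commoner, Barry
      * Smullyan, Raymond
      * Gould, Stephen Jay
      * Gardner, Martin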

    Add commands to the program so that it constructs and writes out an index document for the bibliography documents that it produces.

  8. After you finish the project, if you no longer wish to maintain a World Wide Web site, you can delete files from the ~/public_html directory with the command

    rm -i ~/public_html/*.html
    

    or remove the entire directory and all of its contents with

    cd ; rm -i -r public_html
    

    Alternatively, now that you have a World Wide Web site, you can start creating your own documents for it. Since World Wide Web documents are text files, you can create them with XEmacs, typing in the necessary markup tags manually -- or, if you find it more interesting, you can revise book-pages.ss to make a Scheme program that will write your pages for you.

    Still another possibility is to use a high-level word processor like the one that can be activated through the Edit menu in the Netscape browser. Users of such programs never see the markup; in response to the document writer's instructions, the word processor inserts markup tags into the file that it creates, but does not display them on screen.

    To learn more about HTML, I recommend the primer "A beginner's guide to HTML", currently maintained by Marty Blase of the National Center for Supercomputing Applications.
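
Steps 6 and 7 above can be approached in many ways. As one possible starting point, and only a sketch, the procedures below write the signature block and the index list to an output port. All three names (write-signature, write-index-entry, write-index-list) are invented for this example, as is the choice to represent each index entry as a pair of a file name and a label; none of this is taken from book-pages.ss. The date is passed in as a string because the way to obtain the current date depends on the Scheme implementation.

```scheme
;; All names below are hypothetical; book-pages.ss may be organized
;; differently, and these procedures would need to be fitted into it.

;; Step 6: write the signature block (rule, name, date) to PORT.
;; NAME and DATE are strings supplied by the caller.
(define write-signature
  (lambda (port name date)
    (for-each (lambda (line)
                (display line port)
                (newline port))
              (list "<hr>"
                    "<p>"
                    (string-append name "<br>")
                    date
                    "</p>"))))

;; Step 7: write one list item that links to FILE and shows LABEL.
(define write-index-entry
  (lambda (port file label)
    (display (string-append "<li><a href=\"" file "\">" label "</a></li>")
             port)
    (newline port)))

;; Step 7: write the whole list.  ENTRIES is a list of pairs, each of
;; the form (file . label); this representation is an assumption.
(define write-index-list
  (lambda (port entries)
    (display "<ul>" port)
    (newline port)
    (for-each (lambda (entry)
                (write-index-entry port (car entry) (cdr entry)))
              entries)
    (display "</ul>" port)
    (newline port)))

;; Example calls, writing to the screen rather than to a file:
;;   (write-signature (current-output-port)
;;                    "George Spelvin" "November 7, 1997")
;;   (write-index-list (current-output-port)
;;                     (list (cons "Gould.html" "Gould, Stephen Jay")
;;                           (cons "Gardner.html" "Gardner, Martin")))
```

In a finished program the signature would be written just before the </body> tag of every document, including index.html, and the list of entries would be built from the same author data that drives the bibliography pages.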


This document is available on the World Wide Web as

http://www.math.grin.edu/courses/Scheme/fall-1997/html-project.html

created November 5, 1997
last revised November 6, 1997

John David Stone (stone@math.grin.edu)