Not infrequently, a programmer will be confronted with a directory in which she has created several versions of a program, differing in minor details. If the files are more than a few lines long, it can be very difficult to identify the differences by a quick visual survey, so a utility program that carries out such a comparison and reports the differences is often helpful.
Most Unix systems, including ours, provide such a utility program under the name of diff. In this project, we'll look at the internal construction of the diff program, as it might be implemented in Scheme.
Find two files in your home directory that are different versions of the same program. Open a dtterm window and invoke the diff utility to compare the two files; your command should be directed to the shell and should look something like this:
bourbaki% diff frogs.ss frogs-2.ss
(assuming that the two files are named frogs.ss and frogs-2.ss -- substitute the names of your own near-duplicate files). Study the output of the diff command and arrive at an interpretation of it.
Hint: When reporting differences, diff gives the numbers of the lines that differ. It accompanies the line numbers with the letter d when identifying lines that appear only in the first of the two files, a when identifying lines that appear only in the second file, and c when identifying lines that seem to be present in both files but do not match.
The file diff-project.ss contains an implementation in Scheme of a program similar to diff, though with a slightly different output format. Make a copy of this file in your home directory and load it into XEmacs. Read it through.
Start Chez Scheme and invoke the diff procedure to compare the same files that you previously examined with the Unix diff utility. Again, interpret the output that is generated.
Adapt the Scheme code so that diff will count lines as identical if they differ only in the case (capital or lower-case) of the letters they contain.
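As a starting point, here is a minimal sketch of a case-insensitive line comparison. The name lines-match? is hypothetical and is not taken from diff-project.ss; the sketch assumes only that each line is available as a string, and relies on the standard string-ci=? procedure.

```scheme
;; Hypothetical helper: lines-match? is an illustrative name, not
;; one defined in diff-project.ss.
(define lines-match?
  (lambda (left right)
    ;; string-ci=? compares two strings while ignoring letter case.
    (string-ci=? left right)))

(lines-match? "(Define x 1)" "(define X 1)")   ; => #t
(lines-match? "(define x 1)" "(define x 2)")   ; => #f
```

Wherever the program currently tests two lines for equality, a predicate of this kind could presumably be substituted.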
In its present form, the Scheme program counts the first line of a file as line 0, the second as line 1, and so on. Fix it so that when it reports line numbers to the user, they are one-based rather than zero-based.
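One way to approach this is to leave the internal counting alone and convert only at the point where a number is reported. The sketch below illustrates the idea; the name user-line-number is hypothetical and does not come from diff-project.ss.

```scheme
;; Hypothetical helper: converts an internal zero-based index into
;; the one-based line number that is shown to the user.
(define user-line-number
  (lambda (index)
    (+ index 1)))

(user-line-number 0)   ; => 1
(user-line-number 41)  ; => 42
```

Keeping the conversion in one place means that the rest of the program can continue to use zero-based indices, for instance with vector-ref or list-ref.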
Adapt the Scheme code so that diff will count lines as identical if they differ only in the use of whitespace characters (spaces and tabs).
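The following sketch shows one possible interpretation, in which every space and tab is discarded before two lines are compared. All three procedure names are hypothetical, and the decision to delete blanks entirely (rather than, say, to collapse each run of blanks into a single space) is an assumption.

```scheme
;; All names below are illustrative; none comes from diff-project.ss.

;; Is ch a space or a tab?
(define blank-char?
  (lambda (ch)
    (or (char=? ch #\space)
        (char=? ch #\tab))))

;; Return a copy of str from which every space and tab has been removed.
(define strip-blanks
  (lambda (str)
    (let loop ((chars (string->list str))
               (kept '()))          ; characters retained so far, reversed
      (cond ((null? chars) (list->string (reverse kept)))
            ((blank-char? (car chars)) (loop (cdr chars) kept))
            (else (loop (cdr chars) (cons (car chars) kept)))))))

;; Do the two lines agree once spaces and tabs are disregarded?
(define lines-match-ignoring-blanks?
  (lambda (left right)
    (string=? (strip-blanks left) (strip-blanks right))))

(strip-blanks "  (car  ls)")                              ; => "(carls)"
(lines-match-ignoring-blanks? "(car ls)" "( car   ls )")  ; => #t
```

Note that under this interpretation the lines "ab" and "a b" also count as identical, which may or may not be what you want.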
Adapt the Scheme code so that any differences in lines that begin with semicolons are ignored.
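A predicate that recognizes such lines is a natural first step. In the sketch below, the names comment-line? and lines-match-ignoring-comments? are hypothetical, and "begins with a semicolon" is taken literally: the semicolon must be the very first character of the line.

```scheme
;; Both names are illustrative; neither comes from diff-project.ss.

;; Does the line begin with a semicolon? The length check guards
;; against calling string-ref on an empty line.
(define comment-line?
  (lambda (line)
    (and (> (string-length line) 0)
         (char=? (string-ref line 0) #\;))))

;; Treat any two comment lines as matching, whatever they contain.
(define lines-match-ignoring-comments?
  (lambda (left right)
    (or (and (comment-line? left) (comment-line? right))
        (string=? left right))))

(comment-line? ";; a comment")                    ; => #t
(comment-line? "")                                ; => #f
(comment-line? "  ; indented")                    ; => #f
(lines-match-ignoring-comments? "; old" "; new")  ; => #t
```

A matching predicate of this kind covers only comment lines that have been changed. Comment lines that were added to or deleted from one file would still be reported unless they are filtered out before the comparison begins.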
Starting with a fresh copy of /home/stone/courses/scheme/html/diff-project.ss, adapt the Scheme code so that it compares the two files word by word rather than line by line.
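A word-by-word comparison needs some way of breaking text into words. The sketch below assumes that a "word" is a maximal run of non-whitespace characters; both that definition and the name line->words are assumptions, not taken from diff-project.ss.

```scheme
;; Hypothetical helper: split a string into a list of words, where a
;; word is a maximal run of non-whitespace characters.
(define line->words
  (lambda (line)
    (let loop ((chars (string->list line))
               (current '())   ; characters of the word in progress, reversed
               (words '()))    ; completed words, reversed
      (cond ((null? chars)
             ;; End of input: finish any word still in progress.
             (reverse (if (null? current)
                          words
                          (cons (list->string (reverse current)) words))))
            ((char-whitespace? (car chars))
             ;; Whitespace ends the current word, if there is one.
             (loop (cdr chars)
                   '()
                   (if (null? current)
                       words
                       (cons (list->string (reverse current)) words))))
            (else
             (loop (cdr chars) (cons (car chars) current) words))))))

(line->words "  (define  x 1) ")   ; => ("(define" "x" "1)")
(line->words "")                   ; => ()
```

With this definition, parentheses stay attached to the neighboring word; whether they should instead count as separate words is a design decision left to you.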
This document is available on the World Wide Web as
http://www.math.grin.edu/courses/Scheme/spring-1998/diff-project.html
created April 5, 1998
last revised June 21, 1998