Project: Differences between files

Not infrequently, a programmer will be confronted with a directory in which she has created several versions of a program, differing in minor details. If the file is more than a few lines long, it can be very difficult to identify the differences by a quick visual survey, so a utility program that carries out such a comparison and reports the differences is often helpful.

Most Unix systems, including ours, provide such a utility program under the name of diff. In this project, we'll look at the internal construction of the diff program, as it might be implemented in Scheme.


Step 1

Find two files in your home directory that are different versions of the same program. Open a dtterm window and invoke the diff utility to compare the two files; your command should be directed to the shell and should look something like this:

                bourbaki% diff frogs.ss frogs-2.ss

(assuming that the two files are named frogs.ss and frogs-2.ss -- substitute the names of your own near-duplicate files). Study the output of the diff command and arrive at an interpretation of it.

Hint: When reporting differences, diff gives the numbers of the lines that differ. It accompanies the line numbers with the letter d when identifying lines that appear only in the first of the two files, a when identifying lines that appear only in the second file, and c when identifying lines that seem to be present in both files but do not match.


Step 2

The file diff-project.ss contains an implementation in Scheme of a program similar to diff, though with a slightly different output format. Make a copy of this file in your home directory and load it into XEmacs. Read it through.


Step 3

Start Chez Scheme and invoke the diff procedure to compare the same files that you previously examined with the Unix diff utility. Again, interpret the output that is generated.


Step 4

Adapt the Scheme code so that diff will count lines as identical if they differ only in the case (capital or lower-case) of the letters they contain.
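
One possible starting point, offered only as a sketch: if the comparison of two lines in diff-project.ss comes down to a call to string=?, the standard procedure string-ci=? performs the same test while ignoring case. The procedure name lines-match-ignoring-case? is hypothetical; the project file may organize its comparison differently.

```scheme
;; Hypothetical helper: the name is an assumption, not taken from
;; diff-project.ss.  string-ci=? is the standard case-insensitive
;; counterpart of string=?.
(define (lines-match-ignoring-case? first-line second-line)
  (string-ci=? first-line second-line))
```

Wherever the existing code decides whether two lines are the same, a predicate of this kind could be substituted for the exact comparison.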


Step 5

In its present form, the Scheme program counts the first line of a file as line 0, the second as line 1, and so on. Fix it so that when it reports line numbers to the user, they are one-based rather than zero-based.
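
A minimal sketch of the idea, assuming the program keeps zero-based positions internally and converts them only at the moment of output. Both procedure names below are hypothetical and do not come from diff-project.ss.

```scheme
;; Hypothetical helpers: the internal position stays zero-based, and
;; the conversion happens only when a number is shown to the user.
(define (index->line-number index)
  (+ index 1))

(define (report-line-number index)
  (display "line ")
  (display (index->line-number index))
  (newline))
```

Confining the adjustment to the reporting code leaves any list or vector indexing elsewhere in the program undisturbed.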


Step 6

Adapt the Scheme code so that diff will count lines as identical if they differ only in the use of whitespace characters (spaces and tabs).
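
One reading of this requirement, sketched under stated assumptions: remove every space and tab from both lines and compare what remains. The names strip-whitespace and lines-match-ignoring-whitespace? are hypothetical, and the character name #\tab is assumed to be available, as it is in Chez Scheme.

```scheme
;; Hypothetical helper: returns a copy of line with all spaces and
;; tabs removed.
(define (strip-whitespace line)
  (let loop ((chars (string->list line))
             (kept '()))
    (cond ((null? chars)
           (list->string (reverse kept)))
          ((or (char=? (car chars) #\space)
               (char=? (car chars) #\tab))
           (loop (cdr chars) kept))
          (else
           (loop (cdr chars) (cons (car chars) kept))))))

;; Two lines match if they are equal once spaces and tabs are gone.
(define (lines-match-ignoring-whitespace? first-line second-line)
  (string=? (strip-whitespace first-line)
            (strip-whitespace second-line)))
```

Note that under this reading "(car x)" and "(carx)" also count as identical. If that is too permissive, an alternative is to collapse each run of whitespace to a single space instead of deleting it.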


Step 7

Adapt the Scheme code so that any differences in lines that begin with semicolons are ignored.
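
A sketch of one interpretation: treat any two lines that both begin with a semicolon as matching, whatever their contents. The procedure names are hypothetical. This sketch examines only the first character of each line, so a comment preceded by spaces is not recognized, and it does not address a comment line that appears in only one of the two files.

```scheme
;; Hypothetical helper: true if the first character of line is a
;; semicolon.  An empty line is not a comment line.
(define (comment-line? line)
  (and (> (string-length line) 0)
       (char=? (string-ref line 0) #\;)))

;; Two comment lines always match; other lines must be equal.
(define (lines-match-ignoring-comments? first-line second-line)
  (or (and (comment-line? first-line)
           (comment-line? second-line))
      (string=? first-line second-line)))
```

Another design would be to discard comment lines from both files before the comparison begins, at the cost of some extra bookkeeping to keep the reported line numbers accurate.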


Step 8

Starting with a fresh copy of /home/stone/courses/scheme/html/diff-project.ss, adapt the Scheme code so that it compares the two files word by word rather than line by line.
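
A word-by-word comparison needs some way of breaking text into words. The following sketch assumes that a word is a maximal run of non-whitespace characters; the name string->words is hypothetical, and the project file may already read its input in a way that makes a different approach more natural.

```scheme
;; Hypothetical helper: splits str into a list of words, where a word
;; is a maximal run of non-whitespace characters.
(define (string->words str)
  (let loop ((chars (string->list str))
             (word '())      ; characters of the current word, reversed
             (words '()))    ; completed words, most recent first
    ;; Add the current word, if there is one, to the completed words.
    (define (flush)
      (if (null? word)
          words
          (cons (list->string (reverse word)) words)))
    (cond ((null? chars)
           (reverse (flush)))
          ((char-whitespace? (car chars))
           (loop (cdr chars) '() (flush)))
          (else
           (loop (cdr chars) (cons (car chars) word) words)))))
```

For example, applying string->words to "  (define  x 1) " yields a list of the three strings "(define", "x", and "1)". The resulting lists of words could then be compared in the way the existing code compares lists of lines.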


This document is available on the World Wide Web as

http://www.math.grin.edu/courses/Scheme/spring-1998/diff-project.html

created April 5, 1998
last revised June 21, 1998

John David Stone (stone@math.grin.edu)