Laboratory Exercises For Computer Science 153

An Introduction to File Processing

Goals: This laboratory exercise applies general approaches for processing files to some specific problems.

The lab begins by developing solutions for two problems. In each case, a solution is given in pseudocode -- a step-by-step outline of what we need to do to solve the problem. Then, the lab develops the Scheme implementations of those solutions.

Problem 1: Find the sum of the numbers in a given file.

The general form for this problem follows a common approach for much file processing. First, we open the file; then, we read and process data; finally, we close the file and print the results. The pseudocode solution that follows adds a few more details:

  1. Open the file containing the numbers.
  2. Initialize a running total to 0.
  3. Until the end of the file is reached:
    1. Read in the next number from the file.
    2. Add it to the running total.
  4. Close the file.
  5. Report the final value of the running total.

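Here are the built-in Scheme procedures that can help us do various parts of this job: open-input-file takes a file name (a string) and returns an input port for that file; read takes an input port and returns the next datum from it, or the end-of-file object if no data remain; eof-object? tests whether a value is the end-of-file object; and close-input-port closes an input port.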

Here is the Scheme implementation of the pseudocode:
(define sum-of-file
  (lambda (source-file-name)
    (let ((source (open-input-file source-file-name)))
                                        ; Open the file.
      (let loop ((total 0)              ; Initialize the running total.
                 (next (read source)))  ; Try to read a number.
        (if (eof-object? next)          ; If you get the end-of-file object,
            (begin
              (close-input-port source) ; close the file
              total)                    ; and report the final total.
            (loop (+ next total)        ; Otherwise, add the number to
                                        ;    the running total,
                  (read source)))))))   ; try to read another number,
                                        ;    and repeat the loop.
A typical interaction using this procedure would look like this:
> (sum-of-file "/home/walker/153/labs/data.1")
200
The file /home/walker/153/labs/data.1 contains the four numbers
50
50
75
25
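To try sum-of-file without access to that directory, one can first create a small data file. The following is a minimal sketch, not part of the original lab: the procedure name write-numbers-to-file is hypothetical, and the sketch assumes the target file does not already exist (some Scheme implementations signal an error when open-output-file is given the name of an existing file).

```scheme
;; Hypothetical helper (not from the lab): write each number in the list
;; numbers on its own line of a new file named target-file-name.
(define write-numbers-to-file
  (lambda (numbers target-file-name)
    (let ((target (open-output-file target-file-name)))
                                        ; Open the output file.
      (let loop ((rest numbers))
        (if (null? rest)                ; When no numbers remain,
            (close-output-port target)  ; close the file.
            (begin
              (write (car rest) target) ; Otherwise, write one number,
              (newline target)          ; end its line,
              (loop (cdr rest))))))))   ; and continue with the others.
```

With this definition, (write-numbers-to-file '(50 50 75 25) "data.1") creates a file like the one shown above (the name data.1 in the current directory is only an example), and (sum-of-file "data.1") then returns 200.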
  1. Compare this program with the pseudocode solution given earlier. Identify how each part of the pseudocode is reflected in the Scheme code.

  2. Trace this program for the data file given. Be sure you can explain how it works.

  3. Explain why the expression (read source) appears twice in the above code. What is the purpose of each appearance of this expression?

Problem 2: Given a file containing numbers, with one or more numbers on each line, compute the average of the numbers on each line and write it in a new file. (In other words, the new file should contain one number for each line of the given file -- the average of the numbers on that line.)

The pseudocode solution to this problem is:

  1. Open the input file.
  2. Open the output file.
  3. Until the end of the file is reached:
    1. Initialize a running total of the numbers on the current line to 0.
    2. Initialize a tally of those numbers to 0.
    3. Until the end of a line is reached:
      1. Read in a number from the input file.
      2. Add it to the running total.
      3. Add 1 to the tally.
    4. Compute the average.
    5. Write it to the output file.
  4. Close the input file.
  5. Close the output file.

To detect the end of a line in Scheme, we need a procedure that has not yet been introduced: peek-char. The peek-char procedure takes one argument, an input port, and returns the first unread character that can be accessed through that port. It does not actually read that character or extract it from the port -- it just peeks at it to see what it will be when and if it is (subsequently) read in.

If the value returned by (peek-char source) is the newline character, #\newline, then we know that we're at the end of the line and can proceed to average the numbers that we've encountered. (This is also a good time to read in and discard the newline character, so that we can start into the next line of numbers without encountering it again.)
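The difference between peeking and reading can be seen in a short experiment. This is a minimal sketch, not part of the original lab: it assumes the implementation provides open-input-string (a string port stands in for a file here), and all of the names it defines are hypothetical.

```scheme
;; A string port stands in for a file (an assumption made for this
;; demonstration).  It holds three characters: 7, newline, 8.
(define demo-port
  (open-input-string (string #\7 #\newline #\8)))

(define first-peek  (peek-char demo-port))  ; #\7, still unread
(define second-peek (peek-char demo-port))  ; #\7 again: peeking consumes nothing
(define first-read  (read-char demo-port))  ; #\7, now consumed
(define at-newline  (peek-char demo-port))  ; #\newline: end of the first line
(define discarded   (read-char demo-port))  ; #\newline, now consumed
(define next-number (read demo-port))       ; 8
(define at-end      (peek-char demo-port))  ; the end-of-file object
```

Two calls to peek-char in a row return the same character, whereas each call to read-char moves past the character it returns.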

Like (read source) and (read-char source), (peek-char source) returns the end-of-file object if there are no more characters in the file. Here's a procedure that manages the inner loop of the pseudocode shown above. It reads in one line of numbers from the input port source and writes their average to the output port target. It assumes that every line, including the last one, ends with a newline character that immediately follows the final number on the line.

(define average-line
  (lambda (source target)
    (let loop2 ((total 0)                   ; Initialize the running total.
                (tally 0)                   ; Initialize the tally.
                (ch (peek-char source)))    ; Peek at the next character.
      (if (char=? ch #\newline)             ; If it's a newline character,
          (begin
            (read-char source)              ; discard it,
            (write (/ total tally) target)  ; compute the average and write
                                            ;    it to the target file,
            (newline target))               ; and terminate the line in the
                                            ;    target file.
          (let ((next (read source)))       ; Otherwise, read a number.
            (loop2 (+ total next)           ; Add it to the running total.
                   (+ tally 1)              ; Add 1 to the tally.
                   (peek-char source))))))) ; Peek at the next character
                                            ;    and repeat the loop.
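
The procedure above applies char=? to whatever peek-char returns, so it depends on the last line of the file ending with a newline, and a space between the final number on a line and the newline would cause read to skip ahead into the following line. The variant below is a sketch of one way to relax those assumptions; it is not part of the original lab, and the name average-line-robust is hypothetical.

```scheme
;; Hypothetical variant of average-line (not from the lab).  It treats the
;; end of the file like the end of a line, skips spaces and tabs one
;; character at a time, and writes nothing for a line with no numbers.
(define average-line-robust
  (lambda (source target)
    (let loop ((total 0)
               (tally 0)
               (ch (peek-char source)))
      (cond ((or (eof-object? ch) (char=? ch #\newline))
             (if (not (eof-object? ch))
                 (read-char source))             ; Discard the newline.
             (if (> tally 0)                     ; Avoid dividing by zero.
                 (begin
                   (write (/ total tally) target)
                   (newline target))))
            ((char-whitespace? ch)               ; A space or tab:
             (read-char source)                  ; discard it and
             (loop total tally (peek-char source))) ; look again.
            (else
             (let ((next (read source)))         ; Otherwise, read a number.
               (loop (+ total next)
                     (+ tally 1)
                     (peek-char source))))))))
```

Because the newline test comes before the char-whitespace? test, a newline always ends the current line rather than being skipped as ordinary whitespace.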

The Scheme implementation of the solution to problem #2 is now easy to write: Open up the files, call average-line once for each line of the input file, and finally close the files:

(define average-each-line
  (lambda (source-file-name target-file-name)
    (let ((source (open-input-file source-file-name))
                                               ; Open the input file.
          (target (open-output-file target-file-name)))
                                               ; Open the output file.
      (let loop1 ((ch (peek-char source)))     ; Peek at the next character.
        (if (eof-object? ch)                   ; If you get the eof-object,
            (begin
              (close-input-port source)        ; close the input file
              (close-output-port target))      ; and the output file.
            (begin                             ; Otherwise,
              (average-line source target)     ; read the numbers on one
                                               ;    line and compute and
                                               ;    write their average.
              (loop1 (peek-char source)))))))) ; Peek at the next character
                                               ;    and repeat the loop.

Here's what a typical invocation of this procedure looks like:

> (average-each-line "/home/walker/153/labs/data.2" "lab-2.output")
Nothing shows up on screen, because the last operation performed by average-each-line is the call to close-output-port, which returns an unspecified value (and Chez Scheme doesn't bother to print unspecified values). All the action takes place off stage, in the files: If the input file contains two lines of numbers -- say, for instance,
25 100 50 50 200
50 25 75
-- then the program will create the output file lab-2.output, looking like this:
85
50
but the creation of this file is invisible to the interactive Scheme user.

  1. After running the procedure as shown above, look at the data printed in lab-2.output to be sure it contains what you expect. This can be done in either of two ways: you could open the file with an editor such as XEmacs, or you could view the file in a dtterm window with the cat or more commands.
  2. Explain how each part of the pseudocode is reflected in the Scheme code.

  3. Trace this program for the data file given. Be sure you can explain how it works.

  4. Explain in your own words what peek-char does and why it is used here.
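
For step 1 above, the output file can also be inspected without leaving Scheme. The following is a minimal sketch, not part of the original lab; the names copy-chars and show-file are hypothetical.

```scheme
;; Hypothetical helpers (not from the lab).

;; Copy every remaining character from the input port source to the
;; output port target.
(define copy-chars
  (lambda (source target)
    (let loop ((ch (read-char source)))   ; Try to read a character.
      (if (eof-object? ch)                ; At the end of the input,
          'done                           ; stop.
          (begin
            (write-char ch target)        ; Otherwise, copy the character
            (loop (read-char source)))))))  ; and try to read another.

;; Display the contents of the named file on the screen.
(define show-file
  (lambda (file-name)
    (let ((source (open-input-file file-name)))
      (copy-chars source (current-output-port))
      (close-input-port source))))
```

After the invocation of average-each-line shown earlier, (show-file "lab-2.output") prints the contents of the output file in the Scheme window.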


This document is available on the World Wide Web as

http://www.math.grin.edu/~walker/courses/153/lab-file-intro.html

created March 11, 1997 by John D. Stone
last revised January 29, 1998 by Henry M. Walker