CSC 153 Grinnell College Spring, 2005
 
Computer Science Fundamentals
Laboratory Exercise
 

An Introduction to Files

Goals

This laboratory exercise provides experience with the basic elements of file processing, in which files are viewed as streams of data.

Some Motivations for File Storage

Problem 1:

Find the sum of the numbers in a given file.

Solution Outline:

The outline that follows shows a pseudocode solution to this problem:

  1. Open the file containing the numbers.
  2. Initialize a running total to 0.
  3. Until the end of the file is reached:
    1. Read in the next number from the file.
    2. Add it to the running total.
  4. Close the file.
  5. Report the final value of the running total.

Solution in Scheme:

Here is a Scheme implementation of the pseudocode:

(define sum-of-file
  (lambda (source-file-name)
  ;Pre-condition:  source-file-name is the logical name of a file of numbers
  ;Post-condition:  returns sum of numbers in the given file
    (let ((source (open-input-file source-file-name)))
                                        ; Open the file.
      (let loop ((total 0)              ; Initialize the running total.
                 (next (read source)))  ; Try to read a number.
        (if (eof-object? next)          ; If you get the end-of-file object,
            (begin
              (close-input-port source) ; close the file
              total)                    ; and report the final total.
            (loop (+ next total)        ; Otherwise, add the number to
                                        ;    the running total,
                  (read source)))))))   ; try to read another number,
                                        ;    and repeat the loop.

A typical interaction using this procedure would look like this:

> (sum-of-file "/home/walker/151s/labs/file1.dat")
200

The file /home/walker/151s/labs/file1.dat contains the four numbers

50
50
75
25
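
Readers without access to the course directory could recreate the sample file themselves. The following is a minimal sketch, not part of the original lab: the procedure name write-numbers-to-file and the file name my-file1.dat are assumptions chosen for illustration. The behavior of open-output-file when the target already exists varies among Scheme implementations, so the sketch assumes that my-file1.dat does not yet exist.

```scheme
;; Hypothetical helper, not part of the original lab:
;; writes each number in a list to a new file, one number per line.
(define write-numbers-to-file
  (lambda (numbers target-file-name)
    (let ((target (open-output-file target-file-name)))
      (let loop ((rest numbers))
        (if (null? rest)
            (close-output-port target)  ; No numbers left: close the file.
            (begin
              (write (car rest) target) ; Write the next number,
              (newline target)          ;    end its line,
              (loop (cdr rest))))))))   ;    and continue with the others.

;; Recreate the four-number sample under an assumed local file name.
(write-numbers-to-file '(50 50 75 25) "my-file1.dat")
```

With this file in place, (sum-of-file "my-file1.dat") should return 200, just as in the interaction above.
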
  1. Compare this program with the pseudocode solution given earlier. Identify how each part of the pseudocode is reflected in the Scheme code.

  2. Trace this program for the data file given. Be sure you can explain how it works.

  3. Explain why the expression (read source) appears twice in the above code. What is the purpose of each appearance of this expression?

In reviewing this processing, note that the data file was viewed as containing a sequence or stream of numbers. Processing data in this file then involved reading the numbers, value-by-value, until we reached the end of the file.
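
The same value-by-value loop can accumulate something other than a sum. As a hedged illustration of the stream view, the sketch below collects every value in a file into a list. The name file->list is an assumption introduced here and is not part of the original lab.

```scheme
;; Hypothetical helper, not part of the original lab:
;; returns a list of all values that read finds in a file, in file order.
(define file->list
  (lambda (source-file-name)
    (let ((source (open-input-file source-file-name)))
      (let loop ((next (read source))   ; Try to read a value.
                 (so-far '()))          ; Values seen so far, newest first.
        (if (eof-object? next)
            (begin
              (close-input-port source)
              (reverse so-far))         ; Restore the original order.
            (loop (read source) (cons next so-far)))))))
```

For a file containing the four numbers 50, 50, 75, and 25, this procedure returns the list (50 50 75 25), and applying + to that list gives the same total that sum-of-file reports.
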

Character-by-Character Processing:

Problem 2:

Copy one file to another. That is, create a second file which is character-by-character identical to the first.

Solution in Scheme:

The following Scheme code provides a solution of this problem.

(define copy-file
  (lambda (source-file-name target-file-name)
  ;Pre-condition:  source-file-name is the logical name of a file
  ;Post-condition:  copies contents of source file to target file
    (let ((source (open-input-file source-file-name))
          (target (open-output-file target-file-name)))
      (let loop ((ch (read-char source)))
        (if (eof-object? ch)
            (begin
              (close-input-port source)
              (close-output-port target))
            (begin
              (write-char ch target)
              (loop (read-char source))))))))
  1. Check that this procedure works as claimed by using it to copy the file /home/walker/151s/labs/file2.dat to a file named lab.data in your account.

  2. Modify this procedure so that every lower-case letter that is read in is converted to upper case before being written to the output file.

  3. Write a Scheme procedure tally-char that takes two arguments, the name of an input file and a character, and returns a tally of the number of occurrences of that character in the specified file.

    (tally-char "/home/walker/151s/labs/file1.dat" #\5) ===> 4
    (tally-char "/home/walker/151s/labs/file2.dat" #\0) ===> 16
    (tally-char "/home/walker/151s/labs/file2.dat" #\newline) ===> 3
    

    Hint: Within a main loop, add a parameter to contain the desired count, and update the count appropriately in any recursive call(s).

  4. Assume that a sentence is any sequence of characters ending with a period, question mark, or exclamation point. Modify tally-char to obtain a procedure count-sentences that determines the number of sentences in a file.

    File /home/walker/151s/labs/lab-file-description contains the first paragraph of introductory material in this lab, starting "Up to this point, ...". Check that this paragraph contains 2 sentences:

    (count-sentences "/home/walker/151s/labs/lab-file-description") 
         ===> 2
    
  5. Write and test a Scheme procedure that takes two arguments -- the name of an input file containing zero or more integers, and the name of an output file to be created by the procedure -- and copies each integer from the input file to the output file if it is in the range from 0 to 99. Values outside of this range should be read in but not copied out again. The idea is that this procedure will act as a filter, ensuring that only the values that are in the correct range will make it into the output file.

    Line breaks in the input file should be ignored. In the output file, arrange for each integer to be printed on a line by itself.
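
The hint for tally-char describes a loop with an extra parameter that carries a count. As a sketch of that pattern on a simpler, related task, the procedure below counts every character in a file. The name count-chars is an assumption introduced for illustration; it is not one of the procedures requested in the exercises.

```scheme
;; Hypothetical example, not part of the original lab:
;; returns the total number of characters in a file.
(define count-chars
  (lambda (source-file-name)
    (let ((source (open-input-file source-file-name)))
      (let loop ((ch (read-char source))
                 (count 0))               ; The extra loop parameter.
        (if (eof-object? ch)
            (begin
              (close-input-port source)
              count)                      ; Report the final count.
            (loop (read-char source)
                  (+ count 1)))))))       ; Update it in the recursive call.
```

The count changes only in the recursive call, so a procedure that counts selectively would need a test there to decide whether to add 1 or to pass the count along unchanged.
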

Counting Words in a File:

Problem 3:

Approximate the number of words in a file.

Solution in Scheme:

The following code solves this problem by reading the file character by character and counting each point at which a new word begins:

(define count-words
   (lambda (source-file-name)
   ;Pre-condition:  source-file-name is the logical name of a file
   ;Post-condition:  prints the number of words in the given file
      (letrec ((source (open-input-file source-file-name))
               (print-result
                  (lambda (count)
                      (display "File ")
                      (display source-file-name)
                      (display " contains ")
                      (display count)
                      (display " words.")
                      (newline)
                  ))
               (find-start-word 
                  (lambda (next-char count)
                     (cond ((eof-object? next-char) 
                                (begin
                                    (close-input-port source)
                                    (print-result count)
                                ))
                           ((char-alphabetic? next-char) 
                                (find-end-word (read-char source) (+ 1 count)))
                           (else 
                                (find-start-word (read-char source) count))
                     )
                  ))
               (find-end-word 
                  (lambda (next-char count)
                     (cond ((eof-object? next-char)
                                (begin
                                    (close-input-port source)
                                    (print-result count)
                                ))
                           ((char-whitespace? next-char) 
                                (find-start-word (read-char source) count))
                           (else 
                                (find-end-word (read-char source) count))
                     )
                  ))
               )
          (find-start-word (read-char source) 0)
      )
   )
)
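
To experiment with the two-state logic without preparing a file, one could apply the same idea to the characters of a string. The following is a minimal sketch under that assumption: the procedure count-words-in-string is hypothetical and not part of the original lab, and it returns the count rather than printing it. The local procedure names are kept the same as in count-words so that the two versions can be compared directly.

```scheme
;; Hypothetical example, not part of the original lab:
;; counts words in a string using the same two states as count-words.
(define count-words-in-string
  (lambda (str)
    (letrec ((find-start-word           ; State: not currently in a word.
              (lambda (chars count)
                (cond ((null? chars) count)
                      ((char-alphabetic? (car chars))
                       (find-end-word (cdr chars) (+ 1 count)))
                      (else (find-start-word (cdr chars) count)))))
             (find-end-word             ; State: currently in a word.
              (lambda (chars count)
                (cond ((null? chars) count)
                      ((char-whitespace? (car chars))
                       (find-start-word (cdr chars) count))
                      (else (find-end-word (cdr chars) count))))))
      (find-start-word (string->list str) 0))))
```

For example, (count-words-in-string "Up to this point") returns 4, while (count-words-in-string "42 is even") returns 2, because a word is counted only when a letter is seen outside a word. This is one reason the count is only an approximation.
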
  1. Check that this program works by running it on the file /home/walker/151s/labs/lab-file-description which contains the first paragraph of introductory material in this lab, starting "Up to this point, ...". (The paragraph contains 44 words.)

  2. Describe in several sentences how this program works.

    1. What is the purpose of each local procedure?
    2. Why is a letrec used, rather than a named let expression?
    3. find-start-word takes two parameters. What is the purpose of each?
    4. Procedure names normally should be chosen to describe what the procedure does. Do you think the name find-start-word is sufficiently descriptive? If so, explain why. If not, suggest a better name.
  3. count-words is incomplete, in that it contains few comments beyond pre- and post-conditions. Add appropriate commentary to clarify the purpose of each main part of the code.

  4. Modify count-words so that a word is considered to be only a sequence of letters. That is, for this part, a word is a sequence of letters -- without punctuation or digits.

  5. Challenge Problem: Use the ideas of count-words and count-sentences to write a procedure average-words, which determines the average number of words in a sentence. Note that, for efficiency, average-words should read through the file only once.


This document is available on the World Wide Web as

http://www.cs.grinnell.edu/~walker/courses/153.sp05/labs/lab-file-intro.shtml

material for Problems 1 and 2 created in two labs on March 11, 1997 by John D. Stone
material merged and reorganized April 5, 1999 by Clif Flynt and Henry M. Walker
last revised February 7, 2005 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.