|CSC 153: Computer Science Fundamentals||Grinnell College||Spring, 2005|
|Laboratory Exercise Reading|
This reading introduces the motivation for using file storage and outlines a general approach for processing files.
Up to this point, all programs developed for this course have stored their data in the computer's main memory. This has the advantage that the computer can process data in main memory more quickly and efficiently that it can if data are stored elsewhere.
On the other hand, storing data in main memory also creates some difficulties. For example, all data must be provided to the program each time the program is run, and the amount of data is limited by the size of main memory. In addition, except for information specified in define statements, any data passed through parameters or computed during procedure execution are destroyed after each procedure is completed executing, and data are not stored from one use of Scheme to the next.
This and the next lab introduce files as a mechanism to overcome some of these disadvantages. Specifically, disk files provide a way to store information in bulk over a long period of time. In applications, we may give up some efficiency, as data on files normally must be read into main memory before the data can be used. Similarly, any results we want stored permanently must be written out explicitly to a disk file. While this storing and retrieving of data requires some specific work, the use of files overcomes the disadvantages mentioned above.
Since the storage of data within files is so important, at least three perspectives have evolved for file processing:
streams: files are considered to contain a sequence or stream of data, such as numbers or characters,
text files: data within files are considered to be organized line-by-line, and
random-access files: data are placed in files, to be accessed in any order.
This lab introduces file streams, which are viewed as sequences of data; the next lab discusses text files. Random-access files constitute a more advanced topic and are beyond the scope of this course.
To introduce some basic ideas of streams, this lab begins by developing solutions for three problems. In each case, we assume that data have been placed in a file using an editor (such as emacs). That is, we assume that we have typed data into an editor and saved the data in a file which we have named. As with other files we edit, we could go back to our editor to revise or expand the data whenever we wanted.
For the problems that follow, the lab first describes a solution by using pseudocode -- a step-by-step outline of what we need to do to solve the problem. Then, the lab develops the Scheme implementations of those solutions.
Find the sum of the numbers in a given file.
The general form for this problem follows a common approach for much file processing. First, we open the file; then, we read and process data; finally, we close the file and print the results. The pseudocode solution that follows adds a few more details:
Here are the built-in Scheme procedures, together with some programming hints, that can help us do various parts of this job:
open-input-fileprocedure takes as argument a string that names an existing, accessible file; it returns a "port" through which data values, such as integers, can be read in from the file.
letinitializes the running total to 0 and initializes the parameter next as the first value read from the file.
readprocedure, which we have previously seen in its zero-argument form as a way of obtaining data from the user at the keyboard, also has a one-argument form: If an input port is supplied as the argument
readreads in, through that port, the next value stored in the file.
readreports this fact by returning a special "end-of-file object" instead of a normal datum. The
eof-object?predicate takes one argument, typically something that
readhas just returned, and itself returns
#tif its argument is the end-of-file object and
close-input-portprocedure takes one argument, an input port, and closes it, freeing the resources used to connect the program to the file.
(define sum-of-file (lambda (source-file-name) ;Pre-condition: source-file-name is the logical name of a file of numbers ;Post-condition: returns sum of numbers in the given file (let ((source (open-input-file source-file-name))) ; Open the file. (let loop ((total 0) ; Initialize the running total. (next (read source))) ; Try to read a number. (if (eof-object? next) ; If you get the end-of-file object, (begin (close-input-port source) ; close the file total) ; and report the final total. (loop (+ next total) ; Otherwise, add the number to ; the running total, (read source))))))) ; try to read another number, ; and repeat the loop.A typical interaction using this procedure would look like this:
> (sum-of-file "/home/walker/151s/labs/file1.dat") 200The file
/home/walker/151s/labs/file1.datcontains the four numbers
50 50 75 25
In reviewing this processing, note that the data file was viewed as containing a sequence or stream of numbers. Processing data in this file then involved reading the numbers, value-by-value, until we reached the end of the file.
Copy one file to another. That is, create a second file which is character-by-character identical to the first.
This time we view the file as a sequence of characters. The pseudocode solution to this problem is:
Scheme contains procedures read-char and write-char for processing one character, just as procedures read and write or display handle numbers or symbols or strings or Booleans or lists. That is, read-char and write-char transfer a single character from or to the file. As with the read procedure, read-char returns an end-of-file object when it comes to the end of a file.
Just as we used the open-input-file and close-input-port when reading from a file, we use procedures open-output-file and close-output-port when writing to a file.
The Scheme implementation of the above outline follows.
(define copy-file (lambda (source-file-name target-file-name) ;Pre-condition: source-file-name is the logical name of a file ;Post-condition: copies contents of source file to target file (let ((source (open-input-file source-file-name)) (target (open-output-file target-file-name))) (let loop ((ch (read-char source))) (if (eof-object? ch) (begin (close-input-port source) (close-output-port target)) (begin (write-char ch target) (loop (read-char source))))))))
The following problem may be useful in estimating the length of a paper, as one rule of thumb states that a typical page of double spaced manuscript contains about 12 words per line and about 27 usable lines (excluding margins) for a total of about 324 words.
To approach this problem, we need a working definition of a "word". One possibility is that a word is any sequence of character that begins with a letter and ends with whitespace (e.g., a space, a tab, or a newline).
To count words, we read a file one character at a time. When we first encounter a letter, we have another word. We then skip over additional letters (or non-white space) as part of that word. One possible outline for such a solution follows:
Since the finding the start or the end of a word requires checking various details, we write steps 2A and 2Bii as separate, mutually-recursive, local procedures. Thus, find-start-word reads character-by-character until finding the start of the word (step 2A), while find-end-word reads character-by-character until the end of the current word (step 2Bii). All of step 2 is accomplished by having these two procedures call each other. The resulting code follows:
(define count-words (lambda (source-file-name) ;Pre-condition: source-file-name is the logical name of a file ;Post-condition: returns number of words in the given file (letrec ((source (open-input-file source-file-name)) (print-result (lambda (count) (display "File ") (display source-file-name) (display " contains ") (display count) (display " words.") (newline) )) (find-start-word (lambda (next-char count) (cond ((eof-object? next-char) (begin (close-input-port source) (print-result count) )) ((char-alphabetic? next-char) (find-end-word (read-char source) (+ 1 count))) (else (find-start-word (read-char source) count)) ) )) (find-end-word (lambda (next-char count) (cond ((eof-object? next-char) (begin (close-input-port source) (print-result count) )) ((char-whitespace? next-char) (find-start-word (read-char source) count)) (else (find-end-word (read-char source) count)) ) )) ) (find-start-word (read-char source) 0) ) ) )
This document is available on the World Wide Web as
material for Problems 1 and 2 created in two labs on March 11, 1997
by John D. Stone
material merged and reorganized April 5, 1999 by Clif Flynt and Henry M. Walker
last revised February 7, 2005 by Henry M. Walker
|For more information, please contact Henry M. Walker at email@example.com.|