On the other hand, storing data in main memory also creates some difficulties. For example, all data must be provided to the program each time the program is run, and the amount of data is limited by the size of main memory. In addition, except for information specified in define statements, any data passed through parameters or computed during procedure execution are destroyed when the procedure finishes executing, and data are not stored from one use of Scheme to the next.
This lab introduces files as a mechanism to overcome some of these disadvantages. Specifically, disk files provide a way to store information in bulk over a long period of time. In applications, we may give up some efficiency, as data in files normally must be read into main memory before the data can be used. Similarly, any results we want stored permanently must be written out explicitly to a disk file. While this storing and retrieving of data requires some specific work, the use of files does overcome the disadvantages mentioned above.
To introduce some basic ideas, this lab begins by developing solutions for two problems. In each case, we assume that data have been placed in a file using an editor (such as xemacs). That is, we assume that we have typed data into an editor and saved the data in a file which we have named. As with other files we edit, we could go back to our editor to revise or expand the data whenever we wanted.
For the problems that follow, the lab first describes a solution by using pseudocode -- a step-by-step outline of what we need to do to solve the problem. Then, the lab develops the Scheme implementations of those solutions.
Problem 1: Find the sum of the numbers in a given file.
Solution Outline: The general form for this problem follows a common approach for much file processing. First, we open the file; then, we read and process data; finally, we close the file and print the results. The pseudocode solution that follows adds a few more details:
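A pseudocode outline along these lines might read as follows. It is reconstructed here from the Scheme procedure developed for this problem, so the exact wording is an assumption rather than the lab's original text:

```text
open the file
set the running total to 0
read the first value from the file
while the value just read is not the end-of-file marker
    add the value to the running total
    read the next value from the file
close the file
report the running total
```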
Solution in Scheme: Here are the built-in Scheme procedures, together with some programming hints, that can help us do various parts of this job:
The open-input-file procedure takes as argument a string that names an existing, accessible file; it returns a ``port'' through which data values, such as integers, can be read in from the file.
The port is given a name, source, by a let expression.
The named let initializes the running total to 0 and initializes the parameter next as the first value read from the file.
The read procedure, which we have previously seen in its zero-argument form as a way of obtaining data from the user at the keyboard, also has a one-argument form: If an input port is supplied as the argument, read reads in, through that port, the next value stored in the file.
When no values remain in the file, read reports this fact by returning a special ``end-of-file object'' instead of a normal datum. The eof-object? predicate takes one argument, typically something that read has just returned, and itself returns #t if its argument is the end-of-file object and #f otherwise.
The close-input-port procedure takes one argument, an input port, and closes it, freeing the resources used to connect the program to the file.
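To see these procedures in isolation, here is a minimal sketch that reads from a string instead of a disk file. It assumes that open-input-string is available (it is in Chez Scheme and in R7RS, but it is not otherwise used in this lab), and the name read-demo-port is made up for the illustration.

```scheme
;; Hypothetical demonstration: a string port stands in for a file
;; that contains the two numbers 50 and 75.
(define read-demo-port (open-input-string "50 75"))

(define first-value  (read read-demo-port))   ; the number 50
(define second-value (read read-demo-port))   ; the number 75
(define third-value  (read read-demo-port))   ; the end-of-file object

(define reached-end? (eof-object? third-value))   ; #t

(close-input-port read-demo-port)   ; release the port when done
```

The complete procedure for Problem 1 puts the same calls together, with open-input-file in place of the string port.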
(define sum-of-file
  (lambda (source-file-name)
    (let ((source (open-input-file source-file-name)))
                                        ; Open the file.
      (let loop ((total 0)              ; Initialize the running total.
                 (next (read source)))  ; Try to read a number.
        (if (eof-object? next)          ; If you get the end-of-file object,
            (begin
              (close-input-port source) ; close the file
              total)                    ; and report the final total.
            (loop (+ next total)        ; Otherwise, add the number to
                                        ; the running total,
                  (read source)))))))   ; try to read another number,
                                        ; and repeat the loop.
A typical interaction using this procedure would look like this:
> (sum-of-file "/home/walker/151s/labs/file1.dat")
200
The file /home/walker/151s/labs/file1.dat contains the four numbers
50 50 75 25
The expression (read source) appears twice in the above code. What is the purpose of each appearance of this expression?
Problem 2: Given a file containing numbers, with one or more numbers on each line, compute the average of the numbers on each line and write it in a new file. (In other words, the new file should contain one number for each line of the given file -- the average of the numbers on that line.)
Solution Outline: The pseudocode solution to this problem is:
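An outline of roughly the following shape fits this problem. It is reconstructed here from the Scheme procedures developed for this problem, so the wording is an assumption rather than the lab's original text:

```text
open the input file and the output file
while the next character is not the end-of-file marker
    set the running total and the tally to 0
    while the next character is not a newline
        read a number and add it to the running total
        add 1 to the tally
    discard the newline character
    write (total / tally) to the output file, followed by a newline
close both files
```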
Solution in Scheme:
To detect the end of a line in Scheme, we need a procedure that has not yet
been introduced: peek-char. The peek-char
procedure takes one argument, an input port, and returns the first unread
character that can be accessed through that port. It does not actually
read that character or extract it from the port -- it just peeks at it to
see what it will be when and if it is (subsequently) read in.
If the value returned by (peek-char source) is the newline
character, #\newline, then we know that we're at the end of
the line and can proceed to average the numbers that we've encountered.
(This is also a good time to read in and discard the newline character, so
that we can start into the next line of numbers without encountering it
again.)
Like (read source) and (read-char source),
(peek-char source) returns the end-of-file object if there are
no more characters in the file.
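The difference between peeking and reading can be seen in a small sketch. It reads from a string rather than a file, which assumes that open-input-string is available (as it is in Chez Scheme and R7RS); all of the names defined here are hypothetical.

```scheme
;; Hypothetical demonstration: the port holds the digit 7 and a newline.
(define peek-demo-port (open-input-string "7\n"))

(define peeked-once  (peek-char peek-demo-port))   ; #\7; nothing is consumed
(define peeked-twice (peek-char peek-demo-port))   ; #\7 again
(define consumed     (read-char peek-demo-port))   ; #\7; now it is consumed
(define after-digit  (peek-char peek-demo-port))   ; #\newline
(define discarded    (read-char peek-demo-port))   ; read and drop the newline
(define at-end       (peek-char peek-demo-port))   ; the end-of-file object
```

Peeking twice gives the same character both times; only read-char (or read) moves the port forward.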
Here's a procedure that manages the inner loop of the pseudocode shown
above. It reads in one line of numbers from the input port
source and writes their average to the output port.
(define average-line
  (lambda (source target)
    (let loop2 ((total 0)                    ; Initialize the running total.
                (tally 0)                    ; Initialize the tally.
                (ch (peek-char source)))     ; Peek at the next character.
      (if (char=? ch #\newline)              ; If it's a newline character,
          (begin
            (read-char source)               ; discard it,
            (write (/ total tally) target)   ; compute the average and write
                                             ; it to the target file,
            (newline target))                ; and terminate the line in the
                                             ; target file.
          (let ((next (read source)))        ; Otherwise, read a number.
            (loop2 (+ total next)            ; Add it to the running total.
                   (+ tally 1)               ; Add 1 to the tally.
                   (peek-char source)))))))  ; Peek at the next character
                                             ; and repeat the loop.
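As written, average-line relies on some properties of the input: every line contains at least one number, the newline comes immediately after the last number on the line, and the last line of the file also ends with a newline. If those assumptions might not hold, a more defensive variant could look like the sketch below. The name average-line-safe and the choice to write nothing for an empty line are assumptions made for this illustration, not part of the lab.

```scheme
;; A hedged sketch of a more forgiving version of average-line.
(define average-line-safe
  (lambda (source target)
    (let loop ((total 0)
               (tally 0)
               (ch (peek-char source)))
      (cond ((or (eof-object? ch) (char=? ch #\newline))
             ;; End of the line (or of the file): discard the newline, if any,
             (if (not (eof-object? ch))
                 (read-char source))
             ;; and write an average only if some numbers were read.
             (if (> tally 0)
                 (begin
                   (write (/ total tally) target)
                   (newline target))))
            ((char-whitespace? ch)
             ;; Skip blanks and tabs by hand, so that read never
             ;; silently runs past the end of the line.
             (read-char source)
             (loop total tally (peek-char source)))
            (else
             ;; Otherwise a number should be next: read it and count it.
             (let ((next (read source)))
               (loop (+ total next)
                     (+ tally 1)
                     (peek-char source))))))))
```

The end-of-file test comes first inside the or because char=? expects characters and should not be handed the end-of-file object.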
The Scheme implementation of the solution to problem #2 is now easy to
write: Open up the files, call average-line once for each
line of the input file, and finally close the files:
(define average-each-line
  (lambda (source-file-name target-file-name)
    (let ((source (open-input-file source-file-name))
                                               ; Open the input file.
          (target (open-output-file target-file-name)))
                                               ; Open the output file.
      (let loop1 ((ch (peek-char source)))     ; Peek at the next character.
        (if (eof-object? ch)                   ; If you get the eof-object,
            (begin
              (close-input-port source)        ; close the input file
              (close-output-port target))      ; and the output file.
            (begin                             ; Otherwise,
              (average-line source target)     ; read the numbers on one
                                               ; line and compute and
                                               ; write their average.
              (loop1 (peek-char source)))))))) ; Peek at the next character
                                               ; and repeat the loop.
Here's what a typical invocation of this procedure looks like:
> (average-each-line "/home/walker/151s/labs/file2.dat" "lab.output")
Nothing shows up on screen, because the last operation performed by average-each-line is the call to close-output-port, which returns an unspecified value (and Chez Scheme doesn't bother to print unspecified values). All the action takes place off stage, in the files: If the input file contains three lines of numbers -- say, for instance,
40 90 100 60 90
50 25 75
30 90 60 10 80 50 70 40 20
-- then the program will create the output file lab.output, looking like this:
76
50
50
but the creation of this file is invisible to the interactive Scheme user.
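One way to inspect such an output file without leaving Scheme is to copy its characters to the screen. The procedure below is only a sketch; the name show-file is hypothetical, and it uses nothing beyond read-char, display, and the port procedures already seen in this lab.

```scheme
;; Display the contents of a text file, one character at a time.
(define show-file
  (lambda (file-name)
    (let ((source (open-input-file file-name)))    ; Open the file.
      (let loop ((ch (read-char source)))          ; Read one character.
        (if (eof-object? ch)                       ; At the end of the file,
            (close-input-port source)              ; close it and stop.
            (begin
              (display ch)                         ; Otherwise, show the character
              (loop (read-char source))))))))      ; and go on to the next one.
```

For instance, (show-file "lab.output") would print the averages written by average-each-line.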
Explain what peek-char does and why it is used here.
Line breaks in the input file should be ignored. In the output file, arrange for each integer to be printed on a line by itself.
This document is available on the World Wide Web as
http://www.math.grin.edu/~walker/courses/151.fa98/lab-file-intro.html