Files
Summary:
Files permit you to save values between invocations of programs and
to provide information to programs without typing the information
interactively. In this reading, we explore key ideas in the use of
files within Scheme.
Introduction
When a Scheme program is designed to work with large volumes of data,
it is often more convenient for the user to prepare its input in one or
more separate files, using an appropriate tool (such as a text editor
or a statistical package), than to type the data in as the program
is running. The Scheme program itself finds the files containing the
data and reads them, without user intervention.
Similarly, when a Scheme program generates a lot of output, it is
often more convenient to have it store the output in one or more files,
instead of displaying it in the window that the interactive interface
is using. Other programs can recover the results from such files if
further processing is needed.
Note that files let us store values between invocations of
Scheme programs (and other programs). This permanence is another
benefit of using files.
In this reading, we consider the techniques used in Scheme to read
data from files and to write data to files.
Reading One Character at a Time
An input port, such as the one that open-input-file returns, is often
used as an argument to read-char, which reads in (and returns)
one character from the file on the other side of the input port. It
can also be used as a parameter to peek-char, which looks
through the input port to see what the next character in the file
is, and returns that character, but does not actually read it in from
the file. The difference is that you can peek at the next character as
often as you like, and it remains accessible through the input port, but
once you read in a character there is no way to un-read
it -- the port advances inexorably to the next character in the file.
The file /home/rebelsky/glimmer/samples/hi.txt is a text file
that contains one line, consisting of the cheerful greeting Hi
there!. Let us see what happens when we read from this file
using read-char and peek-char.
> (define source
    (open-input-file "/home/rebelsky/glimmer/samples/hi.txt"))
> (read-char source)
#\H
> (peek-char source)
#\i
> (peek-char source)
#\i
> (read-char source)
#\i
> (read-char source)
#\space
> (close-input-port source)
Notice that the peek-char procedure peeks through
the port to see what the next available character of the file is, and
returns the character it sees. The read-char
procedure pulls that character in through the port and returns it,
leaving the port open with the following character accessible through
it.
Finding the End of a File
Scheme automatically provides a sentinel for every file input
port it opens. The sentinel is a special value known as the
end-of-file object. It is returned by any
of the input procedures when there is nothing left to be
read from the file. MediaScript's default Scheme interpreter
prints the end-of-file object as
#<eof>. To continue the preceding example,
> (define source
    (open-input-file "/home/rebelsky/Web/Courses/CS151/2007S/Examples/hi.txt"))
> (read-char source)
#\H
> (read-char source)
#\i
> (read-char source)
#\space
> (read-char source)
#\t
> (read-char source)
#\h
> (read-char source)
#\e
> (read-char source)
#\r
> (read-char source)
#\e
> (read-char source)
#\!
> (read-char source)
#\newline
> (peek-char source)
#<eof>
> (read-char source)
#<eof>
> (read-char source)
#<eof>
The end-of-file object is not a character, and there is no standard Scheme
name for the end-of-file object, but there is a primitive predicate
eof-object? that detects it. Since reading from a closed port is an
error, we apply the predicate before we close the port:
> (eof-object? (read-char source))
#t
> (close-input-port source)
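The end-of-file test above is what lets a procedure stop reading at the right moment. As a minimal sketch (the name count-chars and its helper kernel are our own choices for illustration, not part of standard Scheme), here is one way to count the characters in a file by reading until eof-object? holds:

```scheme
;;; A hypothetical procedure, written only to illustrate eof-object?.
;;; It counts the characters in the file named by filename.
(define count-chars
  (lambda (filename)
    (let ((source (open-input-file filename)))
      (let kernel ((count 0))
        (if (eof-object? (read-char source))
            ; Nothing left to read: close the port and report the count.
            (begin
              (close-input-port source)
              count)
            ; We just read one more character; keep going.
            (kernel (+ count 1)))))))
```

Applied to a file holding the single line Hi there!, this sketch should count ten characters: the nine visible ones plus the newline.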
Reading One Line From a File
As an example of the use of read-char, here's the
definition of a procedure called read-line, which
reads in characters through a given input port until it reaches the
end of the file or encounters a #\newline character, then
returns a string containing all of the characters that it has read in:
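One way such a procedure might be written is sketched here. It is an illustration built from the description above, not necessarily the original definition; the helper name kernel and the parameter chars are our own. Note that some Scheme implementations already supply a read-line of their own.

```scheme
;;; A sketch of read-line, assuming only read-char and eof-object?.
;;; Reads characters from source until end of file or #\newline and
;;; returns them as a string (the newline itself is not included).
(define read-line
  (lambda (source)
    (let kernel ((chars '()))   ; characters read so far, newest first
      (let ((next (read-char source)))
        (if (or (eof-object? next) (char=? next #\newline))
            ; Base case: end of file or end of line.
            (list->string (reverse chars))
            ; Recursive case: remember the character and read on.
            (kernel (cons next chars)))))))
```

Because or checks eof-object? first, char=? is never applied to the end-of-file object, which is not a character.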
There are many things we can now do with these procedures. For example,
here's a simple procedure that takes a file name as an argument and
prints the first line of a file.
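A minimal sketch of such a procedure follows. The name print-first-line is hypothetical, and the read-line used is itself only a sketch based on the description given earlier, repeated here so that the example stands on its own.

```scheme
;;; read-line, as described earlier (a sketch, not a standard procedure).
(define read-line
  (lambda (source)
    (let kernel ((chars '()))
      (let ((next (read-char source)))
        (if (or (eof-object? next) (char=? next #\newline))
            (list->string (reverse chars))
            (kernel (cons next chars)))))))

;;; A hypothetical procedure that prints the first line of the named file.
(define print-first-line
  (lambda (filename)
    (let ((source (open-input-file filename)))
      (display (read-line source))   ; show the line without quotation marks
      (newline)
      (close-input-port source))))   ; always close what we opened
```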
Note that read-line provides an instance of
file recursion. That is, we are using recursion
(having a procedure calling itself) but using attributes of the file
to determine when we've reached the base case. Finding the end of the
line is one typical base case. Another is the end of the file.
The read Procedure
It is also possible to read from a file using the one-argument form of
the read procedure, which pulls a complete Scheme
datum (instead of just one character) through a given input port.
It also leaves the port open, with the next character or Scheme datum
accessible through it.
Consider a file whose contents have the form
23512 11 13
If we were to work with this file using read-char,
we would see a sequence of values like the following.
> (define source
    (open-input-file "/home/rebelsky/glimmer/samples/sample.txt"))
> (read-char source)
#\2
> (read-char source)
#\3
> (read-char source)
#\5
> (read-char source)
#\1
> (read-char source)
#\2
> (read-char source)
#\space
> (read-char source)
#\1
> (read-char source)
#\1
> (read-char source)
#\space
> (read-char source)
#\1
> (read-char source)
#\3
> (read-char source)
#\newline
> (read-char source)
#<eof>
> (close-input-port source)
If, however, we were to use read, we would see
the following sequence.
> (define source
    (open-input-file "/home/rebelsky/glimmer/samples/sample.txt"))
> (read source)
23512
> (read source)
11
> (read source)
13
> (read source)
#<eof>
> (close-input-port source)
Whether you use read or
read-char depends on your particular application.
Summing Files: Another Form of File Recursion
Here's another example of how to use Scheme's facilities for input
from a file. The sum-of-file procedure takes one
argument, a string that names a file full of numbers; the procedure
opens that file, reads in the numbers it contains one by one, adds each
one in turn to a running total, closes the file, and returns the total.
In the base case of the recursion, there are no numbers left in the
file, and the call to the read procedure immediately
returns the end-of-file object. The helper closes the file and
returns 0.
If the value of (read source) is a number, it is added
to the value of a recursive call to the helper, which is the sum of
all the subsequent numbers in the file.
If the helper discovers a non-number in the file whose contents it
is adding up, then we skip it. (We might also consider throwing
an error, but then we'll also need to worry about cleaning up after
ourselves, so skipping is the easiest strategy at this point.)
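The three cases just described can be sketched as follows. This is one plausible rendering of sum-of-file, reconstructed from the description rather than copied from an original listing; the helper name kernel is our own.

```scheme
;;; A sketch of sum-of-file: add up all of the numbers in the named file.
(define sum-of-file
  (lambda (filename)
    (let ((source (open-input-file filename)))
      (let kernel ()
        (let ((next (read source)))
          (cond
            ; Base case: no values left. Close the file and return 0.
            ((eof-object? next)
             (close-input-port source)
             0)
            ; A number: add it to the sum of the remaining values.
            ((number? next)
             (+ next (kernel)))
            ; Anything else: skip it.
            (else
             (kernel))))))))
```

For a file containing 23512 11 13, this sketch should return 23536.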
Writing Data
Scheme provides four basic output operations:
write, display,
newline, and write-char.
We'll start with the first three, and then turn to
write-char a bit later.
The write procedure can take one or two arguments.
If given one argument, that argument is the value to be written.
If given two arguments, the first argument is the value to be
written and the second is the port to which to write the value.
In each case, it prints out a representation of the value. This value
is printed either to the screen (one argument) or to the file that
corresponds to the port (two arguments). The nature of the value that
write returns is unspecified.
That is, the printing is a side effect of the evaluation of the call
to write, not its result.
> (write 23)
23> (write #\a)
#\a> (write "hello world")
"hello world"> (write (list 23 #\a "hello" null))
(23 #\a "hello" ())
Why are the values immediately followed by the prompt, rather than
having the prompt on a subsequent line? Because Scheme wants to
permit you to write more than one value on a line. Hence, you need
to explicitly tell it when to move to another line. You do so with
the newline procedure. This procedure takes
either zero or one parameter. With no parameters, it writes an
end-of-line to the screen. With one parameter, an output port, it
writes an end-of-line to the file behind that port.
> (write "hello") (newline)
"hello"
> (write 23) (newline)
23
> (write "hello") (write "goodbye") (write 23) (newline)
"hello""goodbye"23
As the preceding examples suggest, the values written by write
seem designed more for the computer than for the human. What if we don't
want the quotation marks, hash marks, and the like? Fortunately, Scheme
provides a similar procedure, display, that
displays its argument in a more human-readable form.
> (display 23) (newline)
23
> (display #\a) (newline)
a
> (display "hello") (newline)
hello
> (display (list 23 #\a "hello" null)) (newline)
(23 a hello ())
> (display "hello") (display #\a) (display "goodbye") (newline)
helloagoodbye
Creating New Files
To provide for the possibility of having Scheme create files and write
data to those files, each of Scheme's output procedures can be provided
with a parameter that specifies the output port through which the data
will be written. As before, we'll consider only the default output
port -- the interaction box, under DrScheme -- and file output ports,
through which Scheme programs write data to files.
If you followed the discussion of input ports, you should encounter
few surprises about output ports. The default output port is created
when the Scheme interactive interface starts up and closed when
it shuts down; in between, Scheme uses this port for most calls
to write, display, and
newline. To write data to a file instead, the
programmer must explicitly invoke open-output-file,
which returns a file output port; once this output port is given a name,
it can be used as an extra argument to any of the output procedures,
with the effect that the values will be written to the file rather
than to the interaction window. When no more output is to be written
to the file, the programmer must explicitly close the port by invoking
close-output-port.
As an example, here's a procedure that takes two arguments -- the
first a string that names the output file to be created, the second
a positive integer -- and writes the exact divisors of the positive
integer into the specified output file:
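A minimal sketch of such a procedure appears here. The name store-divisors and the helper kernel are hypothetical, chosen for illustration, and the choice to write one divisor per line is an assumption.

```scheme
;;; A hypothetical procedure: write the exact divisors of the positive
;;; integer n, one per line, to a newly created file named by filename.
(define store-divisors
  (lambda (filename n)
    (let ((target (open-output-file filename)))
      (let kernel ((candidate 1))
        (cond
          ((<= candidate n)
           ; Write candidate only if it divides n exactly.
           (cond
             ((zero? (remainder n candidate))
              (write candidate target)
              (newline target)))
           (kernel (+ candidate 1)))))
      ; No more output: close the port so the file is complete.
      (close-output-port target))))
```

Because each divisor is followed by a newline, the resulting file can later be read back with read, one number at a time.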
What should happen if open-output-file is called
using an existing file? It is actually up to the implementer. Some
implementations refuse to overwrite a file and throw an error,
making life difficult for those who expect to be able to reuse
the file name, particularly during testing. Other implementations
blithely go about their business, potentially overwriting important
data. The GIMP's default Scheme (which we don't use) takes the
second approach. The Scheme we use by default in MediaScript takes
the first approach. Fortunately, both implementations supply a
file-exists? predicate, which takes a string as
a parameter and determines whether a corresponding file exists.
If you can't overwrite an existing file, the language should provide
some support for getting rid of those files, so that programmers can reuse
file names when they want to. The default Scheme implementation in
MediaScript provides a delete-file procedure to
do just that.
Neither file-exists? nor
delete-file is a standard procedure. Hence,
when you start using a new version of Scheme and need to use files,
one of the first things you must do is check the documentation to see
what additional file operations it supports.
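In an implementation that supplies both procedures, they can be combined to make a file name safely reusable. The following is only a sketch under that assumption; the name open-fresh-output-file is hypothetical, and the procedure deliberately discards any existing file, so it should be used only when losing the old contents is acceptable.

```scheme
;;; A hypothetical helper, assuming the implementation provides
;;; file-exists? and delete-file (neither is required by the standard).
;;; Removes any existing file with the given name, then opens a new one.
(define open-fresh-output-file
  (lambda (filename)
    (cond
      ((file-exists? filename)
       (delete-file filename)))   ; the old contents are lost here
    (open-output-file filename)))
```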
Writing One Character At A Time
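Besides write, display, and
newline, Scheme provides a primitive procedure
write-char that is used to create an output file
one character at a time. When writing to a file, it takes two arguments:
the character to be written and the output port through which it is to
be sent. (As with the other output procedures, omitting the port sends
the character to the default output port.)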
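As a small illustration of write-char, here is a sketch that sends the characters of a string to a new file one at a time. The name write-string-to-file and its helper kernel are hypothetical.

```scheme
;;; A hypothetical procedure: create the named file and write the
;;; characters of str to it, one write-char call per character.
(define write-string-to-file
  (lambda (str filename)
    (let ((target (open-output-file filename))
          (len (string-length str)))
      (let kernel ((pos 0))
        (cond
          ((< pos len)
           (write-char (string-ref str pos) target)
           (kernel (+ pos 1)))))
      (close-output-port target))))
```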
Miscellaneous Facilities
Scheme provides the type predicate input-port?,
which can be applied to any object to determine whether it is an input
port. It also provides the analogous output-port?
predicate.
Short Reference
(close-input-port input-port)
Close an open input port.
(close-output-port output-port)
Close an open output port.
(display value)
Print a representation of the value on the screen.
(display value output-port)
Print a representation of the value to the specified port.
(eof-object? value)
Determine if the given value is something returned by
read to indicate the end of the file.
(file-exists? filename)
Determine whether the specified file exists.
(input-port? value)
Determine whether the given value is an input port.
(newline)
Write a newline to the screen.
(newline port)
Write a newline to the specified port.
(open-input-file filename)
Open the specified file for reading. Returns an input port.
(open-output-file filename)
Open the specified file for writing. Returns an output port.
(output-port? value)
Determine whether the given value is an output port.
(peek-char input-port)
Determine the next character available on the specified port.
(read input-port)
Read the next value available on the specified port.
(read-char input-port)
Read the next character available on the specified port.
(write value)
Print the verbose representation of the value on the screen.
(write value output-port)
Print the verbose representation of the value to the specified port.