Files and ports

Input ports

When a standalone Scheme program is designed to work with large volumes of data, it is often more convenient for the user to prepare its input in one or more separate files, using an appropriate tool (such as a text editor or a statistical package), than to type the data in as the program is running. The Scheme program itself finds the files containing the data and reads them, without user intervention.

To provide for this possibility, each of Scheme's input procedures can be given an extra argument that specifies the input port through which the data will be read in. In theory, any kind of device that supplies data on demand can be on the other side of the input port, and some implementations of Scheme provide several ways of creating input ports. However, we'll consider only the default input port, through which data typed at the keyboard are transmitted to a Chez Scheme program interactively, and file input ports, through which Chez Scheme programs read data stored in files.

When the Chez Scheme interactive interface is started, it automatically creates the default input port and connects the keyboard to it. This is the input port on which the read, read-char, and peek-char procedures normally operate. When the user exits from Chez Scheme, this port is closed as part of the cleanup process.

To read data from a file, however, the programmer must explicitly open an input port and connect that file to it. There is a built-in Scheme procedure to do this: open-input-file takes one argument, a string, and returns an input port to which the file named by the string is connected. For instance, the call (open-input-file "/home/stone/courses/scheme/sample.dat") returns an input port to which the file /home/stone/courses/scheme/sample.dat is connected.

Constructing the input port does you no good unless you give it a name, so open-input-file is almost always invoked within some binding construction, such as a definition or a let-expression:

> (define source (open-input-file "/home/stone/courses/scheme/sample.dat"))

The sample.dat file is a text file that contains one line, consisting of the cheerful greeting Hi!. One can now access the contents of the file by calling Scheme's built-in input procedures and giving them the input port source as an argument:

> (read-char source)
#\H
> (peek-char source)
#\i
> (peek-char source)
#\i
> (read-char source)
#\i
> (read-char source)
#\!
> (read-char source)
#\newline

Notice that the peek-char procedure peeks through the port to see what the next available character of the file will be, and returns the character it sees. The read-char procedure pulls that character in through the port and returns it, leaving the port open with the following character accessible through it.
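One reason to peek before reading is to decide whether the next character should be consumed at all. Here is a minimal sketch of that idea. The names skip-spaces and demo-port are hypothetical, and the port is a string port built with open-input-string, which Chez Scheme supplies; it stands in for a file port only so that the example needs no file on disk.

```scheme
;; skip-spaces is a hypothetical helper, not a built-in procedure.
;; It consumes space characters from the port and stops just before
;; the first character that is not a space (or at the end of the data).
(define skip-spaces
  (lambda (port)
    (let ((next (peek-char port)))          ; look, but don't consume
      (if (and (char? next) (char=? next #\space))
          (begin
            (read-char port)                ; now actually consume the space
            (skip-spaces port))))))

;; Assumption: a string port stands in for a file input port.
(define demo-port (open-input-string "   Hi!"))
(skip-spaces demo-port)
(define first-visible-char (read-char demo-port))   ; the H of Hi!
```

The char? test matters because peek-char returns a special non-character value, not a character, once the data run out.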

It would also be possible to use the read procedure, which pulls in a complete Scheme datum instead of just one character. It too leaves the port open, with the next character accessible through it.
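To see the difference between reading a datum and reading a character, consider the following sketch. It assumes a string port (created with Chez Scheme's open-input-string) in place of a file input port, and the names datum-source, first-datum, second-datum, and following-char are made up for the illustration.

```scheme
;; Assumption: a string port stands in for a file input port.
(define datum-source (open-input-string "42 (a b) tail"))

(define first-datum (read datum-source))       ; the number 42
(define second-datum (read datum-source))      ; the whole list (a b)

;; The port is still open; the character right after the closing
;; parenthesis of (a b) is the next one available.
(define following-char (peek-char datum-source))
```

A single call to read consumed the entire list, parentheses and all, whereas read-char would have needed seven calls to get through the same text.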

Scheme automatically provides a sentinel for every file input port it opens. The sentinel is a special value known as the end-of-file object. It is returned by any of the three input procedures when there is nothing left to be read from the file. Chez Scheme prints the end-of-file object as #!eof:

> (peek-char source)
#!eof
> (read-char source)
#!eof
> (read-char source)
#!eof

There is no standard Scheme name for the end-of-file object, but there is a built-in predicate eof-object? that detects it:

> (eof-object? (read source))
#t
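The usual pattern, then, is to test each value with eof-object? before using it. The sketch below collects every datum available through a port into a list. The name read-all is hypothetical (it is not a built-in procedure), and the demonstration again assumes a string port from Chez Scheme's open-input-string rather than a real file.

```scheme
;; read-all is a hypothetical helper: it returns a list of all the
;; data that can be read through the port, in their original order.
(define read-all
  (lambda (port)
    (let kernel ((so-far '())
                 (next (read port)))
      (if (eof-object? next)
          (reverse so-far)                       ; nothing left to read
          (kernel (cons next so-far) (read port))))))

;; Assumption: a string port stands in for a file input port.
(define everything (read-all (open-input-string "1 2 (3 4)")))
```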

When all of the data have been read from a file, the programmer must explicitly close the input port by invoking the close-input-port procedure, giving it the input port as an argument. The close-input-port procedure is invoked only for its side effect.

Here's an example of how to use these facilities. The sum-of-file procedure takes one argument, a string that names a file full of numbers; the procedure opens that file, reads in the numbers it contains one by one, adds each one in turn to a running total, closes the file, and returns the total.

(define sum-of-file
  (lambda (file-name)
    (let ((source (open-input-file file-name)))
      (let kernel ((total 0)
                   (next-number (read source)))
        (if (eof-object? next-number)
            (begin
              (close-input-port source)
              total)
            (kernel (+ total next-number) (read source)))))))

Notice that the running total is initialized to 0 when the kernel procedure is first started and that each of the values returned by the read procedure is checked to make sure that it is not the end-of-file object before being added to the total. When the end-of-file object is finally reached, the sequencing specified by the begin-expression ensures that the input port will be closed before the total is returned.
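As an aside, standard Scheme also has a call-with-input-file procedure that opens the file, hands the port to a one-argument procedure, and closes the port when that procedure returns. The following variant is only a sketch of how the same computation might look in that style; the name sum-of-file-alt is made up, and nothing in this reading depends on it.

```scheme
;; sum-of-file-alt is a hypothetical variant of sum-of-file.
;; call-with-input-file opens the port and closes it once the inner
;; procedure returns, so no explicit close-input-port call appears.
(define sum-of-file-alt
  (lambda (file-name)
    (call-with-input-file file-name
      (lambda (source)
        (let kernel ((total 0)
                     (next-number (read source)))
          (if (eof-object? next-number)
              total
              (kernel (+ total next-number) (read source))))))))
```

The explicit version above makes the opening and closing steps visible, which is why it is the one to imitate in the exercises.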


Exercise 1

Using XEmacs, create a file that contains nothing but a few numbers (separated by whitespace characters -- spaces and newlines). Save this file, then use sum-of-file to determine the sum of the numbers in the file. Check the answer.


Exercise 2

Using sum-of-file as a pattern, write a Scheme procedure file-size that takes as argument a string that names a file and returns the number of characters in that file (i.e., the number of times that read-char can be called to read a character from the file without returning the end-of-file object).


Exercise 3

Find out what happens if sum-of-file or file-size is given a string that does not name any existing file.


Output ports

Similarly, when a Scheme program generates a lot of output, it is often more convenient to have it store the output in one or more files, instead of displaying it in the window that the interactive interface is using. Other programs can recover the results from such files if further processing is needed.

To provide for this possibility, each of Scheme's output procedures can be given an extra argument that specifies the output port through which the data will be written. As before, we'll consider only the default output port -- the Chez Scheme interaction window -- and file output ports, through which Chez Scheme programs write data to files.

If you followed the discussion of input ports, there are few surprises about output ports. The default output port is created when the Chez Scheme interactive interface starts up and closed when it shuts down; in between, Chez Scheme uses this port for most calls to write, display, newline, and write-char. To write data to a file instead, the programmer must explicitly invoke open-output-file, which returns a file output port; once this output port is given a name, it can be used as an extra argument to any of the output procedures, with the effect that the values will be written to the file rather than to the interaction window. When no more output is to be written to the file, the programmer must explicitly close the port by invoking close-output-port.
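The whole cycle can be sketched in a few lines: open an output port, write through it, close it, and then read the same file back through an input port. The file name ports-demo.tmp and the variable names are hypothetical, and the sketch relies on Chez Scheme's file-exists? and delete-file procedures to clear away any leftover file before it starts.

```scheme
(define demo-file "ports-demo.tmp")     ; hypothetical scratch file name

;; Start from a clean slate, so that the file is certain to be new.
(if (file-exists? demo-file)
    (delete-file demo-file))

;; Write two values to the file, one per line, then close the port.
(define target (open-output-file demo-file))
(write 120 target)
(newline target)
(write "divisors" target)
(newline target)
(close-output-port target)

;; Read the values back through a file input port.
(define source (open-input-file demo-file))
(define first-value (read source))      ; the number 120
(define second-value (read source))     ; the string "divisors"
(define third-value (read source))      ; the end-of-file object
(close-input-port source)

(delete-file demo-file)                 ; tidy up the scratch file
```

Using write rather than display for the string matters here: write includes the double quotation marks, so read recovers a string instead of a symbol.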

As an example, here's a procedure that takes two arguments -- the first a string that names the output file to be created, the second a positive integer -- and writes the exact divisors of the positive integer into the specified output file:

(define store-divisors
  (lambda (file-name dividend)
    (let ((target (open-output-file file-name)))
      (let kernel ((trial-divisor 1))
        (if (< dividend trial-divisor)
            (close-output-port target)
            (begin
              (if (zero? (remainder dividend trial-divisor))
                  (begin
                    (write trial-divisor target)
                    (newline target)))
              (kernel (+ trial-divisor 1))))))))

Exercise 4

Use the store-divisors procedure to draw up a list of the divisors of 120, storing them in a file named divisors-of-120. Examine the file afterwards and confirm that the answer is correct. (Don't give this procedure an extremely large number as argument -- it's too slow. There are far more efficient ways to find divisors!)


Exercise 5

The Scheme standard says that if you try to open an output port to a file that already exists, "the effect is unspecified," i.e., anything might happen. Find out what Chez Scheme does in this situation. (If you don't like Chez Scheme's policy, page 107 of the Chez Scheme system manual explains how to get different behavior.)

Incidentally, to enable the programmer to test the precondition for open-output-file, Chez Scheme supplies a file-exists? predicate, which takes a string as argument and returns #t if it is the name of an existing file and #f if it is not. It also supplies a delete-file procedure that takes a string as argument and tries to annihilate the file that it names (if there is such a file). Both of these procedures are non-standard, however; other Scheme implementations do not always provide them.
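One way to combine these two procedures is to make sure that the file does not exist before opening the output port. The sketch below does so by deleting any existing file of the same name, which is only one of several possible policies; the name open-fresh-output-file is hypothetical.

```scheme
;; open-fresh-output-file is a hypothetical helper, not a built-in
;; procedure.  Caution: it discards the old contents of any existing
;; file with the given name.
(define open-fresh-output-file
  (lambda (file-name)
    (if (file-exists? file-name)
        (delete-file file-name))
    (open-output-file file-name)))
```

A more cautious policy would be to refuse to proceed when file-exists? returns #t, leaving the decision to the user.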


Exercise 6

Two positive integers are said to be relatively prime if they have no common divisors other than 1 -- in Scheme: (= (gcd first second) 1). With store-divisors as a model, write a Scheme procedure store-relative-primes that takes two arguments, the first a string that names the output file to be created and the second a positive integer, and writes into the specified output file every positive integer that is less than the specified positive integer and relatively prime to it.


Miscellaneous facilities

Predictably, Scheme provides the type predicate input-port?, which can be applied to any object to determine whether it is an input port, and the analogous predicate output-port?.

The current-input-port procedure, which takes no arguments, returns the default input port, in case you want to give it a name, pass it as an argument to a procedure that expects a port, and so on. Similarly, the current-output-port procedure takes no arguments and returns the default output port.
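For instance, a procedure that expects a port as an argument can be made to write to the interaction window simply by handing it the default output port. In this sketch, write-on-own-line, screen, and the two result variables are hypothetical names chosen for the illustration.

```scheme
;; write-on-own-line is a hypothetical helper, not a built-in procedure.
;; It writes the value through the given port and ends the line.
(define write-on-own-line
  (lambda (value port)
    (write value port)
    (newline port)))

(define screen (current-output-port))    ; give the default output port a name

(define screen-is-output-port (output-port? screen))
(define name-is-input-port (input-port? "sample.dat"))  ; a file name is a string, not a port

(write-on-own-line 120 screen)            ; appears in the interaction window
```

The same write-on-own-line procedure works unchanged with a file output port returned by open-output-file.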

Chez Scheme ignores attempts to close the default ports. However, not all implementations of Scheme are so tolerant, so it's usually bad style to rely on this behavior.


This document is available on the World Wide Web as

http://www.math.grin.edu/courses/Scheme/spring-1998/files.html

created October 23, 1997
last revised June 21, 1998

John David Stone (stone@math.grin.edu)