Laboratory Exercises For Computer Science 151

File Processing

Goals: This laboratory exercise provides more practice working with files and introduces character-by-character processing within files.


Practice:

  1. Write and test a Scheme procedure that counts the number of integers in a given file. Your procedure should have one parameter, which is the name of the input file.
    1. Assume the file contains only integers.
    2. Assume the file contains any type of data, although any integers will appear with white space (e.g., tabs, spaces, new-line characters) on either side of them.

  2. Write and test a Scheme procedure that finds the maximum number in a given file. Your procedure should have one parameter, which is the name of the input file. While you may assume that the file contains at least one number, you should not assume that all numbers in the file are nonnegative.

    Test your procedure on file1.dat, file2.dat, and file3.dat in the directory /home/walker/151s/labs.


Character-by-Character Processing:

The previous lab used the Scheme procedures read and write to read data from a file and to print data to another file. The procedures read-char and write-char perform the analogous operations one character at a time: read-char returns the next character from an input port (or an end-of-file object when no characters remain), and write-char writes a single character to an output port.

These procedures are illustrated in the following Scheme procedure that copies an input file to an output file character by character:

(define copy-file
  (lambda (source-file-name target-file-name)
    (let ((source (open-input-file source-file-name))
          (target (open-output-file target-file-name)))
      (let loop ((ch (read-char source)))
        (if (eof-object? ch)
            (begin
              (close-input-port source)
              (close-output-port target))
            (begin
              (write-char ch target)
              (loop (read-char source))))))))


  1. Check that this procedure works as claimed by using it to copy the file /home/walker/151s/labs/file2.dat to a file named lab.data in your account.

  2. Modify this procedure so that every lower-case letter that is read in is converted to upper case before being written to the output file.

  3. Write a Scheme procedure tally-char that takes two arguments, the name of an input file and a character, and returns a tally of the number of occurrences of that character in the specified file.
    (tally-char "/home/walker/151s/labs/file1.dat" #\5) ===> 4
    (tally-char "/home/walker/151s/labs/file2.dat" #\0) ===> 16
    (tally-char "/home/walker/151s/labs/file2.dat" #\newline) ===> 3
    


Processing Data Fields

A common application requires reading various types of data from a file and performing some analysis. For example, consider the file /home/stone/courses/scheme/Iowa-cities.dat, which contains information about the sixty largest cities and towns in Iowa: their names and populations, as determined by the 1990 and 1980 censuses. A portion of the file looks like this:

Fairfield         9768    9428
Fort Dodge       25894   29423
Fort Madison     11614   13520
Grinnell          8902    8868
Independence      5972    6392
Indianola        11340   10843
Iowa City        59735   50508

Each line gives the town name first, followed by its population in the 1990 census and then its population in the 1980 census.

Our task is to find the name of the Iowa city with the largest 1990 population (namely, Des Moines).

General Approach: To find the largest city, we might follow the same idea you used in Practice problem 2 to find the maximum number in a file. There, you probably followed an outline something like this:

  1. Open the input file
  2. Read the first number -- this is the current maximum
  3. Until the end of the file is reached:
    1. Read the next number
    2. If the new number is larger than the current maximum
      1. make the new number the current maximum
  4. Print the maximum value found
  5. Close the input file
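
The outline above might translate into Scheme roughly as follows. This is a minimal sketch under stated assumptions, not the only way to write it: the name max-in-file is hypothetical, the file is assumed to contain only numbers separated by white space (at least one of them, as Practice problem 2 allows), and the procedure returns the maximum instead of printing it, leaving the display to the caller.

```scheme
;; Hypothetical sketch of the outline: find the largest number in a file.
;; Assumes the file holds only numbers separated by white space and
;; contains at least one number; negative numbers are handled correctly.
(define max-in-file
  (lambda (file-name)
    (let* ((source (open-input-file file-name))   ; step 1: open the input file
           (initial (read source)))               ; step 2: first number is the current maximum
      (let loop ((current-max initial)
                 (next (read source)))            ; step 3.1: read the next number
        (if (eof-object? next)                    ; step 3: stop at the end of the file
            (begin
              (close-input-port source)           ; step 5: close the input file
              current-max)                        ; step 4: return (rather than print) the maximum
            (loop (if (> next current-max)        ; step 3.2: keep the larger value
                      next
                      current-max)
                  (read source)))))))
```

Because the first number read becomes the starting maximum, the sketch never compares against an artificial starting value such as 0, which is what keeps it correct when every number in the file is negative.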

The approach for finding the largest city is similar, although now the program must read the city name, the 1990 population, and the 1980 population for each city. (Although the 1980 population is not needed, the program must still read it in order to reach the next line of data in the file.)


  1. Update the outline above so that it reads the city name and both populations from each line of the file, and revise the step that finds the maximum.


From previous work, we already know how to read numbers, so the main remaining task is to read the city names. The uniform format of the file helps here: columns 1 through 16 contain the name of the town, left-justified; columns 17 through 22 contain the 1990 population, right-justified; columns 23 and 24 are always spaces; and columns 25 through 30 contain the 1980 population, right-justified.
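
To make these column positions concrete, here is a minimal sketch that splits one line, already held in a string, into its three fields using substring. The procedures parse-city-line, trim-trailing-spaces, and trim-leading-spaces are hypothetical helpers, not part of the assignment; the sketch assumes every line has all 30 columns and is padded with blanks (not tabs). Note that substring counts from 0 and excludes its end index, so column c of the file corresponds to string index c - 1.

```scheme
;; Column layout (1-based columns -> 0-based substring indices,
;; where substring's end index is exclusive):
;;   columns  1-16  name, left-justified             -> (substring line 0 16)
;;   columns 17-22  1990 population, right-justified -> (substring line 16 22)
;;   columns 23-24  always blank                     -> skipped
;;   columns 25-30  1980 population, right-justified -> (substring line 24 30)

(define trim-trailing-spaces          ; drops the blanks that pad a left-justified name
  (lambda (str)
    (let loop ((end (string-length str)))
      (if (and (> end 0)
               (char=? (string-ref str (- end 1)) #\space))
          (loop (- end 1))
          (substring str 0 end)))))

(define trim-leading-spaces           ; drops the blanks that pad a right-justified number
  (lambda (str)
    (let loop ((start 0))
      (if (and (< start (string-length str))
               (char=? (string-ref str start) #\space))
          (loop (+ start 1))
          (substring str start (string-length str))))))

(define parse-city-line               ; returns (name population-1990 population-1980)
  (lambda (line)
    (list (trim-trailing-spaces (substring line 0 16))
          (string->number (trim-leading-spaces (substring line 16 22)))
          (string->number (trim-leading-spaces (substring line 24 30))))))
```

Only trailing blanks are removed from the name, so a name with an internal space such as "Iowa City" survives intact. The approach in the next paragraph reads the same 16 columns one character at a time from the open port instead of from a string; either way, the raw name arrives padded with trailing blanks.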

With this uniform format, we can read a city name by reading 16 characters from a line and combining them into a string, and then use the ordinary read procedure to read the two populations that follow.


  1. Write a procedure
    
    (define read-name
       (lambda (source number-chars)
         ... this is your part
       )
    )
    
    
    which reads the given number of characters from the already-opened source port and returns them combined into a single string. That is, read-name should assume that source has already been opened by open-input-file, and the file should remain open after read-name has read the specified number of characters (number-chars). If the end of the file is reached first, read-name should return just the characters read up to that point.

    To test this procedure, you might use the following simple test procedure:

    
    (define test-read-name
       (lambda (file-name)
          (let ((source (open-input-file file-name)))
             (display "First 10 characters: ")
             (display (read-name source 10))
             (newline)
             (display "Next 16 characters: ")
             (display (read-name source 16))
             (newline)
             (display "Next 8 characters: ")
             (display (read-name source 8))
             (newline)
             (display "Next 40 characters: ")
             (display (read-name source 40))
             (newline)
             (close-input-port source)
          )
      )
    )
    
    
    File "/home/walker/151s/labs/file4.dat" contains one line, with the characters abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ. Thus, the call
    (test-read-name "/home/walker/151s/labs/file4.dat")
    should print the following:
    
    First 10 characters: abcdefghij
    Next 16 characters: klmnopqrstuvwxyz
    Next 8 characters: ABCDEFGH
    Next 40 characters: IJKLMNOPQRSTUVWXYZ
    
    >
    
    
    In this output, note that the last call to read-name also reads the final newline character; when that newline is printed, a blank line appears before the next prompt. Also, the last line printed has fewer than the full 40 characters, because the end of the file was reached.

  2. Use your read-name procedure to implement the outline above to solve the maximum city problem. That is, write a procedure max-city which has a file name as a parameter and which returns the name of the city in the file with the largest 1990 population.


This document is available on the World Wide Web as

http://www.math.grin.edu/~walker/courses/151.fa98/lab-file-examples.html

created March 11, 1997 by John D. Stone
last revised October 8, 1998 by Henry M. Walker