Assignment 8: Run-Length Encoding

Due: 9:00 a.m., Wednesday, 9 April 2008

Summary: In this assignment, you will explore techniques for efficiently (or not so efficiently) writing images to files.

Purposes: To give you more experience with input and output. To help you think more about tradeoffs between speed, size, and clarity.

Expected Time: Two to three hours.

Collaboration: We encourage you to work in groups of size three. You may, however, work alone or work in a group of size two or size four. You may discuss this assignment with anyone, provided you credit such discussions when you submit the assignment.

Submitting: Email your answer to <rebelsky@grinnell.edu>. The subject of your email should have the form CSC151.01 2008S Assignment 8: Run-Length Encoding, and the message should contain your answers to all parts of the assignment. Scheme code should be in the body of the message.

Warning: So that this assignment is a learning experience for everyone, we may spend class time publicly critiquing your work.

Preliminaries

Over the past few classes, we have been exploring how one stores information about an image in a file so that the image can be restored. As we've noted, there are a number of criteria one considers in designing a file format, including the file size for images, the accuracy of the representation, the computational cost of saving and restoring image files, the difficulty of writing the algorithms to save and restore files, and even the human readability of these files. So far, we have focused on a set of techniques in which we write one value for each pixel in the image, which means that our primary focus was on how little space one uses per pixel.

However, if our concern is file size, we can achieve some improvement by storing sequences of pixels, rather than individual pixels. In this technique, which is called run-length encoding, when you have a sequence of pixels of the same color, you write one entry for all the pixels, rather than a separate entry for each pixel. Suppose the first five pixels in an image are purple. We would write the color purple and then the number 5. When reading the image back from a file, we read a color and a number of repetitions, fill in that many pixels, and then go on to the next color/count pair.
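The reading step can be illustrated on plain lists, without images or files. The following is a minimal sketch, assuming each run is a two-element list of a value and a count; expand-runs is a hypothetical name, not part of the course library.

```scheme
;;; expand-runs is a hypothetical helper used only for illustration.
;;; runs is a list of (value count) entries; the result is the flat
;;; list of values those entries describe.
(define expand-runs
  (lambda (runs)
    (cond
      ; No runs left?  Nothing more to produce.
      ((null? runs)
       '())
      ; Current run used up?  Move on to the next run.
      ((zero? (cadr (car runs)))
       (expand-runs (cdr runs)))
      ; Otherwise, emit one copy of the value and decrement the count.
      (else
       (cons (car (car runs))
             (expand-runs (cons (list (car (car runs))
                                      (- (cadr (car runs)) 1))
                                (cdr runs))))))))
```

For example, (expand-runs '((purple 5) (white 1))) gives (purple purple purple purple purple white).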

For example, consider a mostly-black 5x5 image with a single white pixel in the center. This image has 12 black pixels in sequence (five in the first row, five in the second row, two in the third row), one white pixel, and then another 12 black pixels (two more in the third row, five in the fourth row, and five in the fifth row). We might represent this image as

5 5
0 0 0 12
255 255 255 1
0 0 0 12

Of course, nothing (other than our desire to produce the smallest file possible) says that we have to continue a sequence of pixels from one row to the next. Hence, we could also represent the image as

5 5
0 0 0 5
0 0 0 5
0 0 0 2
255 255 255 1
0 0 0 2
0 0 0 5
0 0 0 5

In fact, we could even represent each pixel separately (which gives us a larger file than we would have otherwise).

5 5
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
255 255 255 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
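All three files describe the same 25 pixels and differ only in how many entries they use. As a rough check, the sketch below sums the counts of the first two representations. It uses symbols in place of RGB triples, and runs-total, across-rows, and within-rows are hypothetical names chosen for this illustration.

```scheme
;;; runs-total is a hypothetical helper: it sums the counts in a
;;; list of (color count) entries.
(define runs-total
  (lambda (runs)
    (if (null? runs)
        0
        (+ (cadr (car runs)) (runs-total (cdr runs))))))

; Runs that continue across row boundaries.
(define across-rows '((black 12) (white 1) (black 12)))

; Runs that stop at the end of each row.
(define within-rows '((black 5) (black 5) (black 2) (white 1)
                      (black 2) (black 5) (black 5)))

(runs-total across-rows)  ; 25, from 3 entries
(runs-total within-rows)  ; 25, from 7 entries
```

The one-entry-per-pixel version also totals 25, but it needs 25 entries to do so.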

How can we read such a file back into an image? We might use a procedure like the following.

;;; Procedure:
;;;   image-run-length-decode
;;; Parameters:
;;;   filename, a string
;;; Purpose:
;;;   Read a run-length-encoded image from the given file.
;;; Produces:
;;;   image, an image
;;; Preconditions:
;;;   filename names a valid file.
;;;   The file has the following form:
;;;     (1) an integer that specifies the width of the image
;;;     (2) an integer that specifies the height of the image
;;;     (3) a sequence of entries, each a color followed by an integer count
;;; Postconditions:
;;;   The image corresponds to the description in the file.
(define image-run-length-decode
  (lambda (filename)
    (let* ((inport (open-input-file filename))
           (width (read inport))
           (height (read inport))
           (image (image-new width height)))
      (let kernel ((row 0)
                   (col 0)
                   (color (rgb-read inport))
                   (count (read inport)))
        (cond
          ; No positions left?  We're done.
          ((>= row height)
           (close-input-port inport)
           image)
          ; No colors left?  Read another one.
          ((zero? count)
           (kernel row col (rgb-read inport) (read inport)))
          ; Otherwise, set the color in the current position and move
          ; on to the next.
          (else
           (image-set-pixel! image col row color)
           (if (< (+ col 1) width)
               (kernel row (+ col 1) color (- count 1))
               (kernel (+ row 1) 0 color (- count 1)))))))))

(define rgb-read
  (lambda (port)
    (let* ((red (read port))
           (green (read port))
           (blue (read port)))
       (if (eof-object? blue) 
           blue
           (rgb-new red green blue)))))

Assignment

Write (image-run-length-encode image filename), which writes an image to the specified file using run-length encoding.

You can use the following strategy. To write the image, we iterate through both columns and rows (see image-run-length-decode for some ideas). While writing, we keep track of the most recent color encountered and the number of pixels we've already seen with that color. We then look at the next pixel, whether or not it's on the same row. If the next pixel is the same color, we increment the count for that color and go on. If the next pixel is a different color, we write the previous color and count, and continue with the new color and a count of 1. When we run out of pixels to write, we write the final color and count.

Important hint: The obvious algorithm is to compare the color of each pixel to the color of the next pixel to the right, but this results in a lot of special cases for when you reach the end of a row or the end of the image. It gets complicated very quickly. Instead, think about comparing the color of the current pixel to the previous color, as suggested above. When using this strategy, think carefully about the initial values for the column, row, color, and count. You can assume the image contains at least one pixel.
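To see the compare-with-the-previous-color idea in isolation, here is a minimal sketch that works on a nonempty list of values rather than on an image, and returns the runs instead of writing them to a file. The name list-run-length-encode and the (value count) result format are assumptions made for this illustration. Note how the initial color and count come from the first element.

```scheme
;;; list-run-length-encode is a hypothetical, list-based illustration.
;;; Precondition: colors is a nonempty list.
;;; Produces a list of (value count) entries.
(define list-run-length-encode
  (lambda (colors)
    (let kernel ((remaining (cdr colors))
                 (color (car colors))
                 (count 1))
      (cond
        ; Nothing left?  Emit the final run.
        ((null? remaining)
         (list (list color count)))
        ; Same as the previous value?  Extend the current run.
        ((equal? (car remaining) color)
         (kernel (cdr remaining) color (+ count 1)))
        ; Different value?  Emit the finished run and start a new one.
        (else
         (cons (list color count)
               (kernel (cdr remaining) (car remaining) 1)))))))
```

For example, (list-run-length-encode '(black black white black)) gives ((black 2) (white 1) (black 1)). An image version still has to step through columns and rows and write each entry to the file.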

A Few Notes and Suggestions

Before beginning to write image-run-length-encode, you should make sure that you understand both the image file format and the instructions in image-run-length-decode. Try reading in the sample images. Modify a few lines and see the effect. Write a few files by hand and read them in.

You should also make sure that you understand how pixel maps work. If not, go over the labs on pixel maps again.

Important Evaluation Criteria

Our first criterion in evaluating your assignment will be your success in writing a file that can be read back into the image using the algorithm above (even if the file is not as small as it could be). Our second criterion will be file size: Are you able to write the smallest possible file? Our third criterion will be elegance: Is your code clear and concise?

Note: While we would probably use run-length encoding with the most efficient color representation, for the purposes of this assignment it is fine to use one of the human-readable representations. That is, you can choose whichever implementations of rgb-write, rgb-read, integer-write, and integer-read you wish. Please include those implementations with your assignment.
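As one possible human-readable form, the sketch below writes a single entry in the style of the sample files above: three color components and a count, separated by spaces, one entry per line. The name run-write-text and its parameters are hypothetical, and the implementations of rgb-write and integer-write from class may look different.

```scheme
;;; run-write-text is a hypothetical helper, not a course procedure.
;;; It writes one entry (three color components and a count) to port
;;; as a line of space-separated integers.
(define run-write-text
  (lambda (red green blue count port)
    (write red port)
    (display " " port)
    (write green port)
    (display " " port)
    (write blue port)
    (display " " port)
    (write count port)
    (newline port)))
```

For example, (run-write-text 0 0 0 12 (current-output-port)) prints the line 0 0 0 12.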