Testing Procedures
Summary: As you develop procedures and
collections of procedures, you have a responsibility to make sure that
they work correctly. One mechanism for checking your procedures is a
comprehensive suite of tests. In this reading, we consider the design
and use of tests.
Introduction
Most computer programmers strive to write clear, efficient, and correct
code. It is (usually) easy to determine whether code is clear. With
some practice and knowledge of the correct tools, one can determine how
efficient code is. However, believe it or not,
it is often difficult to determine whether code is correct.
The gold standard of correctness is a formal proof that the
procedure or program is correct. However, in order to prove a program or
procedure correct, one must develop a rich mathematical toolkit and devote
significant effort to writing the proof. Such effort is worth it for
life-critical applications, but for many programs, it is often more than
can be reasonably expected.
There is also a disadvantage of formal proof: Code often changes and the
proof must therefore also change. Why does code change? At times, the
requirements of the code change (e.g., a procedure that was to do three
related things is now expected to do four related things). At other times,
with experience, programmers realize that they can improve the code by
making a few changes. If we require that all code be proven correct, and
if changing code means rewriting the proof, then we discourage programmers
from changing their code.
Hence, we need other ways to have some confidence that our code is
correct. A typical mechanism is a test suite, a collection of tests that
are unlikely to all succeed if the code being tested is erroneous. One
nice aspect of a test suite is that when you make changes, you can
simply re-run the test suite and see if all the tests succeed. For many
programmers, a test suite encourages experimenting with improvements to
the code, since a good suite tells them immediately whether or not the
changes they have made are successful.
But when and how do you develop tests? These questions are the subject of
this reading.
What is a Test?
As the introduction suggested, you should write tests when you write
code. But what is a test? Put simply, a test is a bit of code that
reveals something about the correctness of a procedure or a set of
procedures. Most typically, we express tests in terms of expressions
and their expected values. For example, if we've written a procedure,
list-reverse, that reverses lists, we would
expect that
(list-reverse null) is null
(list-reverse (list 3)) is equal to (list 3)
(list-reverse (list 3 7 11)) is equal to (list 11 7 3)
We could express those expectations in a variety of ways. The simplest
strategy is to execute each expression, in turn, and see if the result
is what we expected. You should already have been using this form of
testing regularly in your coding.
Of course, one disadvantage of this kind of testing is that you have to
manually look at the results to make sure that they are correct. You also
have to know what the correct answers should be. But reading the results
yourself isn't always a reliable strategy. There's some evidence that
people don't always catch errors when they have to do this comparison,
particularly when there are a lot of tests. I know that I've certainly
missed a number of errors this way. An appendix to this document
presents an interesting historical anecdote about the dangers of
writing a test suite in which you must manually read all of the results.
Since reading the results is tedious and perhaps even dangerous,
it is often useful
to have the computer do the comparison for you. For example, we might
write a procedure, check,
that checks to make sure that two expressions are equal.
We can then use this procedure for the tests above, as follows.
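A minimal sketch of such a procedure and its use, assuming that check simply returns the string "OK" or "FAILED". The definition of list-reverse and the fourth, deliberately wrong test are stand-ins chosen for illustration.

```scheme
;; null is predefined in MediaScript; defining it here keeps the sketch
;; runnable in other Scheme implementations.
(define null '())

;; Stand-in for the procedure under test (hypothetical definition).
(define list-reverse
  (lambda (lst)
    (reverse lst)))

;; check: return "OK" if the two values are equal, "FAILED" otherwise.
(define check
  (lambda (actual expected)
    (if (equal? actual expected)
        "OK"
        "FAILED")))

(check (list-reverse null) null)                    ; "OK"
(check (list-reverse (list 3)) (list 3))            ; "OK"
(check (list-reverse (list 3 7 11)) (list 11 7 3))  ; "OK"
(check (list-reverse (list 3 7 11)) (list 3 7 11))  ; "FAILED": the expected value is wrong
```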
Note that in the last test, the test itself, rather than
list-reverse, was incorrect.
Confirming that our code is correct is now simply a matter of
scanning through the results and seeing if any say "FAILED".
And, as importantly, we've specified the expected result along with each
expression, so we don't need to look it up manually.
Of course, there are still some disadvantages with this strategy. For
example, if we put the tests in a file to execute one by one, it may be
difficult to tell which ones failed. Also, for a large set of tests, it
seems a bit excessive to print OK every time. Finally, we get neither
OK
nor FAILED when there's an error in the original expression.
In fact, if an error occurs in the middle of a group of tests, the whole
thing may come to a screeching halt.
Testing Environments
To handle all of these additional issues, many program development
environments now include some form of testing environment, in which you
specify a sequence of tests and receive summaries of the results
of the tests. For example, a testing environment, given the four tests
from the previous section, might report how many tests ran, how many
succeeded, and which ones failed.
Different testing environments are designed differently. For this class,
we have developed a simple testing environment that emphasizes testing
for expected outcomes. That environment is loaded automatically into
MediaScript.
To use that environment, you begin a series of tests with the
begin-tests! procedure. That procedure takes no
parameters. When you are done with a sequence of tests, you invoke the
end-tests! procedure. That procedure reports on
the results of the tests.
(begin-tests!)
...
(end-tests!)
The tests themselves might seem a bit strange at first glance. For
each test, you call the test! procedure
with two parameters, (1) the expression you want to evaluate
and (2) an expression that gives the value you expect. For
example, we would express the first three (correct) tests of
list-reverse as follows.
(test! (list-reverse null) null)
(test! (list-reverse (list 3)) (list 3))
(test! (list-reverse (list 3 7 11)) (list 11 7 3))
We might also want to test whether an expression causes an error. For
example, we would expect (list-reverse 5) to give
an error because 5 is a number and not a list. The
test-error! procedure lets us check whether an
expression we expect to give an error actually gives that error.
(test-error! (list-reverse 5))
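To make the idea of a testing environment concrete, here is a minimal sketch of how a count of tests and a list of failures could be kept. It is not MediaScript's implementation: my-begin-tests!, my-test!, and my-end-tests! are hypothetical names, and my-test! takes an extra label argument that test! does not. Because my-test! is an ordinary procedure, its arguments are evaluated before it is called, so this sketch does nothing about expressions that raise errors.

```scheme
;; Hypothetical sketch of a tiny testing environment (not MediaScript's own).
(define tests-run 0)     ; how many tests have been run so far
(define failures '())    ; labels of failed tests, most recent first

;; Reset the counters at the start of a sequence of tests.
(define my-begin-tests!
  (lambda ()
    (set! tests-run 0)
    (set! failures '())))

;; Record one test: label names the test; actual and expected are values.
(define my-test!
  (lambda (label actual expected)
    (set! tests-run (+ tests-run 1))
    (cond
      ((not (equal? actual expected))
       (set! failures (cons label failures))))))

;; Print a one-line summary and return the failed labels in the order run.
(define my-end-tests!
  (lambda ()
    (display tests-run)
    (display " tests run, ")
    (display (length failures))
    (display " failed")
    (newline)
    (reverse failures)))

(my-begin-tests!)
(my-test! "empty list" (reverse '()) '())
(my-test! "wrong expectation" (reverse (list 3 7 11)) (list 3 7 11))
(my-end-tests!)
```

The final call prints a summary line and returns a list containing only "wrong expectation", so a reader scans one short list rather than every result.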
When to Write Tests
To many programmers, testing is much like documentation. That is, it's
something you add after you've written the majority of the code.
However, testing, like documentation, can make it much easier to write
the code in the first place.
As we suggested in the reading
on documentation, by writing documentation first, you develop
a clearer sense of what you want your procedures to accomplish. Taking
the time to write documentation can also help you think through the
special cases. For some programmers, writing the formal postconditions
can give them an idea of how to solve the problem.
If you design your tests first, you can accomplish similar goals. For
example, if you think carefully about what tests might fail, you make
sure the special cases are handled. Also, a good set of tests of the
form "this expression should have this value" can serve
as a form of documentation for the reader, explaining through example
what the procedure is to do. There is even a popular style of software
engineering, called test-driven development,
in which you always write the tests first. (Test-driven development
is a key part of Extreme Programming and a variety of so-called
agile software development strategies.)
An Example: Testing tally-value
Let's consider this test-first strategy as we attempt to
write a common procedure, (tally-value val lst), which
counts the number of times that val appears in
lst. What are some good tests?
We should make sure that tally-value returns
0 for the empty list.
We should make sure that tally-value
returns 1 for a singleton list in which the singleton element is
val.
We should make sure that tally-value returns
0 for a singleton list in which the singleton element is not
val.
We should make sure that tally-value returns 1
for length-three lists in which val is just
the first, just the last, or just the middle element.
We should make sure that tally-value returns
the appropriate count for lists that include multiple copies of
val.
We should make sure that tally-value returns 0 for
longer lists that do not include val.
We might try different types of values.
You can probably fill in some more of your own.
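The cases listed above can be written down as data before any code for tally-value exists. The sketch below is illustrative only: tally-cases, failing-cases, and the particular values are hypothetical choices, not part of the class testing environment.

```scheme
;; Each case is a list (val lst expected-count); the values are illustrative.
(define tally-cases
  (list (list 5 '() 0)                     ; empty list
        (list 5 (list 5) 1)                ; singleton containing val
        (list 5 (list 6) 0)                ; singleton not containing val
        (list 5 (list 5 6 7) 1)            ; val is just the first element
        (list 5 (list 6 5 7) 1)            ; val is just the middle element
        (list 5 (list 6 7 5) 1)            ; val is just the last element
        (list 5 (list 5 6 5 7 5) 3)        ; multiple copies of val
        (list 5 (list 1 2 3 4 6 7 8) 0)))  ; longer list without val

;; Hypothetical helper: return the cases that the candidate procedure
;; tally gets wrong, in the order they appear in cases.
(define failing-cases
  (lambda (tally cases)
    (cond
      ((null? cases)
       '())
      ((equal? (tally (car (car cases)) (cadr (car cases)))
               (caddr (car cases)))
       (failing-cases tally (cdr cases)))
      (else
       (cons (car cases)
             (failing-cases tally (cdr cases)))))))
```

For instance, (failing-cases (lambda (val lst) 0) tally-cases) returns the five cases whose expected count is not 0, which shows that these cases catch a procedure that always answers 0.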
We will create two files, tally-value.scm,
which will contain the code and documentation
for tally-value, and
tally-value-test.scm, which will
contain our testing code. After making an empty
tally-value.scm, we create
tally-value-test.scm.
Of course, we haven't defined tally-value yet, so we
don't expect the tests to work, but let's see what happens.
Yeah, given that we didn't bother to define
tally-value, we'd expect some errors. So, let's
start to define tally-value. Here's an incorrect
definition:
Amazingly, this version passes several of our tests. When we run the
test suite, we get the following output.
But the goal is not to pass some tests. The goal
is to pass all of the tests. We clearly need to fix the code.
It turns out that thinking about the kinds of errors we got can help us
repair our code.
That first error is important. It indicates that we're taking the
car of an empty list. Hence, we need to test
for that case. What about the others? Most look like a case of us
forgetting to recurse. Let's try again:
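One way to apply both repairs, sketched under assumptions: the empty-list check comes first, and every non-empty case recurses on the cdr of the list. Comparing with = is an assumption here and handles only numeric values; placing the addition inside the if is likewise an illustrative choice.

```scheme
;; Sketch only: count how many times val appears in lst.
;; Assumes val and the elements of lst are numbers, because it compares with =.
(define tally-value
  (lambda (val lst)
    (if (null? lst)
        0                                      ; nothing to count in an empty list
        (if (= val (car lst))
            (+ 1 (tally-value val (cdr lst)))  ; count this element, then recurse
            (tally-value val (cdr lst))))))    ; skip this element, then recurse
```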
What does our test suite say?
Wow! That's comforting. We seem to have written it correctly (or at
least correctly enough to pass our tests). However, some people might
find the formulation above confusing or inelegant, so we might try to
rewrite it. Instead of putting the addition within the
if, we'll have
a three-way cond.
What do the tests say?
Ah! We've made a common mistake when reorganizing code: We've forgotten to
make recursive calls in all the places where they are needed.
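A three-way cond with a recursive call in both non-empty branches might look like the following sketch. The comparison with = is an assumption and handles only numeric values.

```scheme
;; Sketch only: three-way cond version of tally-value.
;; Assumes val and the elements of lst are numbers, because it compares with =.
(define tally-value
  (lambda (val lst)
    (cond
      ((null? lst)                          ; case 1: empty list
       0)
      ((= val (car lst))                    ; case 2: first element matches
       (+ 1 (tally-value val (cdr lst))))
      (else                                 ; case 3: first element differs
       (tally-value val (cdr lst))))))
```

Both non-empty branches call tally-value on the cdr of the list; dropping either call is the mistake described above.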
We cross our fingers and run the tests again.
Now, we've left out some tests. What if val
is a string, a spot, or even a list? We'll add a few more tests to
tally-value-test.scm to cover these cases. We
don't need as many tests for each, since we already know that the
overall structure works.
As you'll see in the laboratory, these tests will reveal fascinating new
errors in the code. The lab will give you the opportunity to correct the
errors.
Appendix: An Historical Tale
Many of us are reminded of the need for unit testing by the following
story by Doug McIlroy, posted to The Risks Digest: Forum on
Risks to the Public in Computers and Related Systems:
Sometime around 1961, a customer of the Bell Labs computing center
questioned a value returned by the sine routine. The cause was simple:
a card had dropped out of the assembly language source. Bob Morris
pinned down the exact date by checking the dutifully filed reversions
tests for system builds. Each time test values of the sine routine
(and the rest of the library) had been printed out. Essentially the
acceptance criterion was that the printout was the right thickness;
the important point was that the tests ran to conclusion, not that
they gave right answers. The trouble persisted through several
deployed generations of the system.
McIlroy, Doug (2006). Trig routine risk: An Oldie.
Risks Digest 24(49), December 2006.
If, instead of a thick printout, Bell Labs had arranged for a count of
successes and a list of failures, they (and their customers) would have
been in much better shape.