In this reading, we explore issues related to trees, including some terminology for talking and writing about trees, some formal definitions of trees, some techniques for recursion over trees (using what is often called deep recursion), and some applications.
As you know, a list is a collection of values in which each value has at most one successor. (Every value but the last value has one successor; the last value has no successor.) While lists are useful for a wide variety of situations, there are situations in which they do not suffice. One is the problem of modeling a hierarchy. For example, the president of a company may have two vice presidents who report to him, her, or zir; each vice president may have assistant vice presidents who report to them; and so on and so forth.
We use the term tree for structures in which each value may have multiple successors (or underlings, as in the example above). When each value may have at most two successors, we call them binary trees. In this reading, we will focus primarily on binary trees. Here’s a sample binary tree written in the way that many computer scientists make them.
There are a number of important characteristics we associate with binary trees, including
At any point in the binary tree built we usually refer to the two trees below it as the left subtree and the right subtree. We refer to the equivalent of a pair as a node. When both of the subtrees of a node are empty, we call that node a leaf.
In the sample tree, the root value is “John”, its left subtree is rooted with “Henry” and its right subtree is rooted with “Sam”. The size of the tree is nine because there are nine values in the tree. The depth of the tree is five for the John -> Sam -> Marge -> PM -> Rhys path. The leaves are Charlie, Janet, and Rhys.
We have an informal definition of trees, given by the figure above. Can we define them formally? Yes. Just as we can define lists recursively, so can we define trees recursively. As you may recall, the definition of a list goes something like the following:
lstis a list and
valis any Scheme value, then
(cons val lst)is a list.
Why do we call this a “recursive definition”? Because the definition of a list is in terms of lists (at least in the middle item). We can write a similarly recursive definition for trees. We’ll write one for binary trees, in which each value has two successors, rather than the one successor in lists.
emptyis a binary tree.
t2are binary trees, and
valis any Scheme value, then
(node val t1 t2)is a binary tree.
If all of the values have the same type, then we can refer to the tree in terms of the type name. For example, if all of the values are numbers, we would call it a “number tree”.
The values at the bottom of the tree we refer to as
empty, which plays
the same role that
null plays in lists. We can check whether a value
is that value with the predicate
As the description above suggests, we can build the tree with
which plays the same role in binary trees that
cons plays in lists. We
can check if a value is a node with
In lists, we get the value with
car and the rest of the list with
cdr. In binary search trees, we will use the more sensibly named
Computer scientists have found a wide variety of applications of binary trees and more general trees. As we suggested at the beginning of the reading, a general tree can provide a way to represent the hierarchy of positions at the organization. At the top of the tree we have the President or CEO. Beneath the President are the Vice Presidents. Beneath each Vice President are Deans or Directors. And so on or so forth.
However, there’s little you can do with a tree representing a hierarchy than try to find where someone belongs in the hierarchy. Hence, we use trees for a variety of other applications.
As its name suggests, a decision tree represents a series of decisions, such as yes/no questions. We might have the left branch correspond to a “yes” answer and the right branch correspond to a “no” answer. Here is a text-based decision tree to help students select a course.
Have you taken a course in computer science? N: Are you interested in learning to program? N: We recommend that you take CSC 105, in which you consider social issues in computing. Y: We recommend that you take CSC 151, in which you consider both programming issues and more general social issues. Y: Did you take CSC 151 or an equivalent course? N: Did you enjoy the course? N: We recommend that you take CSC 151; it presents a different view you might enjoy. Y: We recommend that you take CSC 151. You'll learn new things. Y: We recommend that you talk to a faculty member.
We also use forms of binary trees called binary search trees. In binary search trees, everything in the left subtree is smaller than the root (in a tree of strings, everything in the left subtree comes alphabetically before the root) and everything in the right subtree is larger than the root (in a tree of strings, everything in the right subtree comes alphabetically after the root).
As you should recall from our initial explorations of recursion, there is a traditional pattern for recursion over lists:
(define recursive-proc (lambda (lst) (if (null? lst) (base-case) (combine (car lst) (recursive-proc (cdr lst) other)))))
We chose this pattern because of the common definition of a list. Because a list is either null or the cons of a value and a list we have two cases: one for when the list is null and one for the cons case. Since the cdr of a list is itself a list, in makes sense to recurse on the cdr.
A binary tree, in comparison, is either the empty tree or a node. If it’s a node, it contains a value and two successors. Hence, we will need to recurse on both halves, as well as look at the value in the node.
(define tree-proc (lambda (tree) (if (empty? tree) (base-case other) (combine (contents tree) (tree-proc (left tree)) (tree-proc (right tree))))))
We can use this pattern to count the number of values in a tree.
(define tree-size (lambda (tree) (if (empty? tree) 0 (+ 1 (tree-size (left tree)) (tree-size (right tree))))))
We can also use this pattern to find the depth of a tree. In this case, the depth of the empty the tree is 0, and the depth of any other tree is one higher than the depth of its largest subtree.
(define tree-depth (lambda (tree) (if (empty? tree) 0 (+ 1 (max (tree-depth (left tree)) (tree-depth (right tree)))))))
We can use a variant of this pattern to search the tree for a value.
(define tree-contains? (lambda (tree val) (cond [(empty? tree) #f] [(equal? (contents tree) val) #t] [(tree-contains? (left tree) val) #t] [(tree-contains? (right tree) val) #t] [else #f])))
Of course, if we find ourselves writing that many explicit
#f, we should probably rewrite it using the Boolean operations.
(define tree-contains? (lambda (tree val) (and (not (empty? tree)) (or (equal? (contents tree) val) (tree-contains? (left tree) val) (tree-contains? (right tree) val)))))
You will note that we always use direct recursion, rather than helper recursion. That’s because it’s very difficult to express recursion on both subtrees using helper recursion.
tree-contains? procedure that we just saw looks everywhere in a
tree for a value. But we noted in the prior section that a tree can
be arranged so that smaller things are to the left and larger things
are to the right. In that case, we only need to look on one side.
Here’s a simple implementation of such a procedure for trees of
;;; Procedure: ;;; bst-find ;;; Parameters: ;;; bst, a binary search tree of strings ;;; str, a string ;;; Purpose: ;;; Determine if str appears in the tree ;;; Produces: ;;; result, a Scheme value ;;; Preconditions: ;;; bst is a binary search tree. That is, it is a binary tree ;;; with the property that each left subtree contains the smaller ;;; values and each right subtree contains the larger values. ;;; Postconditions: ;;; * If something equal to str (same letters, potentially different ;;; capitalization) appears in the tree, result is that value. ;;; * Otherwise, result is #f. (define bst-find (lambda (bst str) (if (empty? bst) #f (let ([root (contents bst)]) (cond [(string-ci=? str root) root] [(string-ci<? str root) (bst-find (left bst) str)] [else (bst-find (right bst) str)])))))
How does this work in practice? Let’s consider an example. Suppose we
wanted to see if
"Mike" appears in the tree given in the initial figure.
(string-ci=? "Mike" "John")returns
(string-ci<? "Mike" "John")returns
"Sam"). Since we know that
"John", we don’t have to worry about comparing
"Janet", all of whom come before
(string-ci=? "Mike" "Sam")returns
(string-ci<? "Mike" "Sam")returns
(string-ci=? "Mike" "Marge")returns
(string-ci<? "Mike" "Marge")returns
(string-ci=? "Mike" "PM")returns
(string-ci<? "Mike" "PM")returns
Mike” is not in the tree and return
Lists are built in to Scheme. Binary trees are not. How do we implement these binary trees? One mechanism is to use pairs in a clever way. Since we have to store three values, rather than two, we can use two pairs.
(define node (lambda (val l r) (cons val (cons l r)))) (define contents car) (define left cadr) (define right cddr)
However, we may find it easier to implement binary trees using
vectors. Why vectors? One reason is that they are a bit easier to
read. We can also insert a special symbol (e.g.,
'node) to indicate
that a vector is supposed to represent a node. With pairs, we will have
trouble distinguishing lists from trees from other pair structures.
(define node (lambda (val l r) (vector 'node val l r))) (define contents (section vector-ref <> 1)) (define left (section vector-ref <> 2)) (define right (sectionvector-ref <> 3)) (define node? (lambda (val) (and (vector? val) (= (vector-length val) 4) (equal? (vector-ref val 0) 'node))))
Describe the series of steps that
bst-find would use in searching the
tree given above for the value
Recall the pattern for tree recursion.
a. Identify the base case and combination of both
b. Relate the two pieces for each function. How is the base case value related to (or working with) the combination step for each function?
c. Contrast the values of each of these pieces between the two functions. How and why do the base cases differ? How and why do the combinations differ?