Trees Summary: In our exploration of pair structures, we noted that you can build structures using cons that are not always lists. We typically call these structures trees. Trees are an important dynamic data structure used in a variety of problems. In this reading, we explore more about trees, including some terminology for talking about trees, some formal for describing trees, and some techniques for recursing over trees (using what is often called deep recursion).
Number Trees, Revisited Recall the following structure from the reading on pairs and pair structures. Computer scientists typically call such a structure a tree (even though it looks more like the roots of a tree than the tree). In this case, because all of the values in the tree are numbers, we call it a number tree or a tree of numbers. There are a number of important characteristics we associate with trees, including the root of a tree, which refers to the top of the tree; the size of a tree, the number of values stored in the tree (sometimes the size also includes the number of pairs); the depth of a tree, the length of the longest path from the top of the tree to the furthest value; and the base type of the tree, such as number in number trees, which indicates what kind of values the tree is built from. We call trees that have no single base type heterogeneous trees. At any point in the tree built with a pair, we usually refer to the two parts below it as the left subtree (the car) and the right subtree (the cdr). We refer to each pair in a tree as a node and each non-pair value as a leaf.
Formalizing Trees We have an informal definition of trees, given by the figure above. Can we define them formally? Yes. Just as we can define lists recursively, so can we define trees recursively. As you may recall, the definition of a list goes something like the following: The empty list (null) is a list. If lst is a list and val is any Scheme value, then (cons val lst) is a list. Nothing else is a list. We do we call this a recursive definition? Because the definition of a list is in terms of lists (at least in the middle item). We can write a similarly recursive definition for trees. Any value is a tree. If t1 and t2 are trees, then so is (cons t1 t2). Nothing else is a tree. We can also make this definition a little more specific, so that we can speak of things like color trees. Any color is a color tree. If t1 and t2 are color trees, then so is (cons t1 t2). Nothing else is a color tree.
Patterns of Tree Recursion As you should recall from our initial explorations of recursion, there is a traditional pattern for recursion over lists: (define recursive-proc (lambda (lst other) (if (null? lst) (base-case other) (combine other (car lst) (recursive-proc (cdr lst) other))))) We chose this pattern because of the common definition of a list. Because a list is either null or the cons of a value and a list we have two cases: one for when the list is null and one for the cons case. Since the cdr of a list is itself a list, in makes sense to recurse on the cdr. A tree, in comparison, is either a value or a pair of trees. Our base case is when we do not have a pair. Since both parts of the pair are trees, we should therefore recurse on both halves, rather than just the cdr. Hence, we define the common pattern for recursion over trees as follows. (define recursive-proc (lambda (tree) (if (pair? tree) (combine (recursive-proc (car tree)) (recursive-proc (cdr tree))) (base-case tree)))) For example, if we know that the tree contains only numbers, we can build sum-of-number-tree by using + to combine the recursive calls and simply return tree for the base case. (We'll use ntree instead of tree to remind ourselves that we're working with a tree of numbers.) We can also use this pattern to find the depth of a tree. In this case, the depth of a tree that contains only one value is 0, and the depth of any other tree is one higher than the depth of its largest subtree. For example, We can also use the pattern to find the number of values in a tree. In this case, if we have a non-tree, there's only one value. Otherwise, we need to combine the number of values in the left and right subtrees. For example, We can even use this pattern to flatten the tree into a list with the same values, in the order they appear from left-to-right. In this case, we turn non-paired values into lists, and append the results of flattening subtrees. For example,
Trees vs. Nested Lists As you may recall, a simple list (such as a list of numbers) is simply a tree in which all the left subtrees (the car elements) are values and the final right subtree is null. When we nest lists (that is, make lists elements of lists), we build structures that are very much like trees, except that we maintain the limitation that the rightmost subtree at every stage must be null. For example, consider the list (1 (2 3) (4 (5) (6 7)) (8 9)). It has four elements (the 1, the (2 3), the (4 (5) (6 7)) and the (8 9)), all but the first of which are themselves lists. We might consider it a number tree, except for the problem that the non-pair values include not just numbers, but also a variety of nulls: at the end of the top-level list, at the end of the list (2 3), at the end of the next list and each of its sublists, and so on and so forth. Hence, if we try to apply the sum-of-number-tree procedure, it will try to add these null values and therefore stop with an error. We get similar problems when we try to use procedures like tree-flatten and tree-size. What do we do? We use a similar technique for nested list structures that we use for trees, except that we incorporate the chance that a value is null. For the case of sum, when we hit null, we return 0. Here are some examples that contrast the behavior of sum-of-nested-number-lists with sum-of-number-tree and the old sum procedure we wrote when first exploring lists. Using this particular procedure as an example, we can also suggest a typical form for procedures that recurse over nested lists. (define recursive-proc (lambda (val) (cond ((pair? val) (combine (recursive-proc (car val)) (recursive-proc (cdr val)))) ((null? val) null-case) (else (base-case val))))) We might use this form to tally the number of times a colors appears in a nested list structure (this is much like our count of the number of values in a tree, except we also check the type of the values we find). In this case, we return 0 for the null case and 1 or 0 for the base case (depending on whether val is a color or not). We can also tally the number of times a particular color appears by adding another parameter (the color) and changing the extra test. And, as these examples suggest, we can write a better version of flatten that works for both trees and nested lists. As the following examples suggest, flatten works for trees, nested lists, and even simple values.
An Image Application: Color Trees Are trees useful for drawings? Certainly. For one thing, it can be fun to simply draw trees. However, trees are also used to represent images. For example, we might break a large image up into smaller rectangular sections, and describe it as a tree of sections. Why sections? Because we might be able to choose a different palette for each section, or perhaps even choose a single color for a section. These trees are typically called quad trees and are beyond the initial scope of this course. Don't despair! We can still draw some fairly interesting images using color trees. Here's one approach: Given a region of an image in which to draw, and a color tree to draw in that region, either (a) fill the region with a single color, if the tree represents a single color or (b) split the region in half, and draw each half of the subtree in half of the image. We might encode that strategy as follows. You will have an opportunity to explore this procedure (and some variations) in the corresponding laboratory. You may also have the opportunity for further explorations in the near future.