Trees
Summary:
In our exploration of pair structures, we noted that you can build
structures using cons that are not always lists.
We typically call these structures trees.
Trees are an important dynamic data structure used in a variety
of problems.
In this reading, we explore more about trees, including some terminology
for talking about trees, some formal for describing trees, and some
techniques for recursing over trees (using what is often called
deep recursion).
Number Trees, Revisited
Recall the following structure from the reading on pairs and pair
structures.
Computer scientists typically call such a structure a
tree (even though it looks more like the roots
of a tree than the tree). In this case, because all of the values in
the tree are numbers, we call it a number tree
or a tree of numbers.
There are a number of important characteristics we associate with trees,
including
the root of a tree, which refers to the
top of the tree;
the size of a tree, the number of values stored
in the tree (sometimes the size also includes the number of pairs);
the depth of a tree, the length of the longest
path from the top of the tree to the furthest value; and
the base type of the tree, such as
number
in number trees, which indicates what kind of
values the tree is built from. We call trees that have no single
base type heterogeneous trees.
At any point in the tree built with a pair, we usually refer to the
two parts below it as the left subtree (the car)
and the right subtree (the cdr). We refer to
each pair in a tree as a node and each non-pair
value as a leaf.
Formalizing Trees
We have an informal definition of trees, given by the figure above.
Can we define them formally? Yes. Just as we can define lists
recursively, so can we define trees recursively. As you may recall,
the definition of a list goes something like the following:
The empty list (null) is a list.
If lst is a list and val
is any Scheme value, then (cons val
lst) is a list.
Nothing else is a list.
We do we call this a recursive definition
? Because
the definition of a list is in terms of lists (at least in the middle
item). We can write a similarly recursive definition for trees.
Any value is a tree.
If t1 and t2 are trees, then
so is (cons t1 t2).
Nothing else is a tree.
We can also make this definition a little more specific, so that we
can speak of things like color trees
.
Any color is a color tree.
If t1 and t2 are
color trees, then so is (cons
t1 t2).
Nothing else is a color tree.
Patterns of Tree Recursion
As you should recall from our initial explorations of recursion,
there is a traditional pattern for recursion over lists:
(define recursive-proc
(lambda (lst other)
(if (null? lst)
(base-case other)
(combine other
(car lst)
(recursive-proc (cdr lst) other)))))
We chose this pattern because of the common definition of a list.
Because a list is either null or the cons of a value and
a list we have two cases: one for when the list is null
and one for the cons case. Since the cdr of a list is itself a list,
in makes sense to recurse on the cdr.
A tree, in comparison, is either a value or a pair of trees. Our
base case is when we do not have a pair. Since both parts of the pair are
trees, we should therefore recurse on both halves, rather than just the
cdr. Hence, we define the common pattern for recursion over trees as
follows.
(define recursive-proc
(lambda (tree)
(if (pair? tree)
(combine (recursive-proc (car tree))
(recursive-proc (cdr tree)))
(base-case tree))))
For example, if we know that the tree contains only numbers,
we can build sum-of-number-tree by using
+ to combine the recursive calls and simply return
tree for the base case. (We'll use
ntree instead of tree to
remind ourselves that we're working with a tree of numbers.)
We can also use this pattern to find the depth of a tree. In this case,
the depth of a tree that contains only one value is 0, and the depth of
any other tree is one higher than the depth of its largest subtree.
For example,
We can also use the pattern to find the number of values in a tree. In this
case, if we have a non-tree, there's only one value. Otherwise, we need
to combine the number of values in the left and right subtrees.
For example,
We can even use this pattern to flatten
the tree into a
list with the same values, in the order they appear from left-to-right.
In this case, we turn non-paired values into lists, and append the
results of flattening subtrees.
For example,
Trees vs. Nested Lists
As you may recall, a simple list (such as a list of numbers) is simply a
tree in which all the left subtrees (the car elements) are values and
the final right subtree is null.
When we nest lists (that is, make lists elements of lists), we build
structures that are very much like trees, except that we maintain the
limitation that the rightmost subtree at every stage must be null.
For example, consider the list (1 (2 3) (4 (5) (6 7)) (8
9)). It has four elements (the 1, the (2
3), the (4 (5) (6 7)) and the (8 9)),
all but the first of which are themselves lists. We might consider
it a number tree, except for the problem that the non-pair values
include not just numbers, but also a variety of nulls: at the end of
the top-level list, at the end of the list (2 3), at the
end of the next list and each of its sublists, and so on and so forth.
Hence, if we try to apply the sum-of-number-tree procedure,
it will try to add these null values and therefore stop
with an error. We get similar problems when we try to use procedures
like tree-flatten and tree-size.
What do we do? We use a similar technique for nested list structures
that we use for trees, except that we incorporate the chance that a
value is null. For the case of sum, when we hit null, we
return 0.
Here are some examples that contrast the behavior
of sum-of-nested-number-lists with
sum-of-number-tree and the old sum procedure
we wrote when first exploring lists.
Using this particular procedure as an example, we can also suggest a
typical form for procedures that recurse over nested lists.
(define recursive-proc
(lambda (val)
(cond
((pair? val)
(combine (recursive-proc (car val))
(recursive-proc (cdr val))))
((null? val) null-case)
(else (base-case val)))))
We might use this form to tally the number of times a colors appears
in a nested list structure (this is much like our count of the number
of values in a tree, except we also check the type of the values we find).
In this case, we return 0 for the null case and 1 or 0 for the base
case (depending on whether val is a color or not).
We can also tally the number of times a particular color appears by
adding another parameter (the color) and changing the extra test.
And, as these examples suggest, we can write a better version of
flatten that works for both trees and nested lists.
As the following examples suggest, flatten works
for trees, nested lists, and even simple values.
An Image Application: Color Trees
Are trees useful for drawings? Certainly. For one thing, it
can be fun to simply draw trees. However, trees are also used to
represent images. For example, we might break a large image up into
smaller rectangular sections, and describe it as a tree of sections.
Why sections? Because we might be able to choose a different palette
for each section, or perhaps even choose a single color for a section.
These trees are typically called quad trees and
are beyond the initial scope of this course.
Don't despair! We can still draw some fairly interesting images
using color trees. Here's one approach: Given a region of an image
in which to draw, and a color tree to draw in that region, either
(a) fill the region with a single color, if the tree represents a
single color or (b) split the region in half, and draw each half of
the subtree in half of the image.
We might encode that strategy as follows.
You will have an opportunity to explore this procedure (and some variations)
in the corresponding
laboratory. You may also have the opportunity for further
explorations in the near future.