CSC 153: Computer Science Fundamentals Grinnell College Spring, 2005 Laboratory Exercise Reading

# More Binary Search Trees

## Summary

This laboratory builds on the previous lab on trees to provide more experience with binary search trees. Work in this lab involves expanding program ~walker/java/examples/trees/BSTree.java that formed the basis for the previous lab.

## Computing the Height of a Tree

Several terms describe characteristics of nodes and paths within a tree:

• Length: The length of a path in a binary search tree is the number of edges it contains (which is one less than the number of nodes on the path).
• Depth or level: The depth or level of a node is the length of the directed path from the root to that node.
• Height: the height of a tree is the length of the path from the root to a node on the lowest level.

For example, consider the following tree: In this tree, the path from the root (containing value 44) to the node containing value 29 has length 4.
Similarly, the length of the path from the nodes between values 54 and 56 is 2.

Also, the node at level 0 contains value 44;
the nodes at level 1 contain values 8 and 56;
the nodes at level 2 contain values 3, 23, 51, and 63;
the nodes at level 3 contain values 16, 33, 46, 54, and 72; and
the nodes at level 4 contain values 29, 49, 53, 55, 68, and 75.

Finally, since the nodes farthest away from the root are at level 4, the tree has height 4.

## Determining if a Tree is Height Balanced

The following discussion is taken, with minor editing, from Computer Science 2: Principles of Software Engineering, Data Types, and Algorithms by Henry M. Walker [Scott, Foresman and Company, 1989]. This material is used with permission of the copyright holder.

While many tree algorithms work for binary search trees of any shape, the efficiency of tree algorithms often requires that the trees are reasonably compact or balanced. The following definition provides one perspective on the notion of a balanced tree.

• A tree of height h is said to be balanced or completely balanced if all leaves occur at level h or h-1, and if all nodes at levels lower than h-1 have two children.

Applying this definition to the following trees, Tree C is not balanced, because the tree has height 3, but 'q' is a leaf at level 1. On the other hand, Tree D is balanced, because all leaves occur at levels 2 and 3 and all nodes at levels 0 and 1 have two children.

While completely balanced trees are particularly compact, it may take considerable processing to maintain such balance in a tree as nodes are inserted and deleted. Thus, completely balanced trees may be of more theoretical than practical interest. However, it turns out that another notion of balance can be maintained with reasonable processing (although the details must be left to a more advanced course).

• A tree is said to be height-balanced if, for each node in the tree, the height of the left subtree differs from the height of the right subtree by no more than 1.

Applying this definition to Trees A and B in the earlier section of this lab on Computing the Height of a Tree, Tree A is not height-balanced, because the node containing the value 8 has a left subtree of height 0, while its right subtree has height 2. In the same tree, the node with value 63 has a similar problem, although all other nodes have the desired property. Since at least one node is unbalanced, this first tree is not height balanced.

In contrast, Tree B (in the following diagram) is height balanced. ## Deletion of a Node

Finally, in our discussion of binary search trees, we consider how to delete a node from a tree. This tasks sometimes is a bit tricky, because the resulting tree must be a binary tree (with no more than two children for any node), and because it must have an appropriate ordering of data to remain a search tree.

The work required for removing a node from a tree depends upon what children that node might have. To determine the alternatives, consider again Tree B from earlier in this lab: The removal of a node from a tree involves four cases:

• The node has no children (e.g., nodes a, c, e, g, j, m, o , and q in the example).
• The node has no left children, but it does have right children (e.g., node l).
• The node has no right children, but it does have left children (e.g., node h).
• The node has both left and right children (e.g., nodes b, d, f, i, k, n, p).

We now examine each of these cases in turn.

#### Removal of a leaf

If the node to be deleted is a leaf, then it can be removed without any further effect on the tree. The resulting tree may or may not be balanced or height balanced, but the new tree will still be a valid binary search tree. For example, the following diagram shows Tree B with both nodes a and q removed. #### Removal of a node with no left child (but with right children)

When a node has a no left child, then the right child may be promoted to the position of the removed node. For example, in the original Tree B, when removing node l, the subtree beginning with node m may take the place of l.

#### Removal of a node with no right child (but with left children)

Similarly, when a node has no right child, the the left subtree may be promoted. For example, in the original Tree B, when removing node h, the subtree beginning with node g may move upwards.

The following diagram shows the original Tree B with both nodes 'h' and 'l' removed. #### Removal of a node with both left and right children

In the original Tree B, suppose we wish to remove the root, node k. We cannot just remove node k, since it connects two subtrees. Also, we cannot just move the left or right subtree upward to the root, because we must include the other subtree in the final structure as well. Instead, we first consider what values might replace 'k' in the root. In order to retain the ordering of nodes within a search tree, there are only two choices:

• we could promote 'j': the largest value in the left subtree, or
• we could promote 'l': the smallest value in the right subtree.

To find either of these candidates for promotion, we follow a similar strategy:

• To find the largest value in the left subtree, move left from the root (to the left subtree), and then move right as far as possible.
• To find the smallest value in the right subtree, move right from the root (to the right subtree), and then move left as far as possible.

While moving either of these candidates to the root maintains the tree structure, Tree B shows that further work might be needed if a candidate still has a non-empty subtree. In particular, if we promote node l to the root, then we must re-attach m (and any subtree of m) in its place. These steps for promoting l in Tree B are shown below: As this illustrates, removal of a node with two children involves three steps:

1. Finding the smallest value in the right subtree (or analogously, the largest value in the left subtree),
2. Promoting that value to replace the node to be removed, and
3. Replace the promoted node further down in the tree by its right subtree (or analogously, the left subtree).

As a technical note, a programmer should observe that changing a tree requires changing a pointer in the parent node (or changing the root pointer). Thus, in working with the removal of nodes, code typically must contain a variable to keep track of a parent node, while other variables keep track of nodes to be deleted and candidates for promotion.

This document is available on the World Wide Web as

```    http://www.cs.grinnell.edu/~walker/courses/153.sp05/readings/reading-more-BST-trees.shtml
```

 created April 21, 2002 last revised March 24, 2005  For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.