A binary search tree is a recursively defined data structure that simplifies the process of conducting binary searches by marking appropriate bisection points structurally instead of requiring them to be computed. Unlike an array, it also allows for the efficient insertion of new items in the data structure during program execution.
Here's the recursive definition: A binary search tree B is either empty or consists of a datum (typically a record that includes a key field) and two binary search trees L and R, called the left and right subtrees of B. Either or both of these subtrees may of course be empty. An invariant condition of every non-empty binary search tree B is that the key of every datum stored in its left subtree L is less than the key of the datum stored at B (or precedes it in some conventional ordering of the key data type), and similarly that the key of every datum stored in its right subtree R is greater than the key of the datum stored at B. (If duplicate keys are admitted, as they are by the insertion procedure given here, a key in R may also be equal to the key at B.) This invariant must be preserved whenever the contents of a binary search tree are changed.
In Pascal, one can use pointers to implement this data type, as follows:
type
  element = record
    key: key_type;
    { and presumably other fields }
  end;
  bst = ^component;
  component = record
    datum: element;
    left, right: bst
  end;
An empty binary search tree will be represented by a NIL
pointer, a non-empty one by a pointer to a component containing the datum
and two binary search trees. Creating an empty binary search tree, or one that contains just one datum, is trivial, as is testing whether a given binary search tree is empty:
function empty_bst: bst;
begin
  empty_bst := NIL
end;

function singleton_bst (elm: element): bst;
var
  result: bst;
begin
  new (result);
  result^.datum := elm;
  result^.left := empty_bst;
  result^.right := empty_bst;
  singleton_bst := result
end;

function is_empty_bst (b: bst): Boolean;
begin
  is_empty_bst := (b = NIL)
end;
Suppose, however, that one wants to add a datum to an existing binary search tree B. This is again trivial if B is empty; one simply replaces B with the singleton binary search tree containing just the new datum. But if B is non-empty, one must be careful to preserve the invariant stated above. Recursion provides a simple way to do this: Compare the key of the datum to be inserted with the key of the datum stored at B. If the new key is less, insert the datum into B's left subtree; otherwise, insert it into B's right subtree. In either case, perform the insertion by making a recursive call to the insertion procedure. The recursion will bottom out when one reaches a subtree of a subtree of a subtree, etc., that is empty. Replace this one with the singleton binary search tree containing just the new datum!
Or in Pascal:
procedure insert_into_bst (elm: element; var b: bst);
begin
  if is_empty_bst (b) then
    b := singleton_bst (elm)
  else if elm.key < b^.datum.key then
    insert_into_bst (elm, b^.left)
  else
    insert_into_bst (elm, b^.right)
end;
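As a quick check that insertion really does preserve the invariant, here is a minimal, self-contained sketch. It assumes, purely for illustration, that key_type is integer and that the record has no other fields. The keys_in_range function is a hypothetical helper of my own, not part of the code above, and the singleton construction is written inline to keep the program short. Equal keys are allowed on the right, matching the else branch of insert_into_bst.

```pascal
program bst_insert_demo (output);

{ Assumption for this sketch: keys are integers and the record has no
  other fields. }
type
  key_type = integer;
  element = record
    key: key_type
  end;
  bst = ^component;
  component = record
    datum: element;
    left, right: bst
  end;

var
  tree: bst;
  elm: element;
  i: integer;

{ Same logic as insert_into_bst above, with the singleton built inline. }
procedure insert_into_bst (elm: element; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum := elm;
    b^.left := NIL;
    b^.right := NIL
  end
  else if elm.key < b^.datum.key then
    insert_into_bst (elm, b^.left)
  else
    insert_into_bst (elm, b^.right)
end;

{ Hypothetical helper: TRUE if every key in b lies in lo .. hi and the
  ordering invariant holds at every position in b.  Keys to the left of
  a datum must be strictly smaller; keys to the right may be equal. }
function keys_in_range (b: bst; lo, hi: key_type): Boolean;
begin
  if b = NIL then
    keys_in_range := TRUE
  else if (b^.datum.key < lo) or (hi < b^.datum.key) then
    keys_in_range := FALSE
  else
    keys_in_range := keys_in_range (b^.left, lo, b^.datum.key - 1)
                     and keys_in_range (b^.right, b^.datum.key, hi)
end;

begin
  tree := NIL;
  for i := 1 to 9 do begin
    elm.key := (i * 7) mod 10;    { yields 7, 4, 1, 8, 5, 2, 9, 6, 3 }
    insert_into_bst (elm, tree)
  end;
  if keys_in_range (tree, -maxint, maxint) then
    writeln ('invariant holds')
  else
    writeln ('invariant violated')
end.
```

The checker narrows the permitted range of keys at each step down the tree, which is what distinguishes the invariant from a mere comparison of each datum with its two immediate children.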
Recursion is also the appropriate tool when one wants to search a binary
search tree, perhaps to recover a record containing a specified key:
function search_bst (sought: key_type; b: bst;
                     var found: element): Boolean;
begin
  if is_empty_bst (b) then
    search_bst := FALSE
  else if sought < b^.datum.key then
    search_bst := search_bst (sought, b^.left, found)
  else if b^.datum.key < sought then
    search_bst := search_bst (sought, b^.right, found)
  else begin
    search_bst := TRUE; { because sought = b^.datum.key }
    found := b^.datum
  end
end;
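To show how a caller might use search_bst to recover a whole record, here is a small self-contained sketch. The integer key type and the square field are assumptions made only for this illustration; search_bst and is_empty_bst are copied from above, and the insertion procedure builds its singleton inline.

```pascal
program bst_search_demo (output);

{ Assumption for this sketch: keys are integers and each record carries
  one illustrative extra field. }
type
  key_type = integer;
  element = record
    key: key_type;
    square: integer    { hypothetical payload: the square of the key }
  end;
  bst = ^component;
  component = record
    datum: element;
    left, right: bst
  end;

var
  tree: bst;
  elm, hit: element;
  i: integer;

function is_empty_bst (b: bst): Boolean;
begin
  is_empty_bst := (b = NIL)
end;

procedure insert_into_bst (elm: element; var b: bst);
begin
  if is_empty_bst (b) then begin
    new (b);
    b^.datum := elm;
    b^.left := NIL;
    b^.right := NIL
  end
  else if elm.key < b^.datum.key then
    insert_into_bst (elm, b^.left)
  else
    insert_into_bst (elm, b^.right)
end;

function search_bst (sought: key_type; b: bst;
                     var found: element): Boolean;
begin
  if is_empty_bst (b) then
    search_bst := FALSE
  else if sought < b^.datum.key then
    search_bst := search_bst (sought, b^.left, found)
  else if b^.datum.key < sought then
    search_bst := search_bst (sought, b^.right, found)
  else begin
    search_bst := TRUE;
    found := b^.datum
  end
end;

begin
  tree := NIL;
  for i := 1 to 9 do begin
    elm.key := (i * 7) mod 10;    { keys 1 .. 9 in scrambled order }
    elm.square := elm.key * elm.key;
    insert_into_bst (elm, tree)
  end;
  { Key 5 is present, so hit receives the whole record for it. }
  if search_bst (5, tree, hit) then
    writeln ('key 5 found; payload = ', hit.square : 1)
  else
    writeln ('key 5 is absent');
  { Key 0 was never inserted, so hit is left untouched. }
  if search_bst (0, tree, hit) then
    writeln ('key 0 found; payload = ', hit.square : 1)
  else
    writeln ('key 0 is absent')
end.
```

Note that found is assigned only when the search succeeds, so a caller should consult the Boolean result before using it.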
In procedure insert_into_bst and function
search_bst, only one branch of the tree is explored; subtrees
that contain elements that can't possibly be relevant to the insertion or
to the search are bypassed. Occasionally, however, one wants to perform
some operation on each of the data stored in a tree, specifically in
ascending order. For instance, one might want to write out an index of the
collection of records stored in the tree -- a list of the keys, in
ascending order, one to a line. Recursion can be used for this purpose,
too, but there will be two recursive calls rather than one, corresponding
to the fact that each non-empty binary search tree contains two subtrees:
procedure print_bst_data (b: bst);
begin
  if not is_empty_bst (b) then begin
    print_bst_data (b^.left);
    writeln (b^.datum.key);
    print_bst_data (b^.right)
  end
end;
Notice that the keys for all of the data stored in B's left subtree
will be printed out before the key for the datum actually stored at
B (since, under the invariant, all those keys will precede the one
stored at B in the desired ordering), while all those stored in the
right subtree will be printed only after the key for the datum stored at
B. This technique for traversing a binary search tree is called an
inorder traversal. One can generalize it by abstracting the
procedure to be performed on each datum:
procedure apply_throughout_bst (var b: bst;
                                procedure p (var elm: element));
begin
  if not is_empty_bst (b) then begin
    apply_throughout_bst (b^.left, p);
    p (b^.datum);
    apply_throughout_bst (b^.right, p)
  end
end;
It would alternatively be possible to perform some operation on the datum
at each non-empty binary search tree before proceeding to its left and
right subtrees; a traversal that is arranged in this way is called a
preorder traversal. Still another arrangement is to defer the
processing of the datum at each non-empty bst until after the data in both
of its subtrees have been dealt with; this is a postorder
traversal. A postorder traversal is used, for instance, when deallocating
a binary search tree:
procedure deallocate_bst (var b: bst);
begin
  if not is_empty_bst (b) then begin
    deallocate_bst (b^.left);
    deallocate_bst (b^.right);
    dispose (b)
  end
end;
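The three orders can be compared side by side on one small tree. The following is a minimal sketch, assuming integer keys and no other fields; the show_ procedures and the add wrapper are hypothetical names introduced only for this illustration. The tree is deallocated at the end with the same postorder logic as the procedure just given.

```pascal
program bst_traversal_demo (output);

{ Assumption for this sketch: keys are integers and the record has no
  other fields. }
type
  key_type = integer;
  element = record
    key: key_type
  end;
  bst = ^component;
  component = record
    datum: element;
    left, right: bst
  end;

var
  tree: bst;

procedure insert_into_bst (elm: element; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum := elm;
    b^.left := NIL;
    b^.right := NIL
  end
  else if elm.key < b^.datum.key then
    insert_into_bst (elm, b^.left)
  else
    insert_into_bst (elm, b^.right)
end;

{ Hypothetical convenience wrapper: insert a bare key into tree. }
procedure add (k: key_type);
var
  e: element;
begin
  e.key := k;
  insert_into_bst (e, tree)
end;

procedure show_inorder (b: bst);     { left subtree, datum, right subtree }
begin
  if b <> NIL then begin
    show_inorder (b^.left);
    write (' ', b^.datum.key : 1);
    show_inorder (b^.right)
  end
end;

procedure show_preorder (b: bst);    { datum, left subtree, right subtree }
begin
  if b <> NIL then begin
    write (' ', b^.datum.key : 1);
    show_preorder (b^.left);
    show_preorder (b^.right)
  end
end;

procedure show_postorder (b: bst);   { left subtree, right subtree, datum }
begin
  if b <> NIL then begin
    show_postorder (b^.left);
    show_postorder (b^.right);
    write (' ', b^.datum.key : 1)
  end
end;

procedure deallocate_bst (var b: bst);
begin
  if b <> NIL then begin
    deallocate_bst (b^.left);
    deallocate_bst (b^.right);
    dispose (b)
  end
end;

begin
  tree := NIL;
  add (4); add (2); add (6); add (1); add (3); add (5); add (7);
  write ('inorder:  ');  show_inorder (tree);   writeln;  { 1 2 3 4 5 6 7 }
  write ('preorder: ');  show_preorder (tree);  writeln;  { 4 2 1 3 6 5 7 }
  write ('postorder:');  show_postorder (tree); writeln;  { 1 3 2 5 7 6 4 }
  deallocate_bst (tree)
end.
```

Only the inorder listing comes out sorted; the other two depend on the shape of the tree, which in turn depends on the order in which the keys were inserted.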
A postorder traversal is necessary for deallocation because after the storage
at the other end of the pointer b has been recycled, there
would be no way to reach b^.left or b^.right.
Deleting a single datum from a binary search tree is in some cases a rather delicate operation. If the datum is stored at a position where both the left and right subtrees are empty, the binary search tree can be treated as a singleton and simply replaced with an empty bst. If one of the subtrees is empty and the other is not, the non-empty subtree can simply be promoted into the position occupied by its parent; this will not affect the truth of the invariant, since all of the elements in that subtree will still be in the same order relative to everything outside that subtree.
The most difficult case arises, then, when the datum to be deleted is at a position where both of the subtrees are non-empty. The solution is to find one of the two data that are adjacent, in sorted (inorder) sequence, to the one to be deleted -- either the element in the left subtree that has the largest key, or the element in the right subtree that has the smallest key. (The choice between these alternatives is arbitrary, so in this implementation I've arranged for the selection of the former.) It is impossible for this adjacent element to be at a position where both of its subtrees are non-empty (otherwise one of its subtrees would contain an element still closer to the datum to be deleted). So this adjacent element can be copied up into the record containing the datum to be deleted, overwriting it, and then the adjacent element can itself be deleted by one of the relatively easy methods described in the previous paragraph.
Here's how it looks in Pascal. The call to assert traps the
error of trying to delete an element from a binary search tree that doesn't
contain it.
procedure delete_from_bst (sought: key_type; var b: bst);
var
  delendum: bst;
    { a spare pointer to the component to be recycled }

  { The delete_largest function finds and returns the largest element in
    a given binary search tree and simultaneously deletes it from that
    binary search tree. It presupposes that the binary search tree is not
    empty. }
  function delete_largest (var site: bst): element;
  var
    delendum: bst;
      { a spare pointer to the component to be recycled }
  begin
    if is_empty_bst (site^.right) then begin
      delete_largest := site^.datum;
      delendum := site;
      site := site^.left;
      dispose (delendum)
    end
    else
      delete_largest := delete_largest (site^.right)
  end;

begin { procedure delete_from_bst }
  assert (not is_empty_bst (b), 1);
  if sought < b^.datum.key then
    delete_from_bst (sought, b^.left)
  else if b^.datum.key < sought then
    delete_from_bst (sought, b^.right)
  else begin { we've found the datum to be deleted }
    if is_empty_bst (b^.left) then begin
      delendum := b;
      b := b^.right;
      dispose (delendum)
    end
    else if is_empty_bst (b^.right) then begin
      delendum := b;
      b := b^.left;
      dispose (delendum)
    end
    else
      b^.datum := delete_largest (b^.left)
  end
end;
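The observation that delete_largest relies on, namely that the largest element of a non-empty tree always has an empty right subtree, can be checked on a small example. This is only a sketch under the assumption of integer keys; largest_in and add are hypothetical helpers that do not appear in the code above, and largest_in merely locates the element without deleting it.

```pascal
program bst_adjacent_demo (output);

{ Assumption for this sketch: keys are integers and the record has no
  other fields. }
type
  key_type = integer;
  element = record
    key: key_type
  end;
  bst = ^component;
  component = record
    datum: element;
    left, right: bst
  end;

var
  tree, adjacent: bst;

procedure insert_into_bst (elm: element; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum := elm;
    b^.left := NIL;
    b^.right := NIL
  end
  else if elm.key < b^.datum.key then
    insert_into_bst (elm, b^.left)
  else
    insert_into_bst (elm, b^.right)
end;

{ Hypothetical convenience wrapper: insert a bare key into tree. }
procedure add (k: key_type);
var
  e: element;
begin
  e.key := k;
  insert_into_bst (e, tree)
end;

{ Hypothetical helper: returns a pointer to the component holding the
  largest key in b, found by following right links until none remains.
  Like delete_largest, it presupposes that b is not empty. }
function largest_in (b: bst): bst;
begin
  while b^.right <> NIL do
    b := b^.right;
  largest_in := b
end;

begin
  tree := NIL;
  { 8 is stored at the root; its left subtree holds 3, 1, 6 and 4. }
  add (8); add (3); add (10); add (1); add (6); add (4);
  adjacent := largest_in (tree^.left);
  writeln ('largest key to the left of 8: ', adjacent^.datum.key : 1);
  if adjacent^.right = NIL then
    writeln ('its right subtree is empty')
  else
    writeln ('its right subtree is not empty');
  if adjacent^.left <> NIL then
    writeln ('its left subtree holds ', adjacent^.left^.datum.key : 1)
end.
```

Here the adjacent element is 6, whose only subtree (containing 4) is on the left, so removing it falls under the easy case in which the non-empty subtree is promoted.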
Given an array of records of type element, here's a procedure
that will sort them by inserting them into a binary search tree and then
copying them back out again:
type
  element_array = array [1 .. array_size] of element;

procedure bst_sort (var arr: element_array);
var
  container: bst;
  index: integer;
  count: integer;
    { the number of elements copied back so far; kept separate from index
      because a nested procedure should not assign to a for-loop control
      variable }

  procedure restore (var elm: element);
  begin
    count := count + 1;
    arr[count] := elm
  end;

begin
  container := empty_bst;
  for index := 1 to array_size do
    insert_into_bst (arr[index], container);
  count := 0;
  apply_throughout_bst (container, restore);
  deallocate_bst (container)
end;
Alternatively, one might read the elements in from a file of
element records and write them back out to the same file in
sorted order:
type
  element_file = file of element;

procedure bst_sort_file (var f: element_file);
var
  container: bst;

  procedure write_to_file (var elm: element);
  begin
    write (f, elm)
  end;

begin
  reset (f);
  container := empty_bst;
  while not eof (f) do begin
    insert_into_bst (f^, container);
    get (f)
  end;
  rewrite (f);
  apply_throughout_bst (container, write_to_file);
  deallocate_bst (container)
end;
This document is available on the World Wide Web as
http://www.math.grin.edu/~stone/courses/fundamentals/binary-search-trees.html