Binary search trees

A binary search tree is a recursively defined data structure that simplifies the process of conducting binary searches by marking appropriate bisection points structurally instead of requiring them to be computed. Unlike an array, it also allows for the efficient insertion of new items in the data structure during program execution.

Here's the recursive definition: A binary search tree B is either empty or consists of a datum (typically a record that includes a key field) and two binary search trees L and R, called the left and right subtrees of B. Either or both of these subtrees may of course be empty. An invariant condition of every non-empty binary search tree B is that the key of every record stored anywhere in its left subtree L is less than the key of the record stored at B (or precedes it in some conventional ordering of the key data type), and similarly that the key of every record stored anywhere in its right subtree R is greater than the key of the record stored at B. (The insertion procedure given later places a datum whose key equals the key at B in the right subtree, so if duplicate keys are admitted, "greater than" should be read as "not less than.") This invariant must be preserved whenever the contents of a binary search tree are changed.

In Pascal, one can use pointers to implement this data type, as follows:

type
  element = record
              key: key_type;
              { and presumably other fields }
            end;
  bst = ^component;
  component = record
                datum: element;
                left, right: bst
              end;
An empty binary search tree will be represented by a NIL pointer, a non-empty one by a pointer to a component containing the datum and two binary search trees.

Creating an empty binary search tree, or one that contains just one datum, is trivial, as is testing whether a given binary search tree is empty:

function empty_bst: bst;
begin
  empty_bst := NIL
end;

function singleton_bst (elm: element): bst;
var
  result: bst;
begin
  new (result);
  result^.datum := elm;
  result^.left := empty_bst;
  result^.right := empty_bst;
  singleton_bst := result
end;

function is_empty_bst (b: bst): Boolean;
begin
  is_empty_bst := (b = NIL)
end;
Suppose, however, that one wants to add a datum to an existing binary search tree B. This is again trivial if B is empty; one simply replaces B with the singleton binary search tree containing just the new datum. But if B is non-empty one must be careful to preserve the invariant stated above. Recursion provides a simple way to do this: Compare the key of the datum to be inserted with the key of the datum stored at B. If the new key is less, insert the datum into B's left subtree; otherwise, insert it into B's right subtree. In either case, perform the insertion by making a recursive call to the insertion procedure. The recursion will bottom out when one reaches a subtree of a subtree of a subtree, etc., that is empty; that empty subtree is then replaced with the singleton binary search tree containing just the new datum.

Or in Pascal:

procedure insert_into_bst (elm: element; var b: bst);
begin
  if is_empty_bst (b) then
    b := singleton_bst (elm)
  else if elm.key < b^.datum.key then
    insert_into_bst (elm, b^.left)
  else
    insert_into_bst (elm, b^.right)
end;
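One way to convince oneself that repeated insertion keeps the keys in order is to check every key against bounds inherited from all of its ancestors, not just from its parent. The following is a minimal sketch, not part of the original handout: it assumes integer keys, uses a condensed stand-in for insert_into_bst called add_key, and introduces a hypothetical helper keys_within.

```pascal
program check_invariant;

type
  key_type = integer;   { assumption: integer keys, no other fields }
  element = record
              key: key_type
            end;
  bst = ^component;
  component = record
                datum: element;
                left, right: bst
              end;

var
  tree: bst;
  i: integer;

{ Condensed stand-in for insert_into_bst that takes a bare key. }
procedure add_key (k: key_type; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum.key := k;
    b^.left := NIL;
    b^.right := NIL
  end
  else if k < b^.datum.key then
    add_key (k, b^.left)
  else
    add_key (k, b^.right)
end;

{ Hypothetical helper: TRUE if every key in b lies in lo .. hi and each
  subtree respects the bounds narrowed by the key above it.  Keys equal
  to a parent's key are accepted on the right, matching the insertion
  rule. }
function keys_within (b: bst; lo, hi: key_type): Boolean;
begin
  if b = NIL then
    keys_within := TRUE
  else if (b^.datum.key < lo) or (hi < b^.datum.key) then
    keys_within := FALSE
  else
    keys_within := keys_within (b^.left, lo, b^.datum.key - 1)
               and keys_within (b^.right, b^.datum.key, hi)
end;

begin
  tree := NIL;
  for i := 1 to 7 do
    add_key ((i * 5) mod 7, tree);   { inserts 5, 3, 1, 6, 4, 2, 0 }
  writeln (keys_within (tree, -maxint, maxint));

  { Overwrite the 4 stored to the right of 3 with 9.  The new key is
    still greater than its parent's key, but it now sits in the left
    subtree of the root, whose key is 5. }
  tree^.left^.right^.datum.key := 9;
  writeln (keys_within (tree, -maxint, maxint))
end.
```

The first check should succeed and the second should fail, even though the altered node still compares correctly with its own parent. The sketch does not deallocate the tree before exiting.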
Recursion is also the appropriate tool when one wants to search a binary search tree, perhaps to recover a record containing a specified key:

function search_bst (sought: key_type; b: bst;
   var found: element): Boolean;
begin
  if is_empty_bst (b) then
    search_bst := FALSE
  else if sought < b^.datum.key then
    search_bst := search_bst (sought, b^.left, found)
  else if b^.datum.key < sought then
    search_bst := search_bst (sought, b^.right, found)
  else begin
    search_bst := TRUE; { because sought = b^.datum.key }
    found := b^.datum
  end
end;
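Since each recursive call in search_bst is the last thing the function does, the same search can also be written as a loop that walks down one branch. This is only a sketch of that alternative, assuming integer keys; search_bst_iter and add_key are hypothetical names that do not appear in the original.

```pascal
program iterative_search;

type
  key_type = integer;   { assumption: integer keys, no other fields }
  element = record
              key: key_type
            end;
  bst = ^component;
  component = record
                datum: element;
                left, right: bst
              end;

var
  tree: bst;
  hit: element;

{ Condensed stand-in for insert_into_bst that takes a bare key. }
procedure add_key (k: key_type; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum.key := k;
    b^.left := NIL;
    b^.right := NIL
  end
  else if k < b^.datum.key then
    add_key (k, b^.left)
  else
    add_key (k, b^.right)
end;

{ Loop-based counterpart of search_bst.  Because b is a value
  parameter, stepping it down the tree does not disturb the caller's
  pointer. }
function search_bst_iter (sought: key_type; b: bst;
    var found: element): Boolean;
var
  located: Boolean;
begin
  located := FALSE;
  while (b <> NIL) and not located do
    if sought < b^.datum.key then
      b := b^.left
    else if b^.datum.key < sought then
      b := b^.right
    else begin
      located := TRUE;
      found := b^.datum
    end;
  search_bst_iter := located
end;

begin
  tree := NIL;
  add_key (5, tree);
  add_key (3, tree);
  add_key (8, tree);
  if search_bst_iter (8, tree, hit) then
    writeln ('found key ', hit.key : 1)
  else
    writeln ('8 is absent');
  if search_bst_iter (4, tree, hit) then
    writeln ('found key ', hit.key : 1)
  else
    writeln ('4 is absent')
end.
```

As in the recursive version, found is left untouched when the key is absent, so the caller should consult it only when the function returns TRUE.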
In procedure insert_into_bst and function search_bst, only one branch of the tree is explored; subtrees that contain elements that can't possibly be relevant to the insertion or to the search are bypassed. Occasionally, however, one wants to perform some operation on each of the data stored in a tree, specifically in ascending order. For instance, one might want to write out an index of the collection of records stored in the tree -- a list of the keys, in ascending order, one to a line. Recursion can be used for this purpose, too, but there will be two recursive calls rather than one, corresponding to the fact that each non-empty binary search tree contains two subtrees:

procedure print_bst_data (b: bst);
begin
  if not is_empty_bst (b) then begin
    print_bst_data (b^.left);
    writeln (b^.datum.key);
    print_bst_data (b^.right)
  end
end;
Notice that the keys for all of the data stored in B's left subtree will be printed out before the key for the datum actually stored at B (since, under the invariant, all those keys will precede the one stored at B in the desired ordering), while all those stored in the right subtree will be printed only after the key for the datum stored at B. This technique for traversing a binary search tree is called an inorder traversal. One can generalize it by abstracting the procedure to be performed on each datum:

procedure apply_throughout_bst (var b: bst;
    procedure p (var elm: element));
begin
  if not is_empty_bst (b) then begin
    apply_throughout_bst (b^.left, p);
    p (b^.datum);
    apply_throughout_bst (b^.right, p)
  end
end;
It would alternatively be possible to perform some operation on the datum at each non-empty binary search tree before proceeding to its left and right subtrees; a traversal that is arranged in this way is called a preorder traversal. Still another arrangement is to defer the processing of the datum at each non-empty bst until after the data in both of its subtrees have been dealt with; this is a postorder traversal. A postorder traversal is used, for instance, when deallocating a binary search tree:

procedure deallocate_bst (var b: bst);
begin
  if not is_empty_bst (b) then begin
    deallocate_bst (b^.left);
    deallocate_bst (b^.right);
    dispose (b)
  end
end;
A postorder traversal is necessary in this case because after the storage at the other end of the pointer b has been recycled, there would be no way to reach b^.left or b^.right.
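
To make the difference among the three traversal orders concrete, here is a small sketch that prints the keys of one tree in each order. It assumes integer keys; add_key is a condensed stand-in for insert_into_bst, and the three print procedures are hypothetical names introduced only for this illustration.

```pascal
program traversal_orders;

type
  key_type = integer;   { assumption: integer keys, no other fields }
  element = record
              key: key_type
            end;
  bst = ^component;
  component = record
                datum: element;
                left, right: bst
              end;

var
  tree: bst;

{ Condensed stand-in for insert_into_bst that takes a bare key. }
procedure add_key (k: key_type; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum.key := k;
    b^.left := NIL;
    b^.right := NIL
  end
  else if k < b^.datum.key then
    add_key (k, b^.left)
  else
    add_key (k, b^.right)
end;

procedure print_inorder (b: bst);     { left subtree, datum, right subtree }
begin
  if b <> NIL then begin
    print_inorder (b^.left);
    write (' ', b^.datum.key : 1);
    print_inorder (b^.right)
  end
end;

procedure print_preorder (b: bst);    { datum, left subtree, right subtree }
begin
  if b <> NIL then begin
    write (' ', b^.datum.key : 1);
    print_preorder (b^.left);
    print_preorder (b^.right)
  end
end;

procedure print_postorder (b: bst);   { left subtree, right subtree, datum }
begin
  if b <> NIL then begin
    print_postorder (b^.left);
    print_postorder (b^.right);
    write (' ', b^.datum.key : 1)
  end
end;

begin
  tree := NIL;
  { 4 becomes the root, 2 and 6 its children, and 1, 3, 5, 7 the leaves }
  add_key (4, tree); add_key (2, tree); add_key (6, tree);
  add_key (1, tree); add_key (3, tree); add_key (5, tree);
  add_key (7, tree);
  write ('inorder:  '); print_inorder (tree); writeln;
  write ('preorder: '); print_preorder (tree); writeln;
  write ('postorder:'); print_postorder (tree); writeln
end.
```

For this tree the inorder traversal visits 1 2 3 4 5 6 7, the preorder traversal visits 4 2 1 3 6 5 7, and the postorder traversal visits 1 3 2 5 7 6 4. Only the inorder traversal yields the keys in ascending order.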

Deleting a single datum from a binary search tree is in some cases a rather delicate operation. If the datum is stored at a position where both the left and right subtrees are empty, the binary search tree can be treated as a singleton and simply replaced with an empty bst. If one of the subtrees is empty and the other is not, the non-empty subtree can simply be promoted into the position occupied by its parent; this will not affect the truth of the invariant, since all of the elements in that subtree will still be in the same order relative to everything outside that subtree.

The most difficult case arises, then, when the datum to be deleted is at a position where both of the subtrees are non-empty. The solution is to find one of the two data that are adjacent, in key order (that is, in an inorder traversal), to the one to be deleted -- either the element in the left subtree that has the largest key, or the element in the right subtree that has the smallest key. (The choice between these alternatives is arbitrary, so in this implementation I've arranged for the selection of the former.) It is impossible for this adjacent element to be at a position where both of its subtrees are non-empty (otherwise one of its subtrees would contain an element still closer to the datum to be deleted). So this adjacent element can be copied up into the record containing the datum to be deleted, overwriting it, and then the adjacent element can itself be deleted by one of the relatively easy methods described in the previous paragraph.

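Here's how it looks in Pascal. The call to assert traps the error of trying to delete an element from a binary search tree that doesn't contain it. (assert is not part of standard Pascal; it is presumably a locally defined procedure that halts the program, reporting the given error number, when its first argument is false.)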

procedure delete_from_bst (sought: key_type; var b: bst);
var
  delendum: bst;
    { a spare pointer to the component to be recycled }

  { The delete_largest function finds and returns the largest element in
    a given binary search tree and simultaneously deletes it from that
    binary search tree.  It presupposes that the binary search tree is not
    empty. }

  function delete_largest (var site: bst): element;
  var
    delendum: bst;
      { a spare pointer to the component to be recycled }
  begin
    if is_empty_bst (site^.right) then begin
      delete_largest := site^.datum;
      delendum := site;
      site := site^.left;
      dispose (delendum)
    end
    else
      delete_largest := delete_largest (site^.right)
  end;

begin { procedure delete_from_bst }
  assert (not is_empty_bst (b), 1);
  if sought < b^.datum.key then
    delete_from_bst (sought, b^.left)
  else if b^.datum.key < sought then
    delete_from_bst (sought, b^.right)
  else begin { we've found the datum to be deleted }
    if is_empty_bst (b^.left) then begin
      delendum := b;
      b := b^.right;
      dispose (delendum)
    end
    else if is_empty_bst (b^.right) then begin
      delendum := b;
      b := b^.left;
      dispose (delendum)
    end
    else
      b^.datum := delete_largest (b^.left)
  end
end;
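The three cases can be exercised on a small tree. The sketch below is an illustration under stated assumptions rather than the handout's own test: keys are integers, add_key is a condensed stand-in for insert_into_bst, print_keys is a hypothetical inorder printer, and the call to assert is replaced by a printed message so that the program does not depend on a non-standard procedure.

```pascal
program delete_demo;

type
  key_type = integer;   { assumption: integer keys, no other fields }
  element = record
              key: key_type
            end;
  bst = ^component;
  component = record
                datum: element;
                left, right: bst
              end;

var
  tree: bst;

{ Condensed stand-in for insert_into_bst that takes a bare key. }
procedure add_key (k: key_type; var b: bst);
begin
  if b = NIL then begin
    new (b);
    b^.datum.key := k;
    b^.left := NIL;
    b^.right := NIL
  end
  else if k < b^.datum.key then
    add_key (k, b^.left)
  else
    add_key (k, b^.right)
end;

procedure print_keys (b: bst);   { inorder, all keys on one line }
begin
  if b <> NIL then begin
    print_keys (b^.left);
    write (' ', b^.datum.key : 1);
    print_keys (b^.right)
  end
end;

procedure delete_from_bst (sought: key_type; var b: bst);
var
  delendum: bst;

  { Removes the largest element of a non-empty tree and returns it. }
  function delete_largest (var site: bst): element;
  var
    delendum: bst;
  begin
    if site^.right = NIL then begin
      delete_largest := site^.datum;
      delendum := site;
      site := site^.left;
      dispose (delendum)
    end
    else
      delete_largest := delete_largest (site^.right)
  end;

begin
  if b = NIL then   { replaces the assert: report and do nothing }
    writeln ('key ', sought : 1, ' not found')
  else if sought < b^.datum.key then
    delete_from_bst (sought, b^.left)
  else if b^.datum.key < sought then
    delete_from_bst (sought, b^.right)
  else if b^.left = NIL then begin
    delendum := b;
    b := b^.right;
    dispose (delendum)
  end
  else if b^.right = NIL then begin
    delendum := b;
    b := b^.left;
    dispose (delendum)
  end
  else
    b^.datum := delete_largest (b^.left)
end;

begin
  tree := NIL;
  add_key (4, tree); add_key (2, tree); add_key (6, tree);
  add_key (1, tree); add_key (3, tree); add_key (5, tree);
  add_key (7, tree);
  delete_from_bst (4, tree);   { two subtrees: 3 is copied up to the root }
  print_keys (tree); writeln;
  delete_from_bst (2, tree);   { one subtree: 1 is promoted }
  print_keys (tree); writeln;
  delete_from_bst (7, tree);   { no subtrees: the position becomes empty }
  print_keys (tree); writeln;
  delete_from_bst (9, tree)    { absent key: only a message is printed }
end.
```

After each deletion the remaining keys should still print in ascending order (1 2 3 5 6 7, then 1 3 5 6 7, then 1 3 5 6), which is the visible consequence of the invariant having been preserved.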
Given an array of records of type element, here's a procedure that will sort them by inserting them into a binary search tree and then copying them back out again:

type
  element_array = array [1 .. array_size] of element;

procedure bst_sort (var arr: element_array);
var
  container: bst;
  index: integer;

  procedure restore (var elm: element);
  begin
    index := index + 1;
    arr[index] := elm
  end;

begin
  container := empty_bst;
  for index := 1 to array_size do
    insert_into_bst (arr[index], container);
  index := 0;
  apply_throughout_bst (container, restore);
  deallocate_bst (container)
end;
Alternatively, one might read the elements in from a file of element records and write them back out to the same file in sorted order:

type
  element_file = file of element;

procedure bst_sort_file (var f: element_file);
var
  container: bst;

  procedure write_to_file (var elm: element);
  begin
    write (f, elm)
  end;

begin
  reset (f);
  container := empty_bst;
  while not eof (f) do begin
    insert_into_bst (f^, container);
    get (f)
  end;
  rewrite (f);
  apply_throughout_bst (container, write_to_file);
  deallocate_bst (container)
end;

This document is available on the World Wide Web as

http://www.math.grin.edu/~stone/courses/fundamentals/binary-search-trees.html

created April 21, 1996
last revised April 21, 1996