The merge sort

The insertion sort algorithm is not always the best one to use. When sorting n values, starting from a random initial arrangement, the insertion sort has to look through half of the elements in the sorted part of the data structure to find the correct insertion point for each new value it places. The size of that sorted part increases linearly from 0 to n, so its average size is n/2 and the average number of comparisons needed to insert one element is n/4. Taking all the insertions together, then, the insertion sort performs about n2/4 comparisons to sort the entire set.

Accordingly, when the number of values to be sorted is large (greater than one hundred, say), it is preferable to use a sorting method that is more complicated to set up initially but performs fewer operations on each element in the process of positioning it correctly. The merge sort algorithm is often an appropriate choice. We shall examine two variants of this algorithm -- one taking a list of values and returning a newly allocated list containing the same values, but in sorted order, and the other one taking a vector of values and arranging its elements as a ``destructive'' side effect.

Merge sort: constructing a sorted list

The fundamental building block for the first version of the merge sort is the merge procedure, which takes two sorted lists as arguments and returns a sorted list containing all of the values from both argument lists. Program 4.3 on page 98 of our textbook is an implementation of this procedure. Since the lists to be merged will sometimes be quite large, I prefer the following tail-recursive implementation:

(define merge
  (lambda (ls-1 ls-2)
    (let loop ((rest-1 ls-1)
               (rest-2 ls-2)
               (merged '()))
      (cond ((null? rest-1) (revappend merged rest-2))
            ((null? rest-2) (revappend merged rest-1))
            (else (let ((first-1 (car rest-1))
                        (first-2 (car rest-2)))
                    (if (< first-2 first-1)
                        (loop rest-1 (cdr rest-2) (cons first-2 merged))
                        (loop (cdr rest-1) rest-2 (cons first-1 merged)))))))))

(define revappend
  (lambda (ls-1 ls-2)
    (if (null? ls-1)
        ls-2
        (revappend (cdr ls-1) (cons (car ls-1) ls-2)))))

The idea is to build up the result list merged by repeatedly taking either the next available value from ls-1 or the next available value from ls-2, whichever comes first, and attaching it to merged. In the recursive call, whichever value was thus selected and transferred is eliminated from further consideration. The process stops as soon as one of the original lists is exhausted; at that point, the remaining values in the other list (if any) are appended to the merged list. (This is done with revappend rather than append, because merged is built up in the wrong order during the tail recursion -- the first value recovered from ls-1 or ls-2 becomes the last element of merged, and so on.)


Exercise 1

Use merge to merge the lists (2 3 4 7 8 10 12) and (1 6 11 13 14).


Exercise 2

What happens if merge is applied to lists that are not already in ascending order?


Exercise 3

What happens if merge is given two empty lists as arguments?


The merge sort works by starting with short lists, merging them to form somewhat longer lists, merging the somewhat longer lists to form still longer ones, and so on until only one list remains -- the result of the final merge operation. The short lists that we begin with must satisfy the precondition for the merge procedure: they must already be sorted. The easiest way to ensure this without having to perform any comparisons is to put just one value into each short list; a single value is already ``sorted,'' in a trivial or vacuous sense (and the merge procedure will work correctly on such single-element lists).

We'll keep all of the lists that will eventually be merged together in one master list of lists. If ls is the list to be sorted, then, the master list will initially be (map list ls) -- a list of one-element lists, each containing one element of ls.

The following procedure, pair-merge, then traverses this master list, taking its lists two at a time, merging each pair and adding the merged pair to a new master list, which it returns. (If the master list contains an odd number of lists, pair-merge leaves the last one alone.)

(define pair-merge
  (lambda (master-list)
    (let loop ((sublists master-list)
               (new-master-list '()))
      (cond ((null? sublists) new-master-list)
            ((null? (cdr sublists)) (cons (car sublists) new-master-list))
            (else (loop (cddr sublists)
                        (cons (merge (car sublists) (cadr sublists))
                              new-master-list)))))))

The full merge-sort procedure simply calls pair-merge repeatedly until all of the lists have been combined into one. (The empty list must be handled as a special case, because it is impossible to ``combine'' zero lists into one.)

(define merge-sort
  (lambda (ls)
    (if (null? ls)
        '()
        (do ((master-list (map list ls) (pair-merge master-list)))
            ((null? (cdr master-list)) (car master-list))))))

Like the insertion sort, the merge sort algorithm can be made more generally applicable by making the ordering rule a parameter of the (curried) procedure, so the final implementation of this first version of merge sort, with the merge and pair-merge procedures embedded, looks like this:

(define merge-sort
  (lambda (precedes?)

    (define merge
      (lambda (ls-1 ls-2)
        (let loop ((rest-1 ls-1)
                   (rest-2 ls-2)
                   (merged '()))
          (cond ((null? rest-1) (revappend merged rest-2))
                ((null? rest-2) (revappend merged rest-1))
                (else (let ((first-1 (car rest-1))
                            (first-2 (car rest-2)))
                        (if (precedes? first-2 first-1)
                            (loop rest-1 (cdr rest-2)
                                  (cons first-2 merged))
                            (loop (cdr rest-1) rest-2
                                  (cons first-1 merged)))))))))

    (define pair-merge
      (lambda (master-list)
        (let loop ((sublists master-list)
                   (new-master-list '()))
          (cond ((null? sublists) new-master-list)
                ((null? (cdr sublists))
                 (cons (car sublists) new-master-list))
                (else (loop (cddr sublists)
                            (cons (merge (car sublists) (cadr sublists))
                                  new-master-list)))))))

    (lambda (ls)
      (if (null? ls)
          '()
          (do ((master-list (map list ls) (pair-merge master-list)))
              ((null? (cdr master-list)) (car master-list)))))))

Again, I have left revappend as a separate, non-embedded procedure because it is often useful even outside the merge-sort procedure. The one change in the code for the embedded procedures is that the call to the numerical comparison procedure < has been replaced with a call to precedes?.

Merge sort: overwriting the contents of a vector

Suppose, now, that the values to be sorted arrive in the form of a vector, and that the objective is to revise the contents of the vector so that at the end of the sorting procedure the same elements are present but arranged in the order specified by the comparison rule. As in the constructive version of the algorithm, we want to work our way up from single-element subvectors. Instead of allocating a separate vector for each single element, however, we can take advantage of constant-time access to the elements of a vector by making the separation purely notional: We can identify a sub-vector of the original vector by keeping track of the positions at which it begins and ends.

I'll adopt the convention that the starting position of a subvector is the position of the first element that is inside the subvector, and the ending position is the position of the first element after and outside of the subvector (or, if there is no such element, the length of the entire vector). So, within the vector #(a b c d e), the subvector with elements b and c has starting position 1 and ending position 3. (This convention has the advantage that the number of elements in the subvector is the difference between the ending position and the starting position. The arguments to Scheme's substring procedure are required to follow the same convention, for the same reason.)

The merge! procedure takes two adjacent subvectors of the same vector, both of which must already be sorted, and overwrites them with the merged (and therefore sorted) version of their elements. Unfortunately, this cannot be done completely ``in place''; there must be a ``holding area'' that provides separate storage for the elements as they are merged, and at the end of the merging process the elements have to be copied back from the holding area into the original vector. In this implementation, the holding area takes the form of a second vector, of the same size as the original. As the two adjacent subvectors of the original vector are merged, they are placed into the positions of the holding vector that they will eventually occupy in the original vector; at the end of the merge, they are copied.


Exercise 4

Define a Scheme procedure copy-subvector! that takes four arguments -- a source vector source, the starting position start and ending position end of a subvector of source, and a target vector target of the same size as source -- and copies the specified subvector of source into the corresponding positions in target, using vector-set!.

> (define vector-1 (vector 'alpha 'beta 'gamma 'delta 'epsilon))
> (define vector-2 (vector 'first 'second 'third 'fourth 'fifth))
> (copy-subvector! vector-1 1 3 vector-2)
> vector-2
#(first beta gamma fourth fifth)
> vector-1
#(alpha beta gamma delta epsilon)

The arguments to the merge! procedure are the starting positions of the two subvectors and the ending position of the second one. (Since they are supposed to be adjacent, the ending position of the first is assumed to be identical with the starting position of the second.)

(define merge!
  (lambda (start-1 start-2 end-2)
    (let loop ((target start-1)
               (current-1 start-1)
               (current-2 start-2))
      (cond ((= current-1 start-2)
             (copy-subvector! holding start-1 current-2 vec))
            ((or (= current-2 end-2)
                 (precedes? (vector-ref vec current-1)
                            (vector-ref vec current-2)))
             (vector-set! holding target (vector-ref vec current-1))
             (loop (+ target 1) (+ current-1 1) current-2))
            (else
             (vector-set! holding target (vector-ref vec current-2))
             (loop (+ target 1) current-1 (+ current-2 1)))))))

The variable target counts off the positions in the holding vector as they are filled up, from left to right. Current-1 keeps track of the position of the leftmost element in the first subvector that has not yet been copied into the holding vector; current-2 does the same for the second subvector. There are three cases, corresponding to the three cond-clauses:

In this definition of merge!, the identifiers vec, holding, and precedes? are not bound. To make it work, one must embed it in a context in which all of these identifiers have been bound. We'll do this below.

The pair-merge! procedure walks through a vector, applying the merge! procedure to pairs of adjacent subvectors of a specified size. (Initially, the size is 1, just as the first call to pair-merge in the constructive version of the merge sort applied to single-element lists. The size will double on each successive call.)

Unless the length of the vector is an exact power of 2, the pair-merge! procedure will occasionally wind up with an unpaired subvector at the end (which is simply left to be merged in on a subsequent pass) or with two subvectors of unequal length. In the latter case, the third argument to merge! is fixed at the length of the entire vector vec to make sure that the merge terminates correctly. Once again, there is an unbound identifier (len) in the body of this definition, so that it should be embedded in a context in which that identifier has been bound.

(define pair-merge!
  (lambda (subvector-size)
    (let loop ((start-1 0))
      (if (< (+ start-1 subvector-size) len)
          (let* ((start-2 (+ start-1 subvector-size))
                 (end-2 (min len (+ start-2 subvector-size))))
            (merge! start-1 start-2 end-2)
            (loop end-2))))))

Finally, the merge-sort! procedure invokes pair-merge! repeatedly until it has merged all the subvectors together:

(define merge-sort!
  (lambda (precedes?)
    (lambda (vec)
      (let* ((len (vector-length vec))
             (holding (make-vector len)))

        ;; Embedded definitions go here.

        (do ((subvector-size 1 (* 2 subvector-size)))
            ((<= len subvector-size))
          (pair-merge! subvector-size))))))

Assembling all the pieces yields the full algorithm:

(define merge-sort!
  (lambda (precedes?)
    (lambda (vec)
      (let* ((len (vector-length vec))
             (holding (make-vector len)))

        (define merge!
          (lambda (start-1 start-2 end-2)
            (let loop ((target start-1)
                       (current-1 start-1)
                       (current-2 start-2))
              (cond ((= current-1 start-2)
                     (copy-subvector! holding start-1 current-2 vec))
                    ((or (= current-2 end-2)
                         (precedes? (vector-ref vec current-1)
                                    (vector-ref vec current-2)))
                     (vector-set! holding target (vector-ref vec current-1))
                     (loop (+ target 1) (+ current-1 1) current-2))
                    (else
                     (vector-set! holding target (vector-ref vec current-2))
                     (loop (+ target 1) current-1 (+ current-2 1)))))))

        (define pair-merge!
          (lambda (subvector-size)
            (let loop ((start-1 0))
              (if (< (+ start-1 subvector-size) len)
                  (let* ((start-2 (+ start-1 subvector-size))
                         (end-2 (min len (+ start-2 subvector-size))))
                    (merge! start-1 start-2 end-2)
                    (loop end-2))))))

        (do ((subvector-size 1 (* 2 subvector-size)))
            ((<= len subvector-size))
          (pair-merge! subvector-size))))))

Exercise 5

Define a Scheme procedure that takes a vector of real numbers and determines whether its elements are arranged in ascending numerical order, returning #t if it is and #f if it is not.


Exercise 6

Design a test procedure that generates a vector of ten thousand integers randomly selected from the range from 0 to 999999, calls merge-sort! to sort the vector into ascending numerical order, and checks whether the sort succeeded, returning #t if the elements wind up in ascending numerical order and #f if they do not.


This document is available on the World Wide Web as

http://www.math.grin.edu/courses/Scheme/spring-1998/merge-sort.html

created November 25, 1997
last revised June 21, 1998

John David Stone (stone@math.grin.edu)