Laboratory Exercises For Computer Science 152

The Insertion Sort

Summary: This laboratory exercise introduces and analyzes the insertion sort as a mechanism to order numbers in an array.

Introduction

Many applications require the maintenance of ordered data. In Java (and Scheme), a simple way to structure ordered data is in an array (or vector), such as the following ordered array A of integers:

A:  2 3 5 7 9 10 13 18 24 27 33 35 37

One common sorting approach is based on code that assumes that the first part of an array is ordered and then adds successive items to this array segment until the entire array is sorted. To understand this approach, we first consider how to add one item to an ordered array segment. We then apply this work to each array element in turn to yield an ordered array.

Maintaining An Ordered Array Segment

Suppose items A[0], ..., A[k-1] are ordered in array A:

A:  3 7 9 10 18 27 33 37

The following code inserts an item into the array, so that items A[0], ..., A[k] become ordered:


int i = k-1;
while ((i >= 0) && a[i] > item){
   a[i+1] = a[i];
   i--;
}
a[i+1] = item;

  1. Trace the above code on the array segment shown (so k-1 = 7), with item having the value 17. Explain what happens and why.

  2. Repeat step 1 for the original array segment above, but inserting 40. Again, explain what happens.

  3. Now repeat step 1 for the original array segment, but inserting 2. And again explain what happens when the code is executed.

  4. Why does the above code contain the test (i >= 0)?
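To experiment with steps 1 through 4, the insertion code can be wrapped in a small program. The following is a minimal sketch, not part of the lab materials: the class name InsertStepDemo and the method name insertItem are hypothetical, and the array is given one spare slot at index k = 8 so that there is room for the new item.

```java
import java.util.Arrays;

// Hypothetical demo class; the names here are assumptions, not lab code.
class InsertStepDemo {

    // Inserts item into a, assuming a[0], ..., a[k-1] are already in
    // ascending order and that index k is a valid (spare) position.
    static void insertItem(int[] a, int k, int item) {
        int i = k - 1;
        while ((i >= 0) && a[i] > item) {
            a[i + 1] = a[i];   // shift the larger element one place right
            i--;
        }
        a[i + 1] = item;       // drop item into the gap that was opened
    }

    public static void main(String[] args) {
        // The ordered segment from the text, plus one spare slot at the end.
        int[] a = {3, 7, 9, 10, 18, 27, 33, 37, 0};
        insertItem(a, 8, 17);
        System.out.println(Arrays.toString(a));
    }
}
```

Changing the third argument from 17 to 40 or to 2 gives a way to check your answers to steps 2 and 3.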

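Using this basic insertion step, an array A can be sorted iteratively according to the following outline: treat A[0] by itself as an ordered segment; then, for k = 1, 2, ... in turn, insert A[k] into the ordered segment A[0], ..., A[k-1], so that A[0], ..., A[k] become ordered.

This outline gives rise to the following code, called an insertion sort.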


public static void insertionSort (int [] a) {
// method to sort using the insertion sort
   for (int k = 1; k < a.length; k++) {
      int item = a[k];
      int i = k-1;
      while ((i >= 0) && a[i] > item){
         a[i+1] = a[i];
         i--;
      }
      a[i+1] = item;
   }
}

  1. Write a few sentences explaining how this code works.

  2. Program ~walker/152/java-examples/SortShell1.java provides a shell for testing the insertion sort. Copy this file to your account.

  3. Review SortShell1.java, and note that it asks you to input several values into an array, so the array can be used to test the insertion sort. Compile and run this program with several sets of test data.

  4. Modify insertionSort, so that the array is sorted in descending order rather than ascending order. (After testing your revised program, change the code back to its original form for use in the rest of this lab.)
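Besides typing in test data by hand, the sort can be checked automatically against Java's library sort. The sketch below is one possible way to do this and is independent of SortShell1.java; the class name InsertionSortCheck, the method agreesWithLibrarySort, and the choice of test cases are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical test harness; not part of SortShell1.java.
class InsertionSortCheck {

    // The insertion sort from this lab (ascending order).
    static void insertionSort(int[] a) {
        for (int k = 1; k < a.length; k++) {
            int item = a[k];
            int i = k - 1;
            while ((i >= 0) && a[i] > item) {
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = item;
        }
    }

    // Returns true if insertionSort and Arrays.sort give the same result.
    // The caller's array is left unchanged, since both sorts work on copies.
    static boolean agreesWithLibrarySort(int[] data) {
        int[] mine = data.clone();
        int[] expected = data.clone();
        insertionSort(mine);
        Arrays.sort(expected);
        return Arrays.equals(mine, expected);
    }

    public static void main(String[] args) {
        // A few small cases: empty, one item, duplicates, reversed, ordered.
        int[][] cases = { {}, {5}, {2, 1}, {3, 3, 3}, {9, 7, 5, 3, 1}, {1, 2, 3, 4, 5} };
        for (int[] c : cases) {
            System.out.println(Arrays.toString(c) + " -> " + agreesWithLibrarySort(c));
        }
    }
}
```

If you use this check while working on step 4, remember that Arrays.sort produces ascending order, so the comparison applies only to the original form of insertionSort.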

Analysis and Timing

  1. Analyze the above code for both the best case and the worst case. Identify circumstances under which the best case and the worst case analyses could occur. What is the order of the insertion sort in the best case and in the worst case?

  2. Suppose the algorithm requires t milliseconds to sort N items. Using the order analysis you just gave, how long would you expect the algorithm to take to sort 2N items? Briefly explain your answer.

The Java programming language provides a mechanism to measure the run times of programs. Specifically, System.currentTimeMillis() returns the number of milliseconds that have elapsed since midnight UTC on January 1, 1970. Calling this method before and after a piece of code, and subtracting the two values, allows us to use the computer to time various algorithms.

  1. Place the following declarations at the start of main in SortShell1.java:
    
    long start_time;
    long end_time;
    
    (Here, long allows these variables to store larger integers than the typical int integer data type.) Next, place the following lines just before and after the call to insertionSort in the body of main.
    
    start_time = System.currentTimeMillis();
    // the call insertionSort (c); goes here 
    end_time = System.currentTimeMillis();
    
    Finally, place the following print statements after the printing of the c array.
    
    out.println ("start:   " + start_time);
    out.println ("end:     " + end_time);
    out.println ("elapsed: " + (end_time - start_time));
    
    Now, compile and run the program, and explain what happens.

In your test, the elapsed time likely was shown as 0, since sorting a dozen items takes less than a millisecond, which is the resolution of the clock. To get measurements that better illustrate efficiency issues, we must ask the machine to sort a larger data set. Rather than typing many data elements into the program during execution, we could generate test data as follows:
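One possible way to fill the array c with ascending, descending, or random data is sketched here. The class name TestData and the three method names are hypothetical, and the particular ascending and descending values are assumptions; only the expression (int)(Math.random()*10000) is taken from the lab text.

```java
// Hypothetical helper class for generating test arrays of a given size.
class TestData {

    // Ascending data: 0, 1, 2, ..., n-1.
    static int[] ascending(int n) {
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            c[i] = i;
        }
        return c;
    }

    // Descending data: n, n-1, ..., 1.
    static int[] descending(int n) {
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            c[i] = n - i;
        }
        return c;
    }

    // Random data: each entry is a random integer from 0 through 9999.
    static int[] random(int n) {
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            c[i] = (int) (Math.random() * 10000);
        }
        return c;
    }
}
```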

(In the last case, Math.random() generates a random real number between 0 and 1, so Math.random()*10000 generates a random real number between 0 and 10000. The "cast" or phrase (int) converts this to a random integer.)

  1. Generate 20,000 numbers in ascending order for c, and run the insertion sort several times. Record your times.

  2. Next generate 40,000 numbers in ascending order, and run the insertion sort again several times. How do the times for 20,000 and 40,000 compare? How does this relate to the predictions you made using order analysis in steps 1 and 2 at the start of this section?

  3. Generate 1,000 numbers in descending order for c, and run the insertion sort several times. Again record your times.

  4. Now generate 2,000 numbers in descending order, run the program several times, and compare the results. Again, relate your results to the predictions you made using order analysis in steps 1 and 2 at the start of this section.

  5. Next, predict the time required by the insertion sort to sort 20,000 numbers in descending order. Run your program for this size data set to obtain timings. Briefly explain your results.

  6. Finally, use Math.random(), as described above, to generate random data. Run the program on data sets of size 2,000, 4,000, and 20,000. Does this "average" case seem to have O(n) or O(n^2)? Briefly explain your answer.
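The repeated timing runs in this section can also be scripted rather than done by editing and recompiling for each size. The following is a minimal sketch under stated assumptions: the class name SortTimer and the helper names randomData and timeSort are hypothetical, and the program times the random-data case only. Measured times will vary from run to run and from machine to machine, so several runs per size are still advisable.

```java
// Hypothetical timing harness; not part of SortShell1.java.
class SortTimer {

    // The insertion sort from this lab (ascending order).
    static void insertionSort(int[] a) {
        for (int k = 1; k < a.length; k++) {
            int item = a[k];
            int i = k - 1;
            while ((i >= 0) && a[i] > item) {
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = item;
        }
    }

    // Fills a new array with random integers from 0 through 9999.
    static int[] randomData(int n) {
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            c[i] = (int) (Math.random() * 10000);
        }
        return c;
    }

    // Sorts data in place and returns the elapsed time in milliseconds.
    static long timeSort(int[] data) {
        long start_time = System.currentTimeMillis();
        insertionSort(data);
        long end_time = System.currentTimeMillis();
        return end_time - start_time;
    }

    public static void main(String[] args) {
        int[] sizes = {2000, 4000, 20000};   // sizes named in step 6
        for (int n : sizes) {
            long elapsed = timeSort(randomData(n));
            System.out.println(n + " items: " + elapsed + " ms");
        }
    }
}
```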

This document is available on the World Wide Web as

http://www.math.grin.edu/~walker/courses/152.sp01/lab-n-squared-sorts.html

created March 4, 2001
last revised March 5, 2001