CSC 153 Grinnell College Spring, 2005
 
Computer Science Fundamentals
Laboratory Exercise
 

Sorting

Goals

This laboratory exercise provides practice with the insertion sort and quicksort and outlines a mechanism for determining experimentally the efficiency of these sorting algorithms.

Introduction

Many applications require the maintenance of ordered data. In Java (and Scheme), such data often are structured in an array (or vector). The reading for this lab provides considerable detail on both the insertion sort and quicksort. You should read that material before proceeding further in this lab.

The Insertion Sort

The insertion sort assumes that the first part of an array is ordered and then adds successive items to this array segment until the entire array is sorted.

A main helper method adds one item to an ordered array segment. More specifically, suppose items a[0], ..., a[k-1] are ordered in array a. The following code inserts item into the array, so that items a[0], ..., a[k] become ordered:


int i = k-1;
while ((i >= 0) && (a[i] > item)) {
   a[i+1] = a[i];
   i--;
}
a[i+1] = item;
  1. Consider the array

    A:  2 3 5 7 9 10 13 18 24 27 33 35 37

    Review the steps in the above code for this array segment with k-1 = 7 and with item having the value 17. Explain what happens and why.

  2. Repeat step 1 for the original array segment, but inserting 40. Again, explain what happens.

  3. Now repeat step 1 for the original array segment, but inserting 2. And again explain what happens when the code is executed.

  4. Why does the above code contain the test (i >= 0)?
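To experiment with the insertion step outside the lab's shell programs, the code above can be wrapped in a small self-contained class. (InsertDemo and insert are illustrative names, not part of the lab's files; the array below reserves one extra slot at index 8 so the ordered segment a[0]..a[7] has room to grow.)

```java
public class InsertDemo {
    // insert item into the ordered segment a[0..k-1], so that
    // a[0..k] becomes ordered (a must have a free slot at index k)
    static void insert(int[] a, int k, int item) {
        int i = k - 1;
        while ((i >= 0) && (a[i] > item)) {
            a[i + 1] = a[i];   // shift larger items one place right
            i--;
        }
        a[i + 1] = item;
    }

    public static void main(String[] args) {
        // the ordered segment 2 3 5 7 9 10 13 18 occupies a[0..7]
        int[] a = {2, 3, 5, 7, 9, 10, 13, 18, 0};
        insert(a, 8, 17);
        System.out.println(java.util.Arrays.toString(a));
        // prints [2, 3, 5, 7, 9, 10, 13, 17, 18]
        // only 18 shifts right; 17 drops into the vacated slot
    }
}
```

Running this shows just one shift is needed, since 17 is larger than every segment item except 18.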

Using this basic insertion step, an array a can be sorted iteratively: insert a[1] into the one-element segment before it, then a[2] into the resulting two-element segment, and so on. This approach gives rise to the following code, called an insertion sort.


public static void insertionSort (int [] a) {
// method to sort using the insertion sort
   for (int k = 1; k < a.length; k++) {
      int item = a[k];
      int i = k-1;
      while ((i >= 0) && (a[i] > item)) {
         a[i+1] = a[i];
         i--;
      }
      a[i+1] = item;
   }
}
  1. Write a few sentences explaining how this code works.

  2. Program ~walker/java/examples/sorting/SortShell1.java provides a shell for testing the insertion sort. Copy this file to your account. Review SortShell1.java, and note that it asks you to input several values into an array, so the array can be used to test the insertion sort. Compile and run this program with several sets of test data.

  3. Modify insertionSort, so that the array is sorted in descending order rather than ascending order.
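The insertion sort can also be tried outside SortShell1 with a minimal, self-contained harness. (InsertionSortDemo is an illustrative name, not one of the lab's files; the sorting method is the one given above.)

```java
import java.util.Arrays;

public class InsertionSortDemo {
    public static void insertionSort(int[] a) {
        for (int k = 1; k < a.length; k++) {
            int item = a[k];         // next item to place
            int i = k - 1;
            while ((i >= 0) && (a[i] > item)) {
                a[i + 1] = a[i];     // shift larger items right
                i--;
            }
            a[i + 1] = item;
        }
    }

    public static void main(String[] args) {
        int[] a = {35, 2, 27, 10, 5, 18, 3};
        insertionSort(a);
        System.out.println(Arrays.toString(a));
        // prints [2, 3, 5, 10, 18, 27, 35]
    }
}
```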

Quicksort: An Example of Divide-and-Conquer Algorithms

The Quicksort, originally devised by C. A. R. Hoare [The Computer Journal, Volume 5 (1962), pages 10-15], proceeds by applying an initial step that partitions the array into two pieces. The algorithm then is applied recursively to each of the two pieces. This approach of dividing the work into pieces and applying the algorithm recursively to each piece turns out to be extremely general and powerful. Such an approach is called a divide-and-conquer algorithm.

The following code implements this basic step:


int left=first+1;
int right=last;
int temp;

while (right >= left) {
    // scan from the right end (moving leftward) for an item smaller than the pivot a[first]
    while ((right >= left) && (a[first] <= a[right]))
        right--;
    // scan from the left end (moving rightward) for an item larger than the pivot a[first]
    while ((right >= left) && (a[first] >= a[left]))
        left++;
    // swap large left item and small right item, if needed
    if (right > left) {
        temp = a[left];
        a[left] = a[right];
        a[right] = temp;
    }
}
// put a[first] in its place
temp = a[first];
a[first] = a[right];
a[right] = temp;

Additional details are given in the reading for this lab.
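Before tracing the questions below by hand, it may help to watch the partition step run on an unordered array. The sketch below wraps the code above in a method; PartitionDemo and partition are illustrative names, not part of the lab's files.

```java
import java.util.Arrays;

public class PartitionDemo {
    // partition a[first..last] around the pivot a[first];
    // returns the final index of the pivot
    static int partition(int[] a, int first, int last) {
        int left = first + 1;
        int right = last;
        int temp;

        while (right >= left) {
            // scan from the right for an item smaller than the pivot
            while ((right >= left) && (a[first] <= a[right]))
                right--;
            // scan from the left for an item larger than the pivot
            while ((right >= left) && (a[first] >= a[left]))
                left++;
            // swap large left item and small right item, if needed
            if (right > left) {
                temp = a[left];
                a[left] = a[right];
                a[right] = temp;
            }
        }
        // put the pivot a[first] in its place
        temp = a[first];
        a[first] = a[right];
        a[right] = temp;
        return right;
    }

    public static void main(String[] args) {
        int[] a = {10, 18, 3, 27, 9, 35, 2, 13};
        int p = partition(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a) + "  pivot index: " + p);
        // prints [9, 2, 3, 10, 27, 35, 18, 13]  pivot index: 3
        // every item left of 10 is smaller; every item right of 10 is larger
    }
}
```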

  1. Review the steps in the above code for the following array segment. Trace the positions of the array elements and of the index variables left and right at each stage, and explain how the variables are set.
    A:  2 3 5 7 9 10 13 18 24 27 33 35 37

  2. Repeat the above for an array segment in which all elements are already in ascending order. Again, explain what happens.

  3. Next, repeat the steps for the above code for an array segment in which all elements are in descending order. Again, explain what happens.

  4. The code swaps a[left] and a[right] only if right > left. Give an example to show how the code could fail if the swap were done without the test for right > left.

  5. Explain why each loop contains the condition right >= left. Give an example where the code would fail if each right >= left test were omitted.

Given the above code to place the first element of an array segment appropriately and rearrange small and large items, the full array may be sorted by applying the algorithm recursively to the first part and the last part of the array. The base case of the recursion arises if there are no further elements in an array segment to sort.

This gives rise to the following code, called a quicksort.


public static void quicksort (int [] a) {
// method to sort using the quicksort
    quicksortKernel (a, 0, a.length-1);
}

private static void quicksortKernel (int [] a, int first, int last) {
    int left=first+1;
    int right=last;
    int temp;

    while (right >= left) {
        // scan from the right end (moving leftward) for an item smaller than the pivot a[first]
        while ((right >= left) && (a[first] <= a[right]))
            right--;
        // scan from the left end (moving rightward) for an item larger than the pivot a[first]
        while ((right >= left) && (a[first] >= a[left]))
            left++;
        // swap large left item and small right item, if needed
        if (right > left) {
            temp = a[left];
            a[left] = a[right];
            a[right] = temp;
        }
    }
    // put a[first] in its place
    temp = a[first];
    a[first] = a[right];
    a[right] = temp;

    // recursively apply algorithm to a[first]..a[right-1] 
    // and a[right+1]..a[last], provided these segments contain >= 2 items
    if (first < right-1)
        quicksortKernel (a, first, right-1);
    if (right+1 < last)
        quicksortKernel (a, right+1, last);   
}

This code illustrates the husk-and-kernel programming style which arose frequently earlier in CSC 153.

  1. Write a few sentences explaining how this code works. What does the husk procedure do? Why is the husk needed?

  2. How is the base case handled in this recursive algorithm?

  3. Program ~walker/java/examples/sorting/SortShell2.java provides a shell for testing the quicksort. Copy this file to your account, compile it, and run it with several sets of test data.

  4. Modify quicksort and/or quicksortKernel, so that the array is sorted in descending order rather than ascending order.

  5. Program ~walker/java/examples/sorting/SortShell3.java is the same as SortShell2.java, except that a few lines have been added at the start of quicksortKernel. Write a few sentences explaining what this new code accomplishes; what is the logical effect of this change? For future reference in this lab, we will call this new version of quicksort a refined quicksort.
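As with the insertion sort, the quicksort can be exercised outside the shell programs with a minimal harness. (QuicksortDemo is an illustrative name; the sorting methods are exactly those given above.)

```java
import java.util.Arrays;

public class QuicksortDemo {
    public static void quicksort(int[] a) {
        quicksortKernel(a, 0, a.length - 1);
    }

    private static void quicksortKernel(int[] a, int first, int last) {
        int left = first + 1;
        int right = last;
        int temp;

        while (right >= left) {
            // scan from the right for an item smaller than the pivot
            while ((right >= left) && (a[first] <= a[right]))
                right--;
            // scan from the left for an item larger than the pivot
            while ((right >= left) && (a[first] >= a[left]))
                left++;
            // swap large left item and small right item, if needed
            if (right > left) {
                temp = a[left];
                a[left] = a[right];
                a[right] = temp;
            }
        }
        // put the pivot a[first] in its place
        temp = a[first];
        a[first] = a[right];
        a[right] = temp;

        // recurse on segments containing at least 2 items
        if (first < right - 1)
            quicksortKernel(a, first, right - 1);
        if (right + 1 < last)
            quicksortKernel(a, right + 1, last);
    }

    public static void main(String[] args) {
        int[] a = {35, 2, 27, 10, 5, 18, 3, 24};
        quicksort(a);
        System.out.println(Arrays.toString(a));
        // prints [2, 3, 5, 10, 18, 24, 27, 35]
    }
}
```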

Analysis and Timing

The quicksort is called a divide-and-conquer algorithm, because the first step normally divides the array into two pieces and the approach is applied recursively to each piece.

Suppose this code is applied to an array containing n randomly ordered data values. For the most part, we might expect that the quicksort's divide-and-conquer strategy will divide the array into two pieces, each of size n/2, after the first main step. Applying the algorithm to each half, in turn, will divide the array further -- roughly 4 pieces, each of size n/4. Continuing a third time, we would expect to get about 8 pieces, each of size n/8.

Applying this process i times, we would expect to get about 2^i pieces, each of size n/2^i.

This process continues until each array piece has just 1 element in it, so 1 = n/2^i, that is, 2^i = n, or i = log2 n. Thus, the total number of main steps for this algorithm should be about log2 n. For each main step, we must examine the various array elements to move the relevant first items of each array segment into their correct places, and this requires us to examine roughly n items.

Altogether, for random data, this suggests that the quicksort requires about log2 n main steps with n operations per main step. Combining these results, quicksort on random data has order O(n log2 n).
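To get a feel for this estimate, the following sketch (GrowthSketch is an illustrative name) prints n log2 n for the data sizes used later in this lab:

```java
public class GrowthSketch {
    public static void main(String[] args) {
        // rough operation counts predicted by the n log2 n analysis
        int[] sizes = {1000, 2000, 4000, 8000, 16000};
        for (int n : sizes) {
            double log2n = Math.log(n) / Math.log(2);  // log base 2 of n
            System.out.printf("n = %5d   n * log2(n) is about %.0f%n",
                              n, n * log2n);
        }
        // note: doubling n slightly more than doubles n * log2(n),
        // while an O(n^2) count would quadruple
    }
}
```

Keep these predicted counts in mind when you tabulate actual running times below.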

  1. Identify circumstances under which the best case and the worst case analyses could occur. What is the order of each sorting algorithm in the best case and in the worst case? (While class discussion may help answer this question for the insertion sort, and the discussion provides an answer for the quicksort, you still need to tabulate those answers, and you will need to formulate the answer for the refined quicksort.)

  2. Suppose each algorithm requires t milliseconds to sort N items. Using the order analysis you just gave, how long would you expect the algorithm to take to sort 2N items? Briefly explain your answer.

The Java programming language provides a mechanism to compute run times of various programs. Specifically, System.currentTimeMillis() returns the number of milliseconds elapsed since midnight, January 1, 1970 (UTC). This method allows us to use the computer to time various algorithms.

  1. Program ~walker/java/examples/sorting/SortShell4.java combines the code for the insertion sort, quicksort, and refined quicksort into a single testing shell. Copy this program to your account, compile it, and run it to observe its processing.

  2. Place the following declarations at the start of main in SortShell4.java
    
    long start_time;
    long end_time;
    
    (Here, long allows these variables to store larger integers than the typical int integer data type.) Next, place the following lines just before and after the call to each sorting algorithm in the body of main.
    
    start_time = System.currentTimeMillis();
    // the call insertionSort (c); goes here 
    end_time = System.currentTimeMillis();
    
    Finally, place the following print statements after the printing of the c, d, and e arrays.
    
    out.println ("start:   " + start_time);
    out.println ("end:     " + end_time);
    out.println ("elapsed: " + (end_time - start_time));
    
    Now, compile and run the program, and explain what happens.

In your test, the elapsed time likely was shown as 0, since the time to sort a dozen items is smaller than a millisecond -- the accuracy of the clock. To get measurements that better illustrate efficiency issues, we must ask the machine to sort a larger data set. Rather than typing many data elements into the program during execution, we could generate test data as follows:
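One way to produce such data sets is sketched below. (TestData and makeData are illustrative names; the lab's SortShell4 may generate its data differently.)

```java
import java.util.Arrays;
import java.util.Random;

public class TestData {
    // fill an array of the given size with ascending, descending,
    // or random values
    public static int[] makeData(int size, String order) {
        int[] a = new int[size];
        Random rand = new Random();
        for (int i = 0; i < size; i++) {
            switch (order) {
                case "ascending":  a[i] = i;                  break;
                case "descending": a[i] = size - i;           break;
                default:           a[i] = rand.nextInt(size); break;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(makeData(5, "ascending")));
        // prints [0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(makeData(5, "descending")));
        // prints [5, 4, 3, 2, 1]
    }
}
```

With a generator like this, an array of any size and ordering can be built in main and passed directly to each sorting method.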

  1. Run the above program several times with ascending, random, and descending data, and with data sets of size 1000, 2000, 4000, 8000, and 16000. (This may take a while, so you might want to work with a partner and split up the data collection.)

    Tabulate your results in a table, such as the following:

    Data Size   Ascending Data   Random Data   Descending Data
    ---------   --------------   -----------   ---------------
         1000
         2000
         4000
         8000
        16000


  2. Summarize the results in the table in a few sentences, and relate the results obtained to the order analysis you did earlier.

  3. Both the quicksort of this lab and the insertion sort have the characteristic that processing times may vary dramatically. That is, they can be extremely efficient for some types of data, but their efficiency decreases dramatically for some data sets. Would you describe the refined quicksort as having this characteristic as well? Briefly explain.

Work to be turned in:


This document is available on the World Wide Web as

http://www.cs.grinnell.edu/~walker/courses/153.sp05/labs/lab-sorting.shtml

created 24 April 2001
last revised 28 April 2006
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.