CSC151 2007S, Class 46: Introduction to Sorting

Admin:
* EC for
  * Some Pride Week activity, particularly the Saturday parade.
  * Thursday's convo.  (Warning!  Likely to be long.)
  * Thursday's Thursday Extra (Dr. Davis).
* Are there final questions on the project?
* Reading for Friday: Sorting.  (Probably needs some work.)

Overview:
* The problem of sorting.
* Writing sorting algorithms.
* Examples: Insertion, selection, etc.
* Formalizing the problem.

==Yesterday: Binary Search==

* Searches through an ordered set of data
* How: Divide and conquer
  * Look at the middle, and then look in one of the two halves (or stop)
* In order to do binary search, the data must be organized
* Ooh ... a new problem: Given unorganized data, how do we organize them?
  * Given a set of data (vector, list, file) AND A WAY TO COMPARE EACH
    PAIR OF VALUES, how do we put them in order from smallest to largest?

==How do we write a sorting algorithm, anyway?==

Strategies:
* Find a similar problem we've solved, and adapt the solution
* Pick some standard algorithm design strategy and see if it works
  * One: divide and conquer
  * Others: greed, dynamic programming (keep track of previous results), ...
* Solve it by hand, and then try to express what you did computationally

Three techniques we've developed:
* Insertion sort: Separate our collection into two groups
  * Those already in order
  * Those we haven't dealt with yet
  * Repeatedly
    * Grab one we haven't dealt with yet
    * Put it in the correct place in the ordered subset
* Quicksort: Divide and conquer applied to sorting
  * Pick some random value
  * Divide the collection into those things smaller than that value and
    those things larger
  * Recurse on each half
* Selection sort
  * Repeatedly select the smallest of the remaining values and put it
    at the front

==Detour: Does the computer really know how to compare things?==

* Suppose each data entry has the form
    (last-name first-name longitude latitude age height)
* I would compare east-coastedness with
    (lambda (person1 person2) (< (caddr person1) (caddr person2)))
* Requires that a computer knows a few basic comparisons
  * Numbers (<, <=, ...)
  * Strings (string<=?, string-ci<=?, ...)

==Back to sorting: About how many "steps" do we spend in each algorithm?==

Assume worst case

* Insertion sort
    Insert 1 value into the empty list          one step
    Insert 1 value into the length-1 list       one comparison, one swap
    Insert 1 value into the length-2 list       two comparisons, two swaps
    Insert 1 value into the length-3 list       three comparisons, three swaps
    ...
    Insert 1 value into the length-k list       k comparisons, k swaps
    ...
    Insert 1 value into the length-(n-1) list   n-1 comparisons, n-1 swaps
  # of comparisons is 1 + 2 + 3 + ... + (n-1)
  # of swaps is 1 + 2 + 3 + ... + (n-1)
  Suppose we had 100 elements to sort.  This takes ...
    ??? comparisons
    ??? swaps
  Use Gauss's formula: 1 + 2 + ... + m = m(m+1)/2
  Substitute n-1 for m: (n-1)(n-1+1)/2 = (n-1)*n/2
  99*100/2 = 4950 =~ 5000 comparisons, 5000 swaps
  What if we had 200 elements?
  199*200/2 = 19900 comparisons, 19900 swaps
  (roughly four times the work for twice the data)

* Selection sort
  * Find the smallest thing
  * Put it at the front
  * Find the smallest of n: n comparisons.  Swap takes 1 step
  * Find the smallest of n-1: n-1 comparisons.  Swap takes 1 step
  * ...
  * Find the smallest of 2 things: 2 comparisons.  Swap takes 1 step
  * Find the smallest of 1 thing: 1 comparison.  Swap takes 1 step
  * About the same number of comparisons as insertion sort
  * Only n swaps!

* Quicksort
    To sort N things
      N comparisons to break into two groups of N/2
      For each group, N/2 comparisons to break into two more groups of N/4
        Aka, another N to break the two groups of N/2 into four groups of N/4
      Another N to break the four groups of N/4 into eight groups of N/8
      ...
      Another N to break the 2^k groups of N/2^k into 2^(k+1) groups
        of N/2^(k+1)
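
A sketch of the quicksort strategy (pick a value, divide the collection into smaller and larger, recurse on each part), written as a Scheme procedure on lists. This is only an illustration, not the version from class. The names quicksort and less-than? are hypothetical, and the sketch uses the first element as the dividing value rather than a random one, so that the example is deterministic. The second parameter is the "way to compare each pair of values".

```scheme
;; A minimal sketch, not a definitive implementation.
;; Assumptions (not from the notes): the names quicksort and less-than?,
;; the use of lists rather than vectors, and the choice of the first
;; element (not a random one) as the dividing value.
(define (quicksort lst less-than?)
  (if (or (null? lst) (null? (cdr lst)))
      lst   ; zero or one element: already in order
      (let ((pivot (car lst)))
        ;; Walk the remaining values once, comparing each to the pivot.
        (let loop ((rest (cdr lst)) (smaller '()) (others '()))
          (cond
            ((null? rest)
             ;; Recurse on each part, then glue: smaller, pivot, others.
             (append (quicksort smaller less-than?)
                     (cons pivot (quicksort others less-than?))))
            ((less-than? (car rest) pivot)
             (loop (cdr rest) (cons (car rest) smaller) others))
            (else
             ;; Values equal to the pivot land here with the larger ones.
             (loop (cdr rest) smaller (cons (car rest) others))))))))

;; Example use:
;; (quicksort (list 5 3 8 1 9 2) <)
;; (quicksort (list "pear" "apple" "fig") string<?)
```

Each call compares every remaining value to the dividing value exactly once, which is where the "N comparisons to break into two groups" count comes from. With the first element as the dividing value, the two groups are not guaranteed to be the same size, so the N/2, N/4, ... pattern is the hoped-for case rather than a promise.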