CSC153 2004S, Class 27: "Fast" Sorting Algorithms Admin: * Questions on exam 2? * When you say "Don't bother to document," do we have to show testing? Yes. * When you say "Dbtd," do we have to write the six P's? No. * When you say "Dbtd," do we have to write a short paragraph? No. * When you say "Dbtd," do we have to write internal comments? Probably. * With the section one, can you have more than one question mark? Yes. * Read Experiments in Java, Chapter 1, for Monday. * Food and "Drink" to celebrate our last formal day of Scheme * Yes, we're finally doing fast sorting algorithms * Unless, of course, Saul and Erik find new and creative ways to distract me. Overview: * Consider how to apply the "divide and conquer" technique to sorting * Key ideas in merge sort. * Key ideas in Quicksort * Lab All the sorting algorithms we discussed are O(n^2) * Insertion sort - Repeatedly insert values into a "sorted" list * Selection sort - Repeatedly select largest/smallest * Bubble sort - Repeatedly swap elements (thereby bubbling the largest element tot he end) Can we do better? * Approach one: Try other techniques. Let's try the "divide and conquer" technique * Divide the list/vector in half (how?) * First half / second half * Sort each half * How? (Recursively) * Then do what? * Merge the two together by repeatedly comparing the top elements Running time: t(n) = * Split the vector in two parts: 1 + * Sorting the two halves: t(n/2) + t(n/2) * Merge: n Solve the recurrence relation t(x) = 1 + x + 2*t(x/2) t(n) = 1 + n + 2*t(n/2) t(n) = 1 + n + 2*(1 + n/2 + 2*t(n/4)) t(n) = 1 + n + 2 + n + 4*t(n/4) t(n) = 3 + 2n + 4*t(n/4) t(n) = 3 + 2n + 4*(1 + n/4 + 2*t(n/8)) t(n) = 3 + 2n + 4 + n + 8*t(n/8)) t(n) = 7 + 3n + 8*t(n/8)) Generalize t(n) = 2^k-1 + kn + 2^k*t(n/2^k) Let k = log_2n t(n) = 2^(log_2n)-1 + log_2n*n + 2^log_2n*t(n/2^log_2n) t(n) = n-1 + log_2n*n + n*t(n/n) t(n) = n-1 + nlog_2n + n*t(1) Assume t(1) = 1 t(n) is in O(nlogn) That is faster than O(n^2) Merge sort: The basic/hard idea is merging the two sorted lists Quicksort * Divide the list/vector in half (how?) * Small values and large values O(n) * Sort each half (recursively) 2*t(n/2) * Then do what? * Shove 'em back together O(1) "Expected" O(nlogn) behavior * Sometimes O(n^2), but not regularly Can we do better? O(nlogn) * Approach one: Try other techniques. Failed. * Prove that you can't. * You cannot do better than O(nlogn) for compare-and-swap sorting * Use Quicksort Other sorting algorithms that can be faster * Bucket sort: Set up "buckets" for different values, shove thigns into each bucket, sort each bucket using some simple sort O(n+m), where n is number of elements and m is number of buckets * Radix sort Arrange by rightmost bit, then next rightmost bit, then ... O(n*#ofbits)