CSC153 2004S, Class 21: Algorithm Analysis (2)

Admin:
* HW4 due. Make sure your name is at the top!
* Exam 1 to be returned tomorrow (I hope).
* Exam 2 to be distributed next week (probably Wednesday).
* Read "Searching".
* Today's outline may have extra stuff (particularly stuff we did Monday).
* Distributed: Mr. Stone's book. (Optional reading.)
* Cool panel tomorrow with FREE LUNCH!

Overview:
* Review: Big-O notation and its benefits.
* An example: Comparing two running times.
* Analyzing recursive procedures.
* Recurrence relations.
* Samples.

/Big-O Notation/

* To mathematicians: a way to compare functions.
* To computer scientists: a way to approximate running times.
* Approximate definition: g(n) is in O(f(n)) iff there exist n0 and c > 0 such that for all n > n0, |c*g(n)| <= |f(n)|.
  * "n0": for big enough inputs.
  * "c": we don't care about constant multipliers, particularly since they differ from machine to machine.
  * Absolute value: mathematicians like to talk about functions that may return negative numbers.

/Why Big-O Notation May Be Useful/

* Experimental results:
  * Algorithm A:
    * Input size 100: 1 second
    * Input size 200: 4 seconds
    * Input size 300: 9 seconds
  * Algorithm B:
    * Input size 100: 100 seconds
    * Input size 200: 200 seconds
    * Input size 300: 300 seconds
* Which is faster for really large inputs?
* Suppose that careful analysis of the code reveals that
  * Algorithm A's running time is approximately (n/100)^2
  * Algorithm B's running time is approximately n
  * (Other fits are possible: A could also be linear, with experimental error.)
* As Erik notes, for a big enough input, B will be faster. Claim: n = 10,000 is the turning point.
  * (10,000/100)^2 = 100^2 = 10,000, exactly B's cost at that size.
  * For bigger inputs, B wins: (20,000/100)^2 = 200^2 = 40,000 for A vs. 20,000 for B.
* Conclusion: The constant multiplier really doesn't matter. The shape of the curve matters.
* Therefore, we can analyze primarily in terms of the shape of the curve.
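The crossover arithmetic above is easy to check with a short script. A sketch in Python; the two functions below are just the cost models from the analysis ((n/100)^2 and n), not real measurements:

```python
# Modeled running times from the analysis above.

def time_a(n):
    """Algorithm A's modeled cost: (n/100)^2."""
    return (n / 100) ** 2

def time_b(n):
    """Algorithm B's modeled cost: n."""
    return n

# Below the crossover, A's curve is lower; at n = 10,000 they meet;
# beyond it, B's curve is lower.
for n in [1_000, 10_000, 20_000]:
    print(n, time_a(n), time_b(n))
```

Running this shows A ahead at n = 1,000, a tie at n = 10,000, and B ahead at n = 20,000, matching the claimed turning point.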
n is in O((n/100)^2); (n/100)^2 is not in O(n).

For iterative algorithms, analysis is comparatively easy:
* Count the number of individual steps.
* Count the number of repetitions.
* Multiply the two.
* Throw away constant multipliers.

Sometimes this feels "wrong" (overly pessimistic). Consider selection sort:

    procedure selection-sort(array-of-ints numbers)
      for i = 0 to numbers.length
        swap(numbers[i], numbers[indexOfSmallest(numbers, i)])

    procedure indexOfSmallest(array-of-ints numbers, int start)
      guess = start
      for j = start + 1 to numbers.length
        if numbers[j] < numbers[guess]
          guess = j
      return guess

Claim: n repetitions of indexOfSmallest, each of which costs n, so O(n^2).

But ... the second time you call indexOfSmallest, it scans only n-1 elements; the third time, n-2; the fourth time, n-3; after awhile, 5, 4, 3, 2, 1.

Alternate analysis:

    n + n-1 + n-2 + n-3 + ... + 3 + 2 + 1 = n*(n+1)/2

The story of Gauss: write the sum K twice, once forward and once backward, and add column by column:

    K   =  n  + n-1 + n-2 + ... +  3  +  2  +  1
    K   =  1  +  2  +  3  + ... + n-2 + n-1 +  n
    ---------------------------------------------
    2*K = n+1 + n+1 + n+1 + ... + n+1   (n such sums)
    2*K = n*(n+1)
    K   = n*(n+1)/2

n*(n+1)/2 is in O(n^2).

/But what about recursive procedures?/

* Sometimes: count the number of recursive calls; count the cost per recursive call; multiply; throw away constants.
* Fibonacci:

    (define (fib n)
      (if (< n 2)
          1
          (+ (fib (- n 1)) (fib (- n 2)))))

* How many recursive calls do you do? (It's surprisingly hard to count exactly; slightly fewer than 2^n.)

Why is the Fibonacci sequence useful?
* The limit of the ratio of the nth Fibonacci number to the (n-1)st is the golden ratio, (1+sqrt(5))/2.

Observation: You can approximate the nth Fibonacci number by computing ((1+sqrt(5))/2)^n.

/Alternate Strategy: Define Functions Recursively and Analyze/

    time(n) = c + time(n-1)

* Goal: Eliminate the recursion.

    time(n) = c + c + time(n-2)
    time(n) = c + c + c + time(n-3)
    ...
    time(n) = k*c + time(n-k)

* Assume the base case takes 0 steps. When k = n, time(n) = n*c, which is in O(n).

/Sample Recurrence Relations and Their Solutions/

* time(n) = n + time(n-1)

    time(n) = n + n-1 + time(n-2)
    time(n) = n + n-1 + n-2 + time(n-3)
    ...
    time(n) = n + n-1 + ... + 2 + 1 = n*(n+1)/2   (Gauss again), which is in O(n^2)

* time(n) = c + 2*time(n/2)

    time(n) = c + 2c + 4*time(n/4)
    time(n) = c + 2c + 4c + 8*time(n/8)
    time(n) = c + 2c + 4c + 8c + 16*time(n/16)
    ...
    time(n) = c*(1 + 2 + ... + 2^(k-1)) + 2^k*time(n/2^k)

  When k = log_2(n), and assuming time(1) = 0:

    time(n) = c*(1 + 2 + 4 + ... + 2^(k-1)) = c*(2^k - 1) = c*(n - 1), which is in O(n)

* time(n) = c + time(n/2)

    time(n) = c + c + time(n/4)
    time(n) = c + c + c + time(n/8)
    ...
    time(n) = k*c + time(n/2^k)

  When k = log_2(n): time(n) = c*log_2(n) + time(1), which is in O(log_2(n)).

* time(n) = c*n + time(n-1)
* time(n) = c * time(n-1)
* time(n) = c * time(n/2)
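The closed forms for the two halving recurrences can be sanity-checked numerically. A sketch in Python, assuming time(1) = 0 as in the derivations above and restricting n to powers of two (the function names are illustrative):

```python
import math

def t_halving(n, c=1):
    """time(n) = c + time(n/2), time(1) = 0.  Closed form: c * log_2(n)."""
    return 0 if n == 1 else c + t_halving(n // 2, c)

def t_doubling(n, c=1):
    """time(n) = c + 2*time(n/2), time(1) = 0.  Closed form: c * (n - 1)."""
    return 0 if n == 1 else c + 2 * t_doubling(n // 2, c)

# Unroll each recurrence directly and compare against its closed form.
for n in [2, 16, 1024]:
    assert t_halving(n) == int(math.log2(n))
    assert t_doubling(n) == n - 1
```

If the coefficients in the unrolling were wrong (say, 2*time(n/4) instead of 4*time(n/4)), these checks would fail, which makes this a handy way to debug a hand derivation.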