CSC297.Java, Class 22 or so, Asymptotic Analysis

Admin:
* Sam has Alex's friend's chair.  He doesn't want it.
* Welcome back.

Overview:
* Review of Yvonne's mostly-correct code.
* Comparing algorithms
* How computer scientists like to compare algorithms
* The benefits of abstraction
* Big-O notation

Given two algorithms that solve the same problem (e.g., insertion sort and merge sort), how do you decide which one to use?
* Speed: Compare the number of steps each one takes; use the faster one.
* Memory usage
* Complexity of implementation: Use the one you're sure you can implement correctly.
* Ease of maintenance.
* Ease of modification.

Observation: Looking at speed is surprisingly hard
* The same algorithm may behave quite differently on different data
    * Example: Insertion sort: To sort a list, start with an empty list called "sorted" and a full list called "unsorted".  Repeatedly grab an element from unsorted and put it in the correct place in sorted.
    * On average, it takes about "# of elements in sorted"/2 steps to find the correct place
        0/2 + 1/2 + 2/2 + ... + (n-1)/2 + n/2
          = (0 + 1 + 2 + ... + (n-1) + n)/2
          = n(n+1)/4
    * In the best case, it takes only one step to find the correct place
        1 + 1 + 1 + ... + 1 = n
    * In the worst case, it takes "# of elements in sorted" steps to find the correct place (it's after this one, it's after this one, it's after this one ... whoops, we've reached the end)
        0 + 1 + 2 + ... + (n-1) + n = n(n+1)/2
* There are lots of "steps" that are not necessarily equal in terms of computing time.  (E.g., adding 1 is much faster than getting the cdr.)
* Algorithms sometimes behave differently on small and large inputs.
* Computer scientists therefore look for ways to simplify the problem.
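The best-case and worst-case counts for insertion sort can be checked by experiment. What follows is a minimal Java sketch, not code from class: the names InsertionSortSteps and countSteps are made up for illustration, and it assumes that a "step" is one comparison made while scanning sorted from the front. Under that assumption the first insertion (into an empty list) costs zero steps, so the totals differ from the sums above by lower-order terms.

```java
import java.util.ArrayList;
import java.util.List;

class InsertionSortSteps {
    /**
     * Insertion sort as described in the notes: repeatedly take an element
     * from unsorted and put it in the correct place in sorted.
     * Returns the number of comparisons made while finding those places.
     * (Counting comparisons as "steps" is an assumption of this sketch.)
     */
    static long countSteps(int[] unsorted) {
        List<Integer> sorted = new ArrayList<>();
        long steps = 0;
        for (int x : unsorted) {
            int pos = 0;
            // Scan from the front: "it's after this one, it's after this one ..."
            while (pos < sorted.size()) {
                steps++;                      // one comparison = one step
                if (x < sorted.get(pos)) {
                    break;                    // x belongs just before this element
                }
                pos++;
            }
            sorted.add(pos, x);               // pos == sorted.size() means "at the end"
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] ascending = new int[n];
        int[] descending = new int[n];
        for (int i = 0; i < n; i++) {
            ascending[i] = i;
            descending[i] = n - i;
        }
        // Ascending input always scans to the end of sorted (worst case here).
        System.out.println("worst case: " + countSteps(ascending));
        // Descending input always stops at the first element (best case here).
        System.out.println("best case:  " + countSteps(descending));
    }
}
```

With this front-to-back scan, ascending input is the worst case and descending input is the best case. For n = 1000 the two totals come out to n(n-1)/2 = 499500 and n-1 = 999: quadratic versus linear growth, as in the sums above.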
* Normal strategy: Asymptotic analysis
    * Ignore constant multipliers
    * Look for the behavior of the function "with sufficiently big inputs"
* Goal: To find a function that bounds the running time of our algorithm
* Improved goal: To find a function that closely bounds the running time of our algorithm
* Formal notation
    * O(f(n)) is the set of functions that f(n) bounds above for sufficiently large input.
    * g(n) is in O(f(n)) iff there exist c, n_0 > 0 s.t. for all n > n_0
        |f(n)| >= |c*g(n)|
    * n_0 is "for sufficiently large input"
    * c is "constant multipliers don't matter"
* Suppose h(n) = n^2 and j(n) = 5*n^2
    * h(n) is in O(j(n))
        * Proof: let n_0 be 1, let c be 1 (or anything 0 < c <= 5)
            c*h(n) = c*n^2 <= 5*n^2 = j(n)
    * j(n) is in O(h(n))
        * Proof: let n_0 be 1, let c be anything 0 < c <= 1/5
            c*j(n) = c*5*n^2 <= (1/5)*5*n^2 = n^2 = h(n)
* Suppose h(n) = n^2 and j(n) = 1000n
    * Is h(n) in O(j(n))?  No.
        * We would need to find c, n_0 such that c*h(n) <= j(n) for all n > n_0
            c*n^2 <= 1000*n for all n > n_0
        * Suppose such values existed
            c*n^2 <= 1000*n for all n > n_0
        * Consider n_1 = 1001/c (if n_1 <= n_0, choose something even larger)
            c*n_1^2 = c*(1001/c)*(1001/c) = 1001*1001/c
            vs.
            1000*n_1 = 1000*1001/c
        * The first one is bigger.  Whoops.  That contradicts our assumption that those values exist.
    * Is j(n) in O(h(n))?  Yes.
        * Let c = 1/1000 and n_0 be 1
            (1/1000)*1000*n = n <= n^2 for all n > n_0
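The definition above can be tried out numerically on the h(n) = n^2 and j(n) = 1000n example. This is only a sketch under stated assumptions: the names BigOCheck and firstViolation are hypothetical, and a scan over finitely many n can refute a particular choice of c and n_0 but can never prove that g(n) is in O(f(n)). The proofs in the notes are still needed for that.

```java
import java.util.function.DoubleUnaryOperator;

class BigOCheck {
    /**
     * Tests the notes' condition |f(n)| >= |c*g(n)| for every integer n
     * with n0 < n <= limit.  Returns the first n at which the condition
     * fails, or -1 if it holds over the whole scanned range.
     */
    static long firstViolation(DoubleUnaryOperator g, DoubleUnaryOperator f,
                               double c, long n0, long limit) {
        for (long n = n0 + 1; n <= limit; n++) {
            if (Math.abs(c * g.applyAsDouble(n)) > Math.abs(f.applyAsDouble(n))) {
                return n;   // this (c, n0) pair does not work
            }
        }
        return -1;          // no counterexample found up to limit
    }

    public static void main(String[] args) {
        DoubleUnaryOperator h = n -> n * n;       // h(n) = n^2
        DoubleUnaryOperator j = n -> 1000 * n;    // j(n) = 1000n

        // Is h in O(j)?  Try c = 1: the bound breaks just past n = 1000/c.
        System.out.println(firstViolation(h, j, 1.0, 0, 100_000));

        // Is j in O(h)?  c = 1/1000 and n_0 = 1, as in the notes.
        System.out.println(firstViolation(j, h, 0.001, 1, 100_000));
    }
}
```

The first call reports a violation at n = 1001, which matches the n_1 = 1001/c chosen in the contradiction argument with c = 1. The second call finds no violation in the scanned range, which is consistent with (but does not prove) j(n) being in O(h(n)).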