CSC153, Class 19: Algorithm Analysis

Summary:
* Why prefer higher-order procedures
* Comparing algorithms
* Asymptotic analysis
* Eliminating constants
* How do you do it?
* Other notes

Notes:
* Exam 1 due Friday. Any final questions?
* How many of you have done this stuff before?
* Warning! I do read plans once in a while.
* No extra reading for Friday. Work on the exam. Read more of Stone.
* Send me information about three books (with five adjectives).
* Another cool stats talk at noon. Free pizza!
* Cool math talk today at 4:15.

----------------------------------------

Why some programmers prefer higher-order procedures

"Compute the inner product of two vectors, A and B"

The Mathematician:
  "The sum of the products of the individual pairs"

The C++ programmer:
  sum = 0;
  for (int i = 0; i < A.length; ++i)
    sum += A[i]*B[i];

The Scheme programmer:
  (insert + (map * A B))
or
  (apply + (map * A B))
or even
  (insert + (map (left-section apply *) (map list A B)))

Why would one like the C++ one?
* It's easier to convert to Math terms
* It's easier to estimate the running time

Why would one like the Scheme one?
* It's easier to convert to Math terms
* Scheme is shorter. You don't have to worry about ++i vs. i++
* It's much easier to parallelize the Scheme version

----------------------------------------

We've just looked at different implementations of the same computation.
More generally, we often have many different algorithms that solve the
same problem.

Exponentiation:
* Sam's technique: Divide and conquer when the exponent is even
* Dave's technique: Repeated multiplication
* The Mathematician: x^n = e^(n*ln(x))

How do we choose which one?
* Correctness: Does it work on all valid inputs?
  + Semi-correctness: Does it work on all inputs our program will
    deal with?
* Ease of implementation:
  + How quickly can I implement it?
  + How sure can I be that I got my implementation correct?
  + Length of code
* How fast does it actually run?
  + Time efficiency
* Ease of use
  + Less clueful programmers might call your procedure
* Robustness: What does it do on incorrect inputs?
* Memory efficiency
* Generality: Can it also solve related problems?

In practice, "How fast does it actually run?" becomes the primary
consideration.

It is difficult to analyze precisely how fast code will run (in almost
any language).
* "to analyze precisely" vs. "to precisely analyze"

Consider a loop with an internal conditional:

(define largest-in-list
  (lambda (lst)
    (cond ((null? (cdr lst)) (car lst))
          ((> (car lst) (largest-in-list (cdr lst))) (car lst))
          (else (largest-in-list (cdr lst))))))

* Note that in the worst case the second and third cond clauses each
  evaluate the recursive call, so this version can make two recursive
  calls per element; how long it takes depends heavily on the input.
  (Binding the recursive result with a let would avoid the repetition.)

Different "basic" operations take different amounts of time.

Don't worry about the details; worry about the general pattern.
* Some algorithms always seem to take the same time
* Some algorithms seem to take some constant times the number of
  values you're processing
* Some algorithms seem to take ...

[Cool picture]

* The comparative analysis usually ignores small inputs (because funky
  things happen with small inputs)
  + As the input gets large = "asymptotic"
* The comparative analysis usually ignores constant multipliers

Let's write some formal notation to help formalize these concepts.

O(g(n))  "big O"
  Provides an upper bound for functions.
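As a quick informal illustration before the official definition (the
function f below is my own example, not one from class), we can check
experimentally that f(n) = 3n + 5 never exceeds 4n once n gets big
enough, which is the sense in which f(n) is in O(n):

```scheme
;; f is a hypothetical example function; the claim is that f(n) is in
;; O(n) because, ignoring constant multipliers, it grows no faster
;; than n itself.
(define f
  (lambda (n)
    (+ (* 3 n) 5)))

;; Check that f(n) <= 4*n for every n from start up to limit.
(define holds-up-to?
  (lambda (n limit)
    (or (> n limit)
        (and (<= (f n) (* 4 n))
             (holds-up-to? (+ n 1) limit)))))

(holds-up-to? 6 1000)  ; => #t, since 3n + 5 <= 4n whenever n >= 5
```

Here the constant 4 plays the role of the multiplier d, and 5 plays
the role of the cutoff n0, in the formal definition.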
Officially, O(g(n)) is a *set* of functions.

Goal: f(n) is in O(g(n)) if and only if "for sufficiently big inputs,
g(n) is at least as big as f(n), ignoring constant multipliers".

Formally: f(n) is in O(g(n)) if and only if there exist values n0 and
d > 0 such that for all n > n0,
  f(n) <= d*g(n)

Since constants don't matter, we ignore them in writing our typical
running times.
* f(n) is in O(1), "constant time"
  + vector-ref
* f(n) is in O(log_2(n)), "logarithmic time"
  + Divide and conquer exponent
* f(n) is in O(n), "linear time"
  + (length lst)
  + Divide and conquer exponent
  + Dave's exponent

What is the order of the "mathematical exponent"?
* It depends on the cost of e^x and ln(x)

When you find an upper bound, you want to find the smallest upper
bound. (That's why divide and conquer exponentiation appears under
both O(log_2(n)) and O(n): big O only promises an upper bound, so both
claims are true, but O(log_2(n)) is the more informative one.)
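The two exponentiation techniques above can be sketched as follows
(the procedure names and exact structure are mine, not necessarily
what Sam or Dave wrote in class):

```scheme
;; Repeated multiplication: n multiplications for exponent n,
;; so the running time is in O(n).
(define slow-expt
  (lambda (x n)
    (if (zero? n)
        1
        (* x (slow-expt x (- n 1))))))

;; Divide and conquer: halve the exponent when it is even, so the
;; exponent is halved roughly every other step -- O(log_2(n)).
;; The let avoids computing the recursive call twice.
(define fast-expt
  (lambda (x n)
    (cond ((zero? n) 1)
          ((even? n)
           (let ((half (fast-expt x (quotient n 2))))
             (* half half)))
          (else (* x (fast-expt x (- n 1)))))))

(slow-expt 2 10)  ; => 1024
(fast-expt 2 10)  ; => 1024
```

Both procedures are in O(n); only fast-expt is also in O(log_2(n)),
which is the smaller, and therefore more useful, upper bound.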