CSC153 2004S, Class 22: Searching

Admin:
* Exams not returned due to chaos at Rebelsky household.
* Pseudo-convo today.
* Homework: Read "Sorting" (online).
* Optional: Read chapter 6 of Stone.

Overview:
* Algorithms for common problems.
* A key problem: Searching.
* Sequential Search.
* Binary Search.
* Q and A.
* Lab

"Doing" Computer Science
* Computer scientists look for problems for which they can write algorithms
* They almost immediately generalize any problem they find so that their solutions apply in multiple situations

One of the biggest problems that people deal with when handling lots of information is "searching".
* Given a collection of stuff, find something that matches some criteria.
* Sequential search: Look at each value in turn until you find one that matches the criteria
    * In order to do sequential search, you need a way to "look at each value in turn"
    * That ability is common to many, but not all, data structures
        * Lists: car is the current value, cdr helps you move on to the next
        * Vectors: Keep a counter
        * Trees (to be revisited): A little harder
    * In a collection of n values, how long does it take to find a matching value (or to note that nothing matches)? O(n)

Computer scientists always look for ways to improve algorithms
* Pure improvement - No change to the problem (e.g., speeding up exponentiation with the cool divide-and-conquer strategy)
* Impure improvement - Change the problem to make it easier to solve (e.g., approximating exponentiation using ln and exp tables)

For searching, we do an "impure improvement":
* If the values are in "order", we can do better

Doing better: Binary search (divide and conquer technique)
* Principal idea:
    * Look at the middle element
    * Matches -> Done
    * Too small -> Throw away the small half
    * Too large -> Throw away the large half
* Running time: O(log_2(n))
* Analysis (1): Each time, the input is half as big. When we have one value left, we're done.
"The number of times we have to split n in half in order to reach 1". O(log_2(n)) * Analysis (2): * time(n) = c + time(n/2) * See yesterday's notes * This grows *much less quickly* than O(n) * Why do we say "log base 2 of n" rather than "natural log of n"? Habit. More closely matches what's happening in the problem. * What characteristics do we need of the input other than that it is "in order from smallest to largest"? * Can we binary search lists? Why or why not? It is expensive to find the middle element in a list. (O(n) rather than constant.) * Can we binary search vectors? Finding the middle element is easy. (vector-ref vec (quotient (vector-length vec) 2)) * How do we throw away half the vector in constant time? We don't really throw it away, we keep track of the lower and upper bounds of the region of interest. * What role does get-key play in all of this? * Since the values can be ordered in different ways, and the key we're searching for is only part of each value, the get-key pulls out "the appropriate part" of the value. LAB! Why does the following happen? > (binary-search "Batman" cartoons car string (binary-search "Batman" cartoons car string<=?) -1 Sam's an idiot: * Input with (read) * Output with (display val) FORUM SOUTH LOUNGE!