CSC151.02 2003F, Class 43: Searching

Admin:
* Today is mostly lecture/discussion
* Read "Searching"
    * Yes, it was intentional that you read after discussing
* Work on writeups for Tail Recursion (due Thursday)
    * Bring questions on Tuesday

Overview:
* Problems and Algorithms: The Core of CS
* Two Key Problems
    * Searching
    * Sorting
* Searching in Sorted Lists
* Binary Search

Where do the problems that computer scientists "solve" come from?
* If you develop problems yourself, no one may care about the solution.
* Hence, computer scientists look to other disciplines and activities for problems
* Computer scientists often "generalize" problems
* Two core problems many people face: Searching and Sorting
* Searching: Given a collection of things, find a particular thing
    * First version
        * I want to find Luis's telephone number
        * Find a book in a library (given its call number)
        * Find the definition (or translation) of a word in a dictionary
    * Second version
        * Find a procedure that is tail recursive in a list of procedures we've written
        * Find a student who will answer my question
        * Find whose phone number is 269-3450
    * Why are the first three problems different from the last two?
        * First three have a specific key; last two have a particular quality
        * First three have "assistive technologies"; last ones don't
        * First three have a collection that is put "in order" by the thing I'm using to search
    * So we have two different kinds of searching we might do: in collections ordered in a way useful to us, and in all other collections
* Sorting: Given a collection of things, put them in "order"
    * Usually to support searching
    * Sometimes to support reporting
* Today and tomorrow: Searching
* Rest of week: Sorting
* Start with searching in unordered collections
    * We can represent the collection as a list
    * We can represent the collection as a vector
    * We can represent the collection as a tree
* Searching in a list
    * Use assoc or one of the dozens of variants we wrote
    * Strategy:
        * Look at first thing (car)
        * If it matches, return it
        * Otherwise, recurse on the cdr
* Searching in a vector
    * Look at one value
    * If it matches, return it
    * Otherwise, look at the next and the next and the next
    * Need some form of recursion

Detour: What's a vector?
* A vector is a data structure that is both similar to and dissimilar from a list
    * Similar: Stores a lot of information
    * Dissimilar:
        + Fixed length
        + Fast access to the ith element (where i is an integer)
        + Individual elements are mutable
* To recurse over them, we keep track of our position and change it at every step

    (define PROC
      (lambda (vec pos)
        (if (>= pos (vector-length vec))
            BASE-CASE
            (COMBINE (vector-ref vec pos)
                     (PROC vec (+ pos 1))))))

How would we apply this in the case of searching?
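One possible answer, as a minimal sketch rather than the course's sample code: the names list-search and vector-search are hypothetical, and returning #f when nothing matches is an assumption the notes do not specify. The vector version follows the position-tracking template above; because a search can stop at the first match, the COMBINE step turns into a test.

```scheme
;;; Sequential search sketches. The names list-search and
;;; vector-search are hypothetical, chosen for illustration.

;; Search a list: look at the car; if it matches, return it;
;; otherwise recurse on the cdr. Returns #f if nothing matches
;; (an assumed convention).
(define list-search
  (lambda (matches? lst)
    (cond
      ((null? lst) #f)
      ((matches? (car lst)) (car lst))
      (else (list-search matches? (cdr lst))))))

;; Search a vector, starting at position pos and moving right
;; one position at a time. Returns #f if we run off the end.
(define vector-search
  (lambda (matches? vec pos)
    (cond
      ((>= pos (vector-length vec)) #f)
      ((matches? (vector-ref vec pos)) (vector-ref vec pos))
      (else (vector-search matches? vec (+ pos 1))))))
```

For example, (vector-search even? (vector 1 3 4 5) 0) returns 4, and (list-search even? (list 1 3 5)) returns #f. Both procedures may have to look at every element, which is why doubling the collection roughly doubles the work.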
* See sample code

Now let's return to the case of a collection sorted from smallest to largest
* Look in the middle
* If the middle element is what you're looking for, you're done
* If the middle element is too small,
    * Recurse on the right
* If the middle element is too big,
    * Recurse on the left

This kind of search is called "binary search". It seems to be faster than the previous searching strategy
* Suppose we had about 1000 things in our phone book
    * About one step to find the middle element
    * Assume one step to throw away half (hah hah)
* Now we have to search 500 things
* Two steps: 250
* Two steps: 125
* Two steps: 62
* Two steps: 31
* Two steps: 16
* Two steps: 8
* Two steps: 4
* Two steps: 2
* Two steps: 1

In binary search, every time you double the collection, you add two more steps.
In the unordered search, every time you double the collection, you approximately double the number of steps.
Binary search (at least as we've described it) takes about 2*log_2(n) steps to search a collection of n elements. For n = 1000, that is about 20 steps, matching the ten halvings above.
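The strategy above can be sketched in Scheme as follows. This is an illustration under stated assumptions, not the course's sample code: the name binary-search is hypothetical, the collection is assumed to be a vector of numbers sorted from smallest to largest, and the procedure returns the index of a match or #f if there is none.

```scheme
;;; Binary search sketch. The name binary-search and the choice to
;;; return an index (or #f) are assumptions made for illustration.

;; vec must be a vector of numbers sorted from smallest to largest.
(define binary-search
  (lambda (vec val)
    ;; lower and upper bound the part of vec still worth searching.
    (let kernel ((lower 0)
                 (upper (- (vector-length vec) 1)))
      (if (> lower upper)
          #f   ; nothing left to search
          (let* ((mid (quotient (+ lower upper) 2))
                 (middle (vector-ref vec mid)))
            (cond
              ;; The middle element is what we're looking for.
              ((= middle val) mid)
              ;; The middle element is too small: recurse on the right.
              ((< middle val) (kernel (+ mid 1) upper))
              ;; The middle element is too big: recurse on the left.
              (else (kernel lower (- mid 1)))))))))
```

For example, (binary-search (vector 2 3 5 7 11 13) 11) returns 4, and (binary-search (vector 2 3 5 7 11 13) 4) returns #f. Each call to kernel does a constant amount of work (find the middle, compare) and discards about half of the remaining range, which is where the roughly 2*log_2(n) step count comes from.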