To search a data structure is to examine its elements one-by-one
until either (a) an element that has a desired property is found or (b)
it can be concluded that the structure contains no element with that
property. For instance, one might search a vector of integers for an even
element, or a vector of pairs for a pair having a particular string
as its cdr.
You’ve already encountered a number of forms of searching in Scheme. For
example, you’ve searched lists using
assoc. You’ve also written more
general procedures that find multiple elements with particular properties
or that find elements based on more than just the car of an element list.
We’re now ready to think about a more general form of searching, one in which we specify the criterion for searching as a procedure value, rather than hard-coding the particular criterion in the structure of the search.
In a linear data structure – such as a flat list, a vector, or a file – there is an obvious algorithm for conducting a search: Start at the beginning of the data structure and traverse it, testing each element. Eventually one will either find an element that has the desired property or reach the end of the structure without finding such an element, thus conclusively proving that there is no such element. We used such a strategy for searching association lists. Here are a few alternate versions of the algorithm.
;;; Procedure:
;;;   list-sequential-search
;;; Parameters:
;;;   lst, a list
;;;   pred?, a unary predicate
;;; Purpose:
;;;   Searches the list for a value that matches the predicate.
;;; Produces:
;;;   match, a value
;;; Preconditions:
;;;   pred? can be applied to all values in lst.
;;; Postconditions:
;;;   If lst contains an element for which pred? holds, match
;;;     is one such value.
;;;   If lst contains no elements for which pred? holds, match
;;;     is false (#f).
(define list-sequential-search
  (lambda (lst pred?)
    (cond
      ; If the list is empty, no values match the predicate.
      [(null? lst)
       #f]
      ; If the predicate holds on the first value, use that one.
      [(pred? (car lst))
       (car lst)]
      ; Otherwise, look at the rest of the list
      [else
       (list-sequential-search (cdr lst) pred?)])))

;;; Procedure:
;;;   vector-sequential-search
;;; Parameters:
;;;   vec, a vector
;;;   pred?, predicate
;;; Purpose:
;;;   Searches the vector for a value that matches the predicate.
;;; Produces:
;;;   match, a value
;;; Preconditions:
;;;   pred? can be applied to all elements of vec.
;;; Postconditions:
;;;   If vec contains an element for which pred? holds, match
;;;     is the index of one such value. That is,
;;;     (pred? (vector-ref vec match)) holds.
;;;   If vec contains no elements for which pred? holds, match
;;;     is false (#f).
(define vector-sequential-search
  (lambda (vec pred?)
    ; Grab the length of the vector so that we don't have to
    ; keep recomputing it.
    (let ([len (vector-length vec)])
      ; kernel: Keeps track of the position we're looking at.
      (let kernel ([position 0]) ; Start at position 0
        (cond
          ; If we've run out of elements, give up.
          [(= position len)
           #f]
          ; If the current element matches, use it.
          [(pred? (vector-ref vec position))
           position]
          ; Otherwise, look in the rest of the vector.
          [else
           (kernel (+ position 1))])))))
> (define sample (vector 1 3 5 7 8 11 13))
> (vector-sequential-search sample even?)
4 ; The position of 8
> (vector-sequential-search sample (right-section = 12))
#f
> (vector-sequential-search sample (left-section < 9))
5 ; The position of 11
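The same kinds of predicates work with list-sequential-search, which returns the matching value rather than its position. The following is a minimal sketch, assuming Racket. It repeats the definition so that it runs on its own, and it writes the predicates as plain lambda expressions instead of using right-section and left-section. The name sample-list is introduced here only for illustration.

```racket
#lang racket

; Repeated from the reading so that this example is self-contained.
(define list-sequential-search
  (lambda (lst pred?)
    (cond
      [(null? lst) #f]
      [(pred? (car lst)) (car lst)]
      [else (list-sequential-search (cdr lst) pred?)])))

; sample-list is a hypothetical list with the same contents as sample.
(define sample-list (list 1 3 5 7 8 11 13))

; The first even element
(list-sequential-search sample-list even?)                  ; 8
; An element equal to 12 (there is none)
(list-sequential-search sample-list (lambda (x) (= x 12)))  ; #f
; The first element greater than 9
(list-sequential-search sample-list (lambda (x) (< 9 x)))   ; 11
```

One consequence of returning the value itself: if the list could contain #f and the predicate could accept it, a result of #f would not tell us whether the search succeeded.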
These search procedures return #f if the search is unsuccessful. The
first returns the matched value if the search is successful. The second
returns the position in the specified vector at which the desired
element can be found. There are many variants of this idea: One might,
for instance, prefer to signal an error or display a diagnostic message if
a search is unsuccessful. In the case of a successful search, one might
simply return #t, if all that is needed is an indication of whether an
element having the desired property is present in or absent from the list.
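Two of these variants might look something like the following sketch, assuming Racket. The names list-contains? and list-search-or-error are hypothetical; they are not part of the reading or of any library.

```racket
#lang racket

; list-contains? (hypothetical): reports only whether some element
; of lst satisfies pred?.
(define list-contains?
  (lambda (lst pred?)
    (cond
      [(null? lst) #f]
      [(pred? (car lst)) #t]
      [else (list-contains? (cdr lst) pred?)])))

; list-search-or-error (hypothetical): returns the first matching
; element, but signals an error instead of returning #f when no
; element matches.
(define list-search-or-error
  (lambda (lst pred?)
    (cond
      [(null? lst)
       (error "list-search-or-error: no matching element")]
      [(pred? (car lst)) (car lst)]
      [else (list-search-or-error (cdr lst) pred?)])))

(list-contains? (list 1 3 5) even?)        ; #f
(list-contains? (list 1 4 5) even?)        ; #t
(list-search-or-error (list 1 4 5) even?)  ; 4
```

Which variant is appropriate depends on the caller: a Boolean result is convenient inside a conditional, while an error is harder to overlook when a missing element indicates a bug.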
One of the most common “real-world” searching problems is that of searching a collection of compound values for one which matches a particular portion of the value, known as the key. For example, we might search a phone book for a phone number using a person’s name as the key or we might search a phone book for a person using the number as key. As you’ve probably noted, association lists implement this kind of searching, as long as we use the first value of a list as the key for that list.
If each value in the list or vector to search is a list, and the key is
the first element of that list, and we are searching for strict equality,
then we can use
assoc to search the list. However, if we might want to
search using the second element as the key, or a combination of elements
as the key, then we might want to make a
get-key procedure a parameter
to our search procedure.
;;; Procedure:
;;;   keyed-list-sequential-search
;;; Parameters:
;;;   values, a list of compound values.
;;;   get-key, a procedure that extracts a key from a compound value.
;;;   key, a key to search for.
;;; Purpose:
;;;   Finds a member of the list that has a matching key.
;;; Produces:
;;;   match, a Scheme value (an element of values, if one matches;
;;;     #f, otherwise).
;;; Preconditions:
;;;   The get-key procedure can be applied to each element of values.
;;; Postconditions:
;;;   If there is no index for which
;;;     (equal? key (get-key (list-ref values index)))
;;;   holds, match is false (#f).
;;;   Otherwise, match is (list-ref values index) for the smallest
;;;   such index.
(define keyed-list-sequential-search
  (lambda (values get-key key)
    (list-sequential-search values
                            (lambda (val) (equal? key (get-key val))))))
For example, consider the directory from the lab on association
lists. As you may recall, each entry has
a last name, a first name, and some other values. To search by first
name, we would use cadr as get-key. To search by the combination
of first and last name, we would most likely need a more complex
get-key, such as
(lambda (entry) (string-append (cadr entry) " " (car entry))).
; Search the list for someone whose last name is "Smith"
> (keyed-list-sequential-search grinnell-directory-annotated car "Smith")
; Search the list for someone whose first name is "John"
> (keyed-list-sequential-search grinnell-directory-annotated cadr "John")
; Search the list for someone whose name is "John Smith"
> (keyed-list-sequential-search grinnell-directory-annotated
                                (lambda (entry)
                                  (string-append (cadr entry) " " (car entry)))
                                "John Smith")
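To see results without the directory from the lab, here is a small self-contained sketch, assuming Racket. The list tiny-directory and its entries are made-up sample data in the same shape (last name, first name, then other values); the two search procedures are repeated from above.

```racket
#lang racket

(define list-sequential-search
  (lambda (lst pred?)
    (cond
      [(null? lst) #f]
      [(pred? (car lst)) (car lst)]
      [else (list-sequential-search (cdr lst) pred?)])))

(define keyed-list-sequential-search
  (lambda (values get-key key)
    (list-sequential-search values
                            (lambda (val) (equal? key (get-key val))))))

; tiny-directory is hypothetical sample data.  Each entry has the
; form (last-name first-name extension).
(define tiny-directory
  (list (list "Smith" "John" "x1234")
        (list "Jones" "Mary" "x5678")
        (list "Smith" "Ann" "x2468")))

; By last name: yields the first entry whose car is "Smith".
(keyed-list-sequential-search tiny-directory car "Smith")
; '("Smith" "John" "x1234")

; By first name.
(keyed-list-sequential-search tiny-directory cadr "Ann")
; '("Smith" "Ann" "x2468")

; By full name, using a get-key that builds the key from two fields.
(keyed-list-sequential-search
 tiny-directory
 (lambda (entry) (string-append (cadr entry) " " (car entry)))
 "Ann Smith")
; '("Smith" "Ann" "x2468")

; By a key that no entry has.
(keyed-list-sequential-search tiny-directory caddr "x0000")
; #f
```

Note that when several entries share a key, as the two Smiths do here, the search yields only the first one it encounters.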
The sequential search algorithms just described can be quite slow if the data structure to be searched is large. If one has a number of searches to carry out in the same data structure, it is often more efficient to “pre-process” the values, sorting them and transferring them to a vector, before starting those searches. The reason is that one can then use the much faster binary search algorithm.
Binary search is a more specialized algorithm than sequential search. It requires a random-access structure, such as a vector, as opposed to one that offers only sequential access, such as a list. Binary search is limited to the kind of test in which one is looking for a particular value that has a unique relative position in some ordering. For instance, one could use a binary search to look for an element equal to 12 in a vector of integers ordered from smallest to largest, since 12 is uniquely located between integers less than 12 and integers greater than 12; but one wouldn’t use binary search to look for an even integer, since the even integers don’t have a unique position in any natural ordering of the integers.
Note that this means that we have to organize the vector based on the kind of value we want to search for. If we want to search a directory by last name, we need it alphabetized by name. If we want to search it by phone number, we organize it by phone number.
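As a rough illustration of that pre-processing step, the sketch below (assuming Racket and its built-in sort procedure with the #:key option) arranges the same entries by three different keys. The names tiny-students and sorted-vector-by are hypothetical, and the entries are abbreviated to first name, last name, and id.

```racket
#lang racket

; tiny-students is hypothetical sample data.  Each entry has the
; form (first-name last-name id).
(define tiny-students
  (list (list "Bob" "Smith" 1170605)
        (list "Amy" "Zevon" 1336804)
        (list "Charlotte" "Davis" 1304091)))

; sorted-vector-by (hypothetical): arranges entries by a string key,
; ignoring case, and stores the result in a vector.
(define sorted-vector-by
  (lambda (entries get-key)
    (list->vector (sort entries string-ci<? #:key get-key))))

; One vector per kind of search we intend to do.
(define by-first-name (sorted-vector-by tiny-students car))
(define by-last-name (sorted-vector-by tiny-students cadr))
; Ids are numbers, so they need a numeric comparison.
(define by-id (list->vector (sort tiny-students < #:key caddr)))

(vector-ref by-first-name 0)  ; '("Amy" "Zevon" 1336804)
(vector-ref by-last-name 0)   ; '("Charlotte" "Davis" 1304091)
(vector-ref by-id 0)          ; '("Bob" "Smith" 1170605)
```

Keeping one sorted vector per key costs extra space, but it means each kind of lookup can use binary search.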
In binary search, we keep track of the vector, the value searched for, and the lower and upper bounds of the region still of interest. The key idea is to divide the region of interest of the sorted vector into two approximately equal parts, examining the element at the point of division to determine which of the parts must contain the value sought.
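There are usually three possibilities for the relationship between the value sought and the element at the point of division. First, the key of that element may equal the value sought, in which case we have found what we were looking for and the search ends. Second, the value sought may have to precede that element, in which case we continue the search in the left part of the region, below the point of division. Third, the value sought may have to follow that element, in which case we continue the search in the right part of the region, above the point of division.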
There is one other way in which the recursion can terminate: If, in some recursive call, the region to be searched contains no elements at all, then the search obviously cannot succeed and the procedure should take the appropriate failure action.
Here, then, is the basic binary-search algorithm. The identifiers
lower-bound and upper-bound denote the starting and ending positions
of the region of the vector within which the value sought must lie,
if it is present at all. (We use the convention that the starting and
ending positions are inclusive in that they are positions within the
vector that we must include in the search.)
;;; Procedure:
;;;   binary-search
;;; Parameters:
;;;   vec, a vector to search
;;;   get-key, a procedure of one parameter that, given a data item,
;;;     returns the key of a data item
;;;   may-precede?, a binary predicate that tells us whether or not
;;;     one key may precede another
;;;   key, a key we're looking for
;;; Purpose:
;;;   Search vec for a value whose key matches key.
;;; Produces:
;;;   match, a number.
;;; Preconditions:
;;;   * The vector is "sorted". That is,
;;;     (may-precede? (get-key (vector-ref vec i))
;;;                   (get-key (vector-ref vec (+ i 1))))
;;;     holds for all reasonable i.
;;;   * The get-key procedure can be applied to all values in the vector.
;;;   * The may-precede? procedure can be applied to all pairs of keys
;;;     in the vector (and to the supplied key).
;;;   * The may-precede? procedure is transitive. That is, if
;;;     (may-precede? a b) and (may-precede? b c) then it must
;;;     be that (may-precede? a c).
;;;   * If two values are equal, then each may precede the other.
;;;   * Similarly, if two values may each precede the other, then
;;;     the two values are equal.
;;; Postconditions:
;;;   * If vector contains no element whose key matches key, match is -1.
;;;   * If vec contains an element whose key equals key, match is the
;;;     index of one such value. That is, key is
;;;     (get-key (vector-ref vec match))
(define binary-search
  (lambda (vec get-key may-precede? key)
    ; Search a portion of the vector from lower-bound to upper-bound
    (let search-portion ([lower-bound 0]
                         [upper-bound (- (vector-length vec) 1)])
      ; If the portion is empty
      (if (> lower-bound upper-bound)
          ; Indicate the value cannot be found
          -1
          ; Otherwise, identify the middle point, the element at that
          ; point and the key of that element.
          (let* ([point-of-division (quotient (+ lower-bound upper-bound) 2)]
                 [separating-element (vector-ref vec point-of-division)]
                 [sep-elt-key (get-key separating-element)]
                 [left? (may-precede? key sep-elt-key)]
                 [right? (may-precede? sep-elt-key key)])
            (cond
              ; If the middle key equals the value, we use the middle value.
              [(and left? right?)
               point-of-division]
              ; If the middle key is too large, look in the left half
              ; of the region.
              [left?
               (search-portion lower-bound (- point-of-division 1))]
              ; Otherwise, the middle key must be too small, so look
              ; in the right half of the region.
              [else
               (search-portion (+ point-of-division 1) upper-bound)]))))))
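To watch the region of interest shrink, it can help to record which positions the algorithm examines. The following sketch, assuming Racket, is a hypothetical variant named binary-search-probes. It follows the same steps as binary-search but returns the list of positions examined, in order, instead of the position of the match.

```racket
#lang racket

; binary-search-probes (hypothetical): like binary-search, but
; produces the list of positions examined, in the order examined.
(define binary-search-probes
  (lambda (vec get-key may-precede? key)
    (let search-portion ([lower-bound 0]
                         [upper-bound (- (vector-length vec) 1)]
                         [probes '()])
      (if (> lower-bound upper-bound)
          ; Empty region: report what we looked at along the way.
          (reverse probes)
          (let* ([point-of-division (quotient (+ lower-bound upper-bound) 2)]
                 [sep-elt-key (get-key (vector-ref vec point-of-division))]
                 [left? (may-precede? key sep-elt-key)]
                 [right? (may-precede? sep-elt-key key)]
                 ; Record the position we just examined.
                 [probes (cons point-of-division probes)])
            (cond
              [(and left? right?)
               (reverse probes)]
              [left?
               (search-portion lower-bound (- point-of-division 1) probes)]
              [else
               (search-portion (+ point-of-division 1) upper-bound probes)]))))))

(define sample (vector 1 3 5 7 8 11 13))

; Looking for 11: examine position 3 (7), then position 5 (11).
(binary-search-probes sample (lambda (x) x) <= 11)  ; '(3 5)
; Looking for 12, which is absent: positions 3, 5, and 6, after
; which the region is empty.
(binary-search-probes sample (lambda (x) x) <= 12)  ; '(3 5 6)
; Looking for 1: positions 3, 1, and 0.
(binary-search-probes sample (lambda (x) x) <= 1)   ; '(3 1 0)
```

Even the unsuccessful search examines only three of the seven elements.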
So, how do we use binary search to search a sorted vector? It depends on what the vector contains. Let’s suppose each entry is a set of information on students, sorted by first name. Here is one such vector.
(define simulated-students
  (vector
   (list "Amy" "Zevon" 1336804 "Computer Science" 2019)
   (list "Bob" "Smith" 1170605 "Mathematics" 2020)
   (list "Charlotte" "Davis" 1304091 "Independent" 2018)
   (list "Danielle" "Jones" 1472662 "Undeclared" 2021)
   (list "Devon" "Smith" 1546921 "Computer Science" 2018)
   (list "Erin" "Anderson" 1320727 "Philosophy" 2019)
   (list "Fred" "Stone" 1260057 "Linguistics" 2018)
   (list "Greg" "Jones" 1668280 "Classics" 2020)
   (list "Heather" "Jones" 1046860 "Classics" 2021)
   (list "Ira" "Jackson" 1070103 "Political Science" 2022)
   (list "Janet" "Smith" 1488985 "Chemistry" 2019)
   (list "Karla" "Hill" 1821167 "Psychology" 2018)
   (list "Leo" "Levens" 1399810 "English" 2019)
   (list "Maria" "Moody" 1168059 "Computer Science" 2020)
   (list "Ned" "Black" 1177023 "Russian" 2018)
   (list "Otto" "White" 1908656 "Chinese" 2019)
   (list "Paula" "Hall" 1218704 "Psychology" 2018)
   (list "Quentin" "Smith" 1679081 "Art History" 2018)
   (list "Rebecca" "Davis" 1658200 "Biology" 2020)
   (list "Sam" "Sky" 1085519 "Mathematics" 2018)
   (list "Ted" "Tedly" 1480618 "GWSS" 2019)
   (list "Urkle" "Andersen" 1681805 "Anthropology" 2018)
   (list "Violet" "Teal" 1493989 "Economics" 2019)
   (list "Xerxes" "Homer" 1547425 "Economics" 2019)
   (list "Yvonne" "Stein" 1748611 "Sociology" 2018)
   (list "Zed" "Rebel" 1540899 "Computer Science" 2017)
   ))
As you may recall, binary-search has four parameters: a vector to
search, the procedure that extracts a key from each element in the vector,
the procedure used to compare keys, and a key to search for. For this
example, the vector to search will be simulated-students and the name to
search for will be whatever name we want. To get the name from an entry,
we use car. To compare two names, we use string-ci<=?.
So, to find out the index of the entry for the person whose first name is “Heather”, we would write something like the following:
> (binary-search simulated-students car string-ci<=? "Heather")
8
> (vector-ref simulated-students 8)
'("Heather" "Jones" 1046860 "Classics" 2021)
To make it easier for people who don’t want to write so much, we might wrap that instruction into a more specific procedure that looks up people by first name and returns entries, rather than indices.
;;; Procedure:
;;;   lookup-by-first-name
;;; Parameters:
;;;   directory, a vector of student info entries
;;;   first-name, a string
;;; Purpose:
;;;   Find the entry associated with first-name.
;;; Produces:
;;;   entry, a student info entry
;;; Preconditions:
;;;   * Each element of directory is a student info entry (a list of
;;;     first name, last name, student id, major, and graduation year).
;;;   * directory is arranged alphabetically by first name, from
;;;     alphabetically first to alphabetically last. That is,
;;;     (string-ci<=? (car (vector-ref directory i))
;;;                   (car (vector-ref directory (+ i 1))))
;;;     for all reasonable i.
;;;   * No two entries have the same first name.
;;; Postconditions:
;;;   If there is an i s.t. (car (vector-ref directory i)) is first-name, then
;;;     entry is (vector-ref directory i)
;;;   Otherwise, entry is #f
(define lookup-by-first-name
  (lambda (directory first-name)
    (let ([index (binary-search directory car string-ci<=? first-name)])
      (if (= index -1)
          #f
          (vector-ref directory index)))))
Let’s watch it work.
> (lookup-by-first-name simulated-students "Heather")
'("Heather" "Jones" 1046860 "Classics" 2021)
> (lookup-by-first-name simulated-students "Sam")
'("Sam" "Sky" 1085519 "Mathematics" 2018)
> (lookup-by-first-name simulated-students "Jon")
#f
Let’s take a detour into a traditional mathematical problem: Given a number, n, how do you decide if n is prime? As you might expect, there are a number of ways to determine whether or not a value is prime. Since we know a lot of primes, for small primes the easiest technique is to search through a vector of known primes.
;;; Value:
;;;   small-primes
;;; Type:
;;;   vector of integers
;;; Contents:
;;;   All of the prime numbers less than 1000, arranged in increasing order.
(define small-primes
  (vector 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
          101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199
          211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293
          307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397
          401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499
          503 509 521 523 541 547 557 563 569 571 577 587 593 599
          601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691
          701 709 719 727 733 739 743 751 757 761 769 773 787 797
          809 811 821 823 827 829 839 853 857 859 863 877 881 883 887
          907 911 919 929 937 941 947 953 967 971 977 983 991 997))
We could, of course, use a sequential search technique to look for a
value in this vector. However, binary search is much more efficient. What
procedure should we use for get-key? Well, each value is its own key,
so we use (lambda (x) x). The values are ordered numerically, so we use
<= as may-precede?.
; Is 231 a prime?
> (binary-search small-primes (lambda (x) x) <= 231)
-1 ; No
; Is 241 a prime?
> (binary-search small-primes (lambda (x) x) <= 241)
52 ; Yes, it's the prime at index 52
; How many primes are there less than 1000?
> (vector-length small-primes)
168
In procedure form, we might write
(define is-small-prime?
  (lambda (candidate)
    (not (= -1 (binary-search small-primes (lambda (x) x) <= candidate)))))
Now, how many recursive calls do we do in determining whether or not a candidate value is a small prime? If we were doing a sequential search, we might need to look at all 168 primes less than 1000, so approximately 168 recursive calls would be necessary in the worst case. In binary search, we split the 168 into two groups of approximately 84 (one step), split one of those groups of 84 into two groups of 42 (another step), split one of those groups into two groups of 21 (another step), split one of those groups of 21 into two groups of 10 (we’ll assume that we don’t find the value), split 10 into 5, 5 into 2, 2 into 1, and then either find it or don’t. That’s only about eight recursive calls. Much better than the 168.
Now, suppose we listed another 168 or so primes. In sequential search, we would now have to do 336 recursive calls. With binary search, we’d only have to do one more recursive call (splitting the 336 or so primes into two groups of 168).
This slow growth in the number of recursive calls (that is, when you double the number of elements to search, you double the number of recursive calls in sequential search, but only add one to the number of recursive calls in binary search) is one of the reasons that computer scientists love binary search.
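The halving argument can be turned into a small calculation. The sketch below, assuming Racket, uses a hypothetical procedure named worst-case-probes. It counts how many elements binary-search examines in the worst case for a vector of n elements, relying on the observation that after each examination at most (quotient n 2) elements remain. The final call on an empty region is not counted.

```racket
#lang racket

; worst-case-probes (hypothetical): the largest number of elements
; that binary-search examines in a sorted vector of n elements.
(define worst-case-probes
  (lambda (n)
    (if (zero? n)
        ; Nothing left to examine.
        0
        ; Examine one element, then continue in a region of at most
        ; half the size.
        (+ 1 (worst-case-probes (quotient n 2))))))

(worst-case-probes 168)      ; 8
(worst-case-probes 336)      ; 9
(worst-case-probes 1000000)  ; 20
```

Doubling the vector from 168 to 336 elements adds a single step, and even a million elements require only twenty.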
For binary-search to work correctly, we need to have a sorted
vector. Checking that a vector is sorted will require looking at every
neighboring pair of values, so it is not something we want to do every
time we call binary search. However, it is helpful to have a
procedure that performs this check.
;;; Procedure:
;;;   vector-sorted?
;;; Parameters:
;;;   vec, a vector
;;;   get-key, a procedure that extracts keys from the elements of vec
;;;   may-precede?, a procedure that compares keys
;;; Purpose:
;;;   Determine if vec is sorted by key
;;; Produces:
;;;   is-sorted?, a Boolean
;;; Preconditions:
;;;   get-key should be applicable to any value in vec.
;;;   may-precede? should be applicable to any two values returned by get-key.
;;; Postconditions:
;;;   If, for all reasonable i,
;;;     (may-precede? (get-key (vector-ref vec i))
;;;                   (get-key (vector-ref vec (+ i 1))))
;;;   then is-sorted? is #t.
;;;   Otherwise,
;;;     is-sorted? is #f.
(define vector-sorted?
  (lambda (vec get-key may-precede?)
    (let ([veclen (vector-length vec)])
      (letrec ([kernel
                (lambda (i)
                  ; We use >= rather than = so that an empty vector
                  ; (for which veclen - 1 is -1) counts as sorted.
                  (or (>= i (- veclen 1))
                      (and (may-precede? (get-key (vector-ref vec i))
                                         (get-key (vector-ref vec (+ i 1))))
                           (kernel (+ i 1)))))])
        (kernel 0)))))
Here are some tests for the vectors we defined earlier.
> (vector-sorted? small-primes (lambda (x) x) <)
#t
> (vector-sorted? simulated-students car string-ci<=?)
#t
> (vector-sorted? simulated-students cadr string-ci<=?)
#f
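While testing, one way to use such a check is to wrap binary search in a procedure that refuses to search an unsorted vector. The following is only a sketch, assuming Racket. The name checked-binary-search is hypothetical, the two procedures it relies on are repeated in compact form so that the example runs on its own, and this copy of vector-sorted? uses >= in its base case so that an empty vector counts as sorted.

```racket
#lang racket

; Compact copy of binary-search from the reading.
(define binary-search
  (lambda (vec get-key may-precede? key)
    (let search-portion ([lower-bound 0]
                         [upper-bound (- (vector-length vec) 1)])
      (if (> lower-bound upper-bound)
          -1
          (let* ([point-of-division (quotient (+ lower-bound upper-bound) 2)]
                 [sep-elt-key (get-key (vector-ref vec point-of-division))]
                 [left? (may-precede? key sep-elt-key)]
                 [right? (may-precede? sep-elt-key key)])
            (cond
              [(and left? right?) point-of-division]
              [left? (search-portion lower-bound (- point-of-division 1))]
              [else (search-portion (+ point-of-division 1) upper-bound)]))))))

; Compact copy of vector-sorted?.
(define vector-sorted?
  (lambda (vec get-key may-precede?)
    (let kernel ([i 0])
      (or (>= i (- (vector-length vec) 1))
          (and (may-precede? (get-key (vector-ref vec i))
                             (get-key (vector-ref vec (+ i 1))))
               (kernel (+ i 1)))))))

; checked-binary-search (hypothetical): verifies the sortedness
; precondition before searching.  The check examines every
; neighboring pair, so this is meant for testing, not for
; everyday use.
(define checked-binary-search
  (lambda (vec get-key may-precede? key)
    (if (vector-sorted? vec get-key may-precede?)
        (binary-search vec get-key may-precede? key)
        (error "checked-binary-search: vector is not sorted by this key"))))

(checked-binary-search (vector 2 3 5 7) (lambda (x) x) <= 5)  ; 2
(checked-binary-search (vector 2 3 5 7) (lambda (x) x) <= 4)  ; -1
```

Calling checked-binary-search on a vector such as (vector 3 2 5) signals an error rather than returning a possibly misleading result.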
These checks might take you a little longer, but they’re not complex. They are designed to help you understand searching before starting the lab, so please make your best effort to complete them.
a. Explain the role of the
pred? parameter in list-sequential-search.
b. Explain the role of the
get-key parameter in keyed-list-sequential-search.
c. Explain how these parameters work together to implement a keyed sequential search.
d. If we double the length of the list, what is the worst case effect
on the number of recursive calls in list-sequential-search?
a. Explain the role of the
b. Explain the role of left? and
right?, which are bound in the let* of binary-search.
c. Describe how and why the
upper-bound of kernel changes when
left? is true.
d. Describe how and why the
lower-bound of kernel changes when
right? is true.
e. If we double the length of the vector, what is the worst case effect
on the number of recursive calls in binary-search?
a. Make a copy of binary-search-lab.rkt.
b. Verify that binary search can correctly find the entry for "Heather" in
simulated-students. That is, write an expression to find that entry by filling in the blanks below.
> (binary-search simulated-students ___ ___ "Heather")
Hint: If you can’t fill in the blanks yourself, the example is in the reading. We are asking you to run the search yourself.
c. Verify that binary search can correctly find the entry for a student
of your choice in simulated-students.
d. Verify that binary search can correctly find the first entry in
simulated-students. You will need to supply the name associated with
that entry.
e. Verify that binary search can correctly find the last entry in
simulated-students. You will need to supply the name associated with
that entry.
f. Verify that binary search terminates and returns -1 for something that would fall in the middle of the vector and is not there. That is, pick a name that starts with M or N and that does not appear in the vector.
g. Verify that binary search terminates and returns -1 for something that
comes before the first entry in
simulated-students. You will need to
pick a name that alphabetically precedes "Amy".
h. Verify that binary search terminates and returns -1 for something
that comes after the last entry. You will need to pick a name that
alphabetically follows "Zed".
i. Why do you think we had you verify each of these conditions?