To search a data structure is to examine its elements singly
until one has either found an element that has a desired property or
concluded that the data structure contains no such element. For instance,
one might search a vector of integers for an even element, or a
vector of pairs for a pair having the string "elephant" as
its cdr. Scheme's predefined assq, assv, and
assoc procedures are (somewhat specialized) search procedures.
In a linear data structure like a flat list, vector, or file, there is an obvious algorithm for conducting a search: Start at the beginning of the data structure and traverse it, testing each element. Eventually one will either find an element that has the desired property or reach the end of the structure without finding such an element, thus conclusively proving that there is no such element. Here's a vector version of the algorithm:
(define linear-search
  (lambda (test? vec)
    (let ((len (vector-length vec)))
      (let loop ((position 0))
        (cond ((= position len) #f)
              ((test? (vector-ref vec position)) position)
              (else (loop (+ position 1))))))))
> (define sample (vector 1 3 5 7 8 11 13))
> (linear-search even? sample)
4
> (linear-search (lambda (elm) (= elm 12)) sample)
#f
This search procedure returns #f if the search is
unsuccessful; if it is successful, it returns the position in the specified
vector at which the desired element can be found. There are many variants
of this idea: One might, for instance, prefer to signal an error or display
a diagnostic message if a search is unsuccessful. Similarly, in the case
of a successful search, one might simply return #t (if all
that is needed is an indication of whether an element having the desired
property is present in or absent from the vector), or one might return the
element found rather than its position in the vector.
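Two of these variants might be sketched as follows. The names vector-has? and vector-first-match are hypothetical, chosen only for illustration, and the on-failure argument (a procedure of no arguments) is just one assumed way of letting the caller decide what an unsuccessful search should do.

```scheme
;; Hypothetical variant: reports only whether some element of vec
;; satisfies test?, returning #t or #f.
(define vector-has?
  (lambda (test? vec)
    (let ((len (vector-length vec)))
      (let loop ((position 0))
        (cond ((= position len) #f)
              ((test? (vector-ref vec position)) #t)
              (else (loop (+ position 1))))))))

;; Hypothetical variant: returns the first element that satisfies
;; test?.  If there is none, it calls on-failure with no arguments
;; and returns whatever that call returns.
(define vector-first-match
  (lambda (test? vec on-failure)
    (let ((len (vector-length vec)))
      (let loop ((position 0))
        (cond ((= position len) (on-failure))
              ((test? (vector-ref vec position))
               (vector-ref vec position))
              (else (loop (+ position 1))))))))

;; (vector-has? even? (vector 1 3 5 7 8 11 13))  returns #t
;; (vector-first-match even? (vector 1 3 5 7 8 11 13)
;;                     (lambda () 'no-match))     returns 8
```

Passing a failure procedure leaves the choice between returning a marker, displaying a message, and signalling an error to the caller rather than fixing it inside the search.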
Define a Scheme procedure that searches a list for an element that meets
a specified test, returning that element if it is successful and returning
the symbol nonesuch if it is unsuccessful.
Define a Scheme procedure that reads in Scheme values from a given input
port, applying a specified test to each one. When it finds a value that
passes the test, it should return that value; if it gets the end-of-file
object before finding a value that passes the test, it should call the
error procedure to print an appropriate diagnostic.
The linear search algorithms just described can be quite slow if the data structure to be searched is large. If one has a number of searches to carry out in the same data structure, it is often more efficient to "preprocess" the values, sorting them and transferring them to a vector, before starting those searches. The reason is that one can then use the much faster binary search algorithm, which examines roughly as many elements as the base-2 logarithm of the vector's length, whereas a linear search may have to examine every element.
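The preprocessing step can be sketched as follows, under some assumptions. Standard Scheme does not specify a sorting procedure, so this sketch uses a simple insertion sort as a stand-in; the names insert-in-order and list->sorted-vector are hypothetical. Insertion sort is itself slow on long lists, so for large inputs a merge sort would be the better choice.

```scheme
;; Hypothetical helper: inserts new-value into sorted, a list that is
;; already in order according to precedes?, keeping it in order.
(define insert-in-order
  (lambda (precedes? new-value sorted)
    (cond ((null? sorted) (list new-value))
          ((precedes? new-value (car sorted)) (cons new-value sorted))
          (else (cons (car sorted)
                      (insert-in-order precedes? new-value (cdr sorted)))))))

;; Hypothetical helper: sorts the list ls by repeated insertion, then
;; transfers the sorted values to a vector.
(define list->sorted-vector
  (lambda (precedes? ls)
    (let loop ((rest ls)
               (sorted '()))
      (if (null? rest)
          (list->vector sorted)
          (loop (cdr rest)
                (insert-in-order precedes? (car rest) sorted))))))

;; (list->sorted-vector < '(11 3 8 1 13 7 5))
;; returns a vector containing 1 3 5 7 8 11 13, in that order
```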
Binary search is a more specialized algorithm than linear search: It requires a random-access structure as opposed to a sequential one, and it is limited to the kind of test in which one is looking for a particular value that has a unique relative position in some ordering. For instance, one could use a binary search to look for an element equal to 12 in a vector of integers, since 12 is uniquely located between integers less than 12 and integers greater than 12; but one wouldn't use binary search to look for an even integer, since the even integers don't have a unique position in any natural ordering of the integers.
The idea in a binary search is to divide the sorted vector into two approximately equal parts, examining the element at the point of division to determine which of the parts must contain the value sought. There are three possibilities:
The element at the point of division precedes the value sought in the ordering that was used to sort the vector. In this case, the value sought must be in a position with a higher index than the element at the point of division -- it must be in the right half of the vector, if it is present at all. The search procedure invokes itself recursively to search just the right half of the vector.
The value sought precedes the element at the point of division. In this case, the value sought must be in a lower-indexed position -- in the left half of the vector -- if it is present at all. The search procedure invokes itself recursively to search just the left half of the vector.
The value sought is the element at the point of division. The search has succeeded.
Actually, there is one other way in which the recursion can bottom out: If, in some recursive call, the subvector to be searched (which will be half of a half of a half of ... of the original vector) contains no elements at all, then the search cannot succeed and the procedure should take the appropriate failure action.
Here, then, is the basic binary-search algorithm. The identifiers
lower-bound and upper-bound denote the starting
and ending positions of the part of the vector within which the value
sought must lie, if it is present at all. (As in the lab on the merge sort, I adopt the convention
that the starting position is "inclusive" -- it is the first position
that is in the subvector -- and the ending position is "exclusive" -- it
is the position after the last position in the subvector.)
(define binary-search
  (lambda (precedes? vec sought)
    (let loop ((lower-bound 0)
               (upper-bound (vector-length vec)))
      (and (< lower-bound upper-bound)  ; Otherwise, the search has failed
                                        ; because the subvector is null.
           (let* ((midpoint (quotient (+ lower-bound upper-bound) 2))
                  (middle-element (vector-ref vec midpoint)))
             (cond ((precedes? middle-element sought)
                    (loop (+ midpoint 1) upper-bound))
                   ((precedes? sought middle-element)
                    (loop lower-bound midpoint))
                   (else midpoint)))))))
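To watch the bounds narrow, one might add some output to the loop. The following is a sketch of such a tracing variant; the name binary-search-trace and the wording of the message are hypothetical, and the search logic is the same as in the procedure above.

```scheme
;; Hypothetical tracing variant of binary search: before each step it
;; displays the inclusive lower bound and the exclusive upper bound
;; of the subvector still under consideration.
(define binary-search-trace
  (lambda (precedes? vec sought)
    (let loop ((lower-bound 0)
               (upper-bound (vector-length vec)))
      (display "searching positions ")
      (display lower-bound)
      (display " up to but not including ")
      (display upper-bound)
      (newline)
      (and (< lower-bound upper-bound)
           (let* ((midpoint (quotient (+ lower-bound upper-bound) 2))
                  (middle-element (vector-ref vec midpoint)))
             (cond ((precedes? middle-element sought)
                    (loop (+ midpoint 1) upper-bound))
                   ((precedes? sought middle-element)
                    (loop lower-bound midpoint))
                   (else midpoint)))))))

;; > (binary-search-trace < (vector 1 3 5 7 8 11 13) 8)
;; searching positions 0 up to but not including 7
;; searching positions 4 up to but not including 7
;; searching positions 4 up to but not including 5
;; 4
```

An unsuccessful search, such as a search for 12 in the same vector, ends when the two bounds meet (here at position 6) and returns #f.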
Here is a definition that makes class-roster a name for a
vector of strings containing the surname of everyone in this class. The
vector has been sorted into alphabetical order.
(define class-roster
  (vector "Bidler" "Davis" "Gelling" "Griffin" "Haak" "Heck" "Kaiserlian"
          "Kleiber" "Krivin" "Luebke" "Lundgren" "Ma" "Mueller" "Park"
          "Poush" "Ratnakumar" "Renka" "Ribe" "Rose" "Routh" "Sashikant"
          "Shah" "Smith" "Solmose" "Steenhoek" "Tran" "Venugopal" "White"
          "Williams" "Wu"))
Call the binary-search procedure, with appropriate arguments,
to determine the position of your surname in this vector.
The textbook introduces the binary search algorithm by describing a guessing game in which one player, A, selects a number in the range from 1 to 100 and the other player, B, tries to guess it by asking yes-or-no questions of the form "Is your number less than n?" (putting in specific values for n). The most efficient strategy for B to use is repeated bisection of the range within which A's number is known to lie.
Write a Scheme procedure that takes the part of B in this game. When
invoked, it should print out a question of the specified form and read in
the user's response (presumably, the symbol yes or the symbol
no), then repeat the process until the range of possible
values has been narrowed to contain only one number. The procedure should
then display and identify that number. A sample run might look like this:
> (player-B)
Is your number less than 51? yes
Is your number less than 26? no
Is your number less than 38? no
Is your number less than 44? no
Is your number less than 47? yes
Is your number less than 45? no
Is your number less than 46? no
Since your number is less than 47 but not less than 46, it must be 46.
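The sample run above takes seven questions. As a rough check on why bisection is efficient, here is a sketch of a procedure that computes the largest number of questions the strategy can need for a range of a given size. The name questions-needed is hypothetical, and the sketch assumes that each question splits the remaining range into two parts whose sizes differ by at most one.

```scheme
;; Hypothetical helper: the largest number of yes-or-no questions
;; that repeated bisection can need when size candidates remain.
(define questions-needed
  (lambda (size)
    (if (<= size 1)
        0
        ;; In the worst case the answer leaves the larger part,
        ;; which holds (size + 1) / 2 candidates, rounded down.
        (+ 1 (questions-needed (quotient (+ size 1) 2))))))

;; (questions-needed 100)  returns 7
```

By contrast, asking about the candidates one at a time could take as many as 99 questions for the same range.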
This document is available on the World Wide Web as
http://www.math.grin.edu/~stone/courses/scheme/searching-methods.html
created November 30, 1997
last revised December 2, 1997