Vectors
Summary:
In our primary explorations of color palettes, we found that palettes
permitted us to write smaller images, but at a significant cost: Both
storing and restoring images require us to step through all the colors
in the palette. Is there a better way? If we focus on restoring
images from files, we can acheive significant gains by using
vectors, rather than lists, to represent the
palettes.
Vectors are data structures that are very similar to lists in that
they arrange data in linear fashion. Vectors differ from lists in two
significant ways: Unlike lists, vectors are indexed
and vectors are mutable.
Introduction: Deficiencies of Lists
As you've seen in many of the procedures and programs we've written so
far, there are many problems in which we have to deal with collections
of information. We have now learned two techniques for representing
collections of data:
We can represent the collection as a list.
We can represent the collection as a file.
Both of these ways to represent collections have some similar
deficiencies. In particular, it is relatively expensive to
get a particular element of a list (or file) and
it is equally expensive to change a particular
element. Why is it expensive to get an element (say, the tenth
element)? In the case of a list, we need to cdr through the list until
we reach the element. In the case of a file, we need to read through
the preceding elements.
Changing an element may be even worse, because once we've reached the
position, we need to build the list (or file) back to a new form.
For example, consider the problem of restoring an image from a file.
We repeatedly read the index of the next color and then grab it
from the palette using list-ref.
Note that there are different costs associated with getting a color
near the start of the palette: We need about four recursive calls to
get the 4th color, and 123 recursive calls to get the 123rd color.
Larger palettes will significantly slow down the code for restoring
an image.
But that's not the only problem. Suppose midway through designing
a palette, we decide to replace one color in the palette. With lists,
we need to build a new palette, even though conceptually we are
replacing the color.
Do these problems mean that lists, files, and other similar structures are
inappropriate ways to represent collections? Certainly not. Rather,
they work very well for some purposes (e.g., it is easy to extend a
list; files persist between invocations of Scheme) and less well for
other purposes (e.g., extracting and changing).
To resolve these deficiencies, Scheme provides an alternate mechanism for
representing collections, the vector.
A Key Feature of Vectors: Indexing
You may have noted that when we use lists to group data
(e.g., the information for a spot or a shape, the colors in
a palette), we need to use list-ref or
repeated calls to cdr to get later elements
of the list. Unfortunately, as the sample code above suggests,
list-ref works by cdr'ing down the list. Hence,
it takes about five steps to get to the fifth element of the list and
about one hundred steps to get to the one hundredth element of a list.
Similarly, to get to the fifth element of a file, we'll need to read
the preceding elements and to get to the hundredth element, we'll also
need to read through the preceding elements. It would be nicer if
we could access any element of the group of data in the same amount
of time (preferably a small amount of time).
Vectors address this problem.
Vectors contain a fixed number of elements and provide indexed
access (also called random access)
to those elements, in the sense that each element, regardless of its
position in the vector, can be recovered in the same amount of time.
In this respect, a vector differs from a list or a file: The initial
element of a list is immediately accessible, but subsequent elements
are increasingly difficult and time-consuming to access.
Another Key Feature of Vectors: Mutation
You may have also noted that we occasionally want to change an element
of a group of data (e.g., to change a student's grade in the structure
we use to represent that student; to change a color in a color palette).
When we use lists, we essentially need to build a new list to change
one element. When we use files, we often have to build a new file,
copying both preceding and subsequent values.
Vectors are mutable data structures: It is possible
to replace an element of a vector with a different value, just as one can
take out the contents of a container and put in something else instead.
It's still the same vector after the replacement, just as the container
retains its identity no matter how often its contents are changed.
For example, to set the color 5 of a vector of colors to a darker version
of the color, one would write
> (vector-set! palette 5 (rgb-darker (vector-ref palette 5)))
The particular values that a vector contains at some particular moment
constitute its state. One could summarize the
preceding paragraph by saying that the state of a vector can change and
that state changes do not affect the underlying identity of the vector.
Practical Details: Showing Vectors
When showing a vector, Scheme shows each of its elements, enclosed
in parentheses, with an extra character, #, in front of
the left parenthesis. For instance, here's how Scheme shows a vector
containing the strings "alpha", "beta",
and "gamma", in that order:
#("alpha" "beta" "gamma")
The mesh (also called pound, sharp, or hash) character distinguishes
the vector from the list containing the same elements. Sometimes you'll
see a number after the mesh character. That number
gives the length of the vector.
Some implementations of Scheme permit us to use vector
literals, in which a programmer can use a similar syntax
to specify a vector when writing a Scheme program or typing commands
and definitions into the Scheme interactive interface. In some such
implementations, the literal begins with the hash mark. In others,
the programmer must place a single quotation mark before the mesh so
that Scheme will not try to evaluate the vector as if it were some
exotic kind of procedure call.
We traditionally recommend that you avoid using this notation
just as we recommend that you avoid the corresponding list literal
notation for lists.
Vector Procedures
Standard Scheme provides the following fundamental procedures for creating
vectors and selecting and replacing their elements:
vector
The constructor vector takes any number of arguments and
assembles them into a vector, which it returns.
> (vector "alpha" "beta" "gamma")
#("alpha" "beta" "gamma")
> (vector) ; the empty vector -- no elements!
#()
> (define beta 2)
> (vector "alpha" beta (list "gamma" 3) (vector "delta" 4) (vector "epsilon"))
#("alpha" 2 ("gamma" 3) #("delta" 4) #("epsilon"))
As the last example shows, Scheme vectors, like Scheme lists, can be
heterogeneous, containing elements of various
types.
make-vector
The make-vector procedure takes two
arguments, a natural number k and
a Scheme value obj, and returns a
k-element vector in which each position is
occupied by obj.
> (make-vector 12 "foo")
#("foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo")
> (make-vector 4 0)
#(0 0 0 0)
> (make-vector 0 4) ; the empty vector, again
#()
The second argument is optional; if you omit it, the value that
initially occupies each of the positions in the array is left
unspecified. Various implementations of Scheme have different
ways of filling them up, so you should omit the second argument of
make-vector only when you intend to replace the
contents of the vector right away.
vector?
The type predicate vector? takes any Scheme value as argument
and determines whether it is a vector.
> (vector? (vector "alpha" "beta" "gamma"))
#t
> (vector? (list "alpha" "beta" "gamma")) ; a list, not a vector
#f
> (vector? "alpha beta gamma") ; a string, not a vector
#f
vector-length
The vector-length procedure takes one argument,
which must be a vector, and returns the number of elements in the
vector.
> (vector-length (vector 3 1 4 1 5 9))
6
> (vector-length (vector "alpha" "beta" "gamma"))
3
> (vector-length (vector))
0
vector-ref
The selector vector-ref takes two arguments --
a vector vec and a natural number k (which
must be less than the length of vec). It returns the
element of vec that is preceded by exactly k
other elements. In other words, if k is 0, you get the
element that begins the vector; if k is 1, you get
the element after that; and so on.
> (vector-ref (vector 3 1 4 1 5 9) 4)
5
> (vector-ref (vector "alpha" "beta" "gamma") 0)
alpha
> (vector-ref (vector "alpha" "beta" "gamma") 3)
vector-ref: out of bounds: 3
vector-set!
All of the previous procedures look a lot like list procedures, except
that many are more efficient (e.g., vector? and
vector-length take a constant number of steps;
list? takes a number of steps proportional to the
the length of the list and list-ref takes a number
of steps proportional to the index). Now let's see a procedure that's
much different. We can use procedures to change vectors.
The mutator vector-set! takes three
arguments -- a vector vec, a natural
number k (which must be less than
the length of vec), and a Scheme
value obj -- and replaces the element
of vec that is currently in the position
indicated by k with obj.
This changes the state of the vector irreversibly; there is no way to
find out what used to be in that position after it has been replaced.
As you may recall, it is a Scheme convention to place an exclamation
point meaning Proceed with caution!
at the end of the
name of any procedure that makes such an irreversible change in the
state of an object.
The value returned by vector-set! is unspecified;
one calls vector-set! only for its side effect
on the state of its first argument.
> (define sample-vector (vector "alpha" "beta" "gamma" "delta" "epsilon"))
> sample-vector
#("alpha" "beta" "gamma" "delta" "epsilon")
> (vector-set! sample-vector 2 "zeta")
> sample-vector ; same vector, now with changed contents
#("alpha" "beta" "zeta" "delta" "epsilon")
> (vector-set! sample-vector 0 "foo")
> sample-vector ; changed contents again
#("foo" "beta" zeta "delta" "epsilon")
> (vector-set! sample-vector 2 -38.72)
> sample-vector ; and again
#("foo" "beta" -38.72 "delta" "epsilon")
Vectors introduced into a Scheme program by means
of the mesh-and-parentheses notation are supposed to be
immutable
: applying vector-set!
to such a vector is an error, and the contents of such vectors are
therefore constant.
> (define vec #(1 2 3))
> vec
#(1 2 3)
> (vector-ref vec 0)
1
> (vector-set! vec 0 0)
vector-set!: expects type <mutable vector> as 1st argument, given: #(1 2 3); other arguments were: 0 0
Warning! Some implementations
of Scheme don't enforce this rule.
vector->list and list->vector
The vector->list procedure takes any vector as
argument and returns a list containing the same elements in the same
order; the list->vector procedure performs
the converse operation.
> (vector->list (vector 31 27 16))
(31 27 16)
> (vector->list (vector))
()
> (list->vector (list #\a #\b #\c))
#(#\a #\b #\c)
> (list->vector (list 31 27 16))
#(31 27 16)
vector-fill!
The vector-fill! procedure takes two arguments,
the first of which must be a vector. It changes the state of that
vector, replacing each of the elements it formerly contained with the
second argument.
> (define sample-vector (vector "rho" "sigma" "tau" "upsilon"))
> sample-vector ; original vector
#("rho" "sigma" "tau" "upsilon")
> (vector-fill! sample-vector "kappa")
> sample-vector ; same vector, now with changed contents
#("kappa" "kappa" "kappa" "kappa")
The vector-fill! procedure is invoked only for
its side effect and returns an unspecified value.
Implementing Standard Vector Procedures
Some implementations of Scheme may lack
the list->vector,
vector->list, and
vector-fill! procedures, but it is straightforward
to define them in terms of the others.
list->vector
Since we are writing this procedure ourselves, we should begin by
documenting it.
In most implementations, we will need to recurse over the list, adding
elements to a corresponding vector. We will also need to build that
vector first. Since we only want to build one vector, we should use
the husk-and-kernel technique. The husk creates the vector and tells
the kernel and then calls the kernel.
However, we may need other parameters for the kernel, so it is time to
think a little bit about the kernel. Since we want to copy all the
elements of the list to the vector, we probably need to repeat some
basic step, and the only way we know how to do repetition is recursion.
To keep track of what we are copying, we will probably need a counter that
we pass to the helper. Let us call that counter pos, since
it keeps track of a position in the list or vector.
What should we copy from at each step? We could copy the element at
position pos of the list into position pos
of the vector. However, that is somewhat inefficient because it requires
us to step through to the posth element of the list.
Since we no longer need a value from the list once we have copied it,
we can cdr through the list as we step through the vector. Now,
our goal is to copy the initial element of the list into the appropriate
position of the vector.
When do we stop? When we run out of elements to copy.
Given all of the above, here is how to set things up.
The kernel is a little bit more complicated. We need to keep track
of where we are in the vector. We may also want to carefully specify
what this kernel is supposed to do. (Such specification is not always
necessary for helpers, but I think it clarifies things in this case.)
Is this the only way to write the list->vector
procedure? No. More advanced students might know about the
apply procedure which makes life significantly
easier.
(define list->vector
(lambda (lst)
(apply vector lst)))
In other words: Call the vector procedure, giving it the
elements of lst as its arguments. Don't worry if you don't
understand this now. We will return to the idea later in the semester.
vector->list
Now let us consider how to go in the opposite direction. Once again,
we will probably need to recurse to step through positions and need to
keep track of the position with an extra variable. Should we create a
list first and then populate it? No. We do not typically mutate lists.
So, we will update the list on the fly
, adding elements
and extending the list at each step. Let's start with the helper.
The helper will build a list from a subrange of the vector.
Now, we just have to set things up for this kernel. We start at
position 0.
We end just before the length of the vector. So, here goes ...
Patterns of Recursion Over Vectors
Each time we learn a new structure, we learn techniques for recursing
over that structure. As the previous examples suggested, recursion
over vectors is relatively straightforward, but usually requires that
we have a helper procedure that includes additional parameters - the
current position in the vector and the length of the vector. (Why do
we include the length? So that we don't have to recompute it each
time.)
The test for the base case is then to check whether the current position
equals the length and the simplify
step is to add 1 to
the position.
(define vector-proc
(lambda (vec other)
(vector-proc-kernel vec
other 0 (vector-length vec))))
(define vector-proc-kernel
(lambda (vec other pos len)
(if (= pos len)
(base-case vec other)
(combine (vector-ref vec pos)
(vector-proc-kernel vec other (+ post 1) len)))))
At times, it's better to start at the end of the vector and work backwards.
In this strategy, we get the base case when the position reaches 0 and we
simplify by subtracting 1.
(define vector-proc
(lambda (vec other)
(vector-proc-kernel vec other (- (vector-length vec) 1))))
(define vector-proc-kernel
(lambda (vec other pos)
(if (< pos 0)
(base-case vec other)
(combine (vector-ref vec pos)
(vector-proc-kernel vec other (- pos 1))))))
Vectors as Palettes
Let's return to our motivating problem, using a vector to represent
a palette. What procedures will we need associated with this palette?
Our recent experience suggests that we'll want six basic procedures:
(palette-new size),
which creates a palette that holds up to size
values.
(palette-add-color!
palette color),
which adds a color to the palette.
(palette-encode-color
palette
color), which encodes
color to the appropriate index in
palette.
(palette-decode-color
palette code), which
turns a code from palette-encode-color back to
color.
(palette-size
palette), which determines the number
of colors currently stored in the palette.
(palette-capacity
palette), which determines the total
capacity of the palette (the number of colors we can
store, rather than the number of colors we've already stored).
Since the palette created by palette-new is supposed
to start with no colors, but vectors are a fixed size, we need to fill the
vector with some value that indicates nope, there's no color in
this position
. We will use the value #f. Why
#f? It's a simple and easy-to-remember value. It also makes
it easier to check if position i is empty, with
(not (vector-ref palette i)).
We determine the capacity of the palette by finding the size of the
underlying vector.
We'll add colors to the palette by stepping through the indices until
we find the first #f. (Note that this is less efficient
than adding a color to a list-based palette, where we can just cons the
color on to the front.) As we noted above, we know that we have a false
when (not (vector-ref palette i)).
Similarly, we find the number of colors in the palette by stepping through
the indices until we find the first #f or run out of
colors.
We're now ready for the procedures we use most of the time: encoding and
decoding colors. To encode a color, we step through the positions to
find the closest color. This strategy is nearly identical to the
strategy we use for lists, and probably takes about as much computing
power.
In contrast, decoding a color requires little more than looking it up
in the vector with vector-ref.