Association Lists
Summary: We consider association
lists, a common technique for storing information, as
well as the assoc procedure, which is used for
searching through those lists.
Introduction
Consider a simple problem that has been handled behind the scenes
in MediaScript: Converting from color names, represented as strings, to
the color values we are so accustomed to. While one approach would
be to write a procedure with a very long cond expression,
that could take a long time to write. As importantly, such a solution
is not very adaptable.
Another approach is to represent the information as if it were a
phone book or a dictionary, with a long list of entries. Each entry
would have a color name (e.g., "white"), the red,
green, and blue values (e.g., 255, 255,
and 255), and perhaps even some additional information,
such as the names of common palettes to which the color belongs.
You might come up with something like the following:
Color R G B Additional Information
--------- --- --- --- -----------------------------
red 255 0 0 primary,rainbow,reds,web-safe
green 0 255 0 primary,rainbow,web-safe
yellow 255 255 0 secondary,rainbow
blood-red 102 0 0 reds,web-safe
bronze 140 120 83 metallic
When computer scientists represent information in this way, they tend
to refer to the information as a table. Often,
we call a group of tables a database. Once information
is grouped into a table, we can write procedures to select information,
such as all of the rainbow colors.
Representing Database Tables
A great deal of research has gone into ways to represent database
tables efficiently, so that the queries we make to select elements
or create new tables are quick. These structures are beyond the
scope of this course. For now, we will choose a simple and understandable
representation that builds upon the core Scheme structures.
In particular, to
to represent each of these individual entries, we can use a list,
such as ("red" 255 0 0 "primary" "rainbow" "reds" "web-safe")
or ("bronze"
140 120 83 "metallic"). In each case, the
name of the color is the car of the entry, the red, green, and blue
components are the cadr, caddr, and cadddr of the entry, respectively.
An entire color description, then, would be a list of such entries.
(We're leaving out the other attributes, for now, because they may
complicate things.)
While this is a simple representation, it is frequently used by Scheme
programmers who want to describe tables. There's even a name for this
arrangement: In Scheme, a list of pairs or lists in which we use the
value of the car to identify an entry is called an association
list or alist.
As the color directory example suggests, a common application of
association lists involves looking for a desired name or first component
of an entry and then returning another part of the entry (or something
computed from other parts of the entry). For example, we are likely
to search the color directory for color names and then build a new color
based on the three components. Because the first entry has a special
role, we often refer to it as the key of the
entry. We refer to the rest of the entry as the associated
value. In the color list above, the color names are the
keys (e.g., "black", "blood red", and
"blue") and the RGB components and attributes are the
associated values.
assoc, Scheme's Built-in Lookup Procedure
Since applications in which we need to extract data from tables are
very common, Scheme provides procedures to retrieve from an association
list the entry containing a specified key. The most frequently used
procedure of this kind is assoc. Given a key and
association list, assoc returns the first entry
with the given key. If the key does not occur in the association list,
then assoc returns #f. For example,
Looking Up a Color, Revisited
One problem with the lookup-color-by-name procedure
given above is that it calls assoc four times with
identical parameters, once to determine
whether the color name is in the list and once to determine each
of the three components.
Rather than calling assoc four times, we
might call it once, name that result, and use it each time.
Building More Complex Entries
If you recall our initial plans for the color table, we wanted to store
more than just a color name and its components. We also thought it would
be useful to store attributes of the color, such as whether or not it is
Web safe. Hence, we might update each entry to include that additional
information, which we might represent as strings.
You should note a few things about this revised list. First, we've
left the red, green, and blue components in exactly the same place,
so that lookup-color-by-name still works
correctly. Second, we've made the association list easier to read by
lining up corresponding values, as in a table.
We're taking advantage of the fact that Scheme ignores extra spaces.
Implementing assoc
As is the case with many of the built-in Scheme procedures,
assoc is relatively easy to implement
ourselves. In essence, assoc is much like other
algorithms we've written to look for a value in a list. That is,
assoc recursively steps through the list until
it finds a match or runs out of elements.
Using Other Keys
The assoc procedure works fine if the key is the
first element of a data item. But what if it's the second (or third,
or fourth, or ...). For example, what if we want a color whose red
component is 255 or 102? Then we can't rely on assoc,
because it only looks at the first element of each list. Instead,
we need to write our own procedure, using the same structure
as above.
For example, to find a color with a particular red component,
we might write
It might even make sense to generalize this procedure, so that we can look
up by any of the three components.
We can now define lookup-by-red (and even
lookup-by-green and lookup-by-blue)
in terms of lookup-by-component.
(define lookup-by-red
(lambda (red ctable)
(lookup-by-component red 1 ctable)))
(define lookup-by-green
(lambda (green ctable)
(lookup-by-component green 2 ctable)))
(define lookup-by-blue
(lambda (blue ctable)
(lookup-by-component blue 3 ctable)))
What if we want to return all the colors with a
particular component, or permit colors where the component is
close enough
to the desired component? We'll explore
that issue in lab.