Reading: Association lists

Read By
Friday, Nov 16, 2018
Summary
We consider association lists, a common technique for storing information, as well as the assoc procedure, which is used for searching through those lists.

Introduction

Consider the organization of a simple email directory for the campus: a sequence of entries, each consisting of a name and an email address. In Scheme, it will be useful to represent both the name and the email address as strings.

One approach would be to write a procedure that uses a very long cond expression to give the email addresses.

(define lookup-email
  (lambda (name)
    (cond
      [(equal? name "Samuel A. Rebelsky")
       "messy-office@grinnell.edu"]
      [(equal? name "Peter-Michael Osera")
       "type-theory-snob@grinnell.edu"]
      [(equal? name "Anya Vostinar")
       "cache@grinnell.edu"]
      [(equal? name "Sarah Dahlby Albright")
       "cheery-coach@grinnell.edu"]
      [else
       (error "invalid name" name)])))

However, that could take a long time to write. As importantly, such a solution is not very adaptable. If we change the notion of equality (e.g., to string-ci=? or to a more sophisticated approximate matching strategy), we have a large number of entries to update.

Another approach is to represent the information more systematically, in a form that lets us see what kind of information we have without any associated code. As long as we’re making a list, we can add other information. Each entry will have a last name, a first name, an email address, a phone number, and any additional characteristics we think are useful.

Last            First   Username        Phone   More
----            -----   --------        -----   ----
Rebelsky        Samuel  messy-office    4410    CSC,151prof,DigHum,Professor
Klinge          Titus   cauldron        3271    CSC,Assistant,151prof
Weinman         Jerod   map-reader      9812    CSC,Chair,Associate,NSF Grant
Osera           PM      type-snob       4010    Assistant,NSF Grant,Tutorial,CSC
Curtsinger      Charlie systems-guy     3127    CSC,Assistant,Tutorial,ASFAC
Vostinar        Anya    cache           4306    CSC,Assistant,Interdisciplinary
Dahlby-Albright Sarah   cheery-coach    4362    Statistics,CSC,Outreach
Rodrigues       Liz     vivero          3362    LIB,DigHum,Assistant
Barks           Sarah   stem-careers    4940    CLS,STEM,Neuroscience
Kington         Raynard the-prez        3000    Bigwig

When computer scientists represent information in this way, they tend to refer to the information as a table. Often, we call a group of tables a database. Once information is grouped into a table, we can write procedures to select information, such as all the CSC folks.

Representing database tables

A great deal of research has gone into ways to represent database tables efficiently, so that the queries we make to select elements or create new tables are quick. These structures are beyond the scope of this course. For now, we will choose a simple and understandable representation that builds upon the core Scheme structures.

In particular, to represent each of these individual entries, we can use a list, such as '("Klinge" "Titus" "cauldron" "3271"). In each case, the last name is the car of the entry, the first name is the cadr of the entry, the email address is the caddr, and so on and so forth. Our entire directory, therefore, would be a list of such entries.

;;; Value:
;;;   grinnell-directory
;;; Type:
;;;   List of lists.
;;; Note:
;;;   * Each sublist is of length at least four and contains a last
;;;     name, a first name, a username, a phone number, and an
;;;     optional sequence of attributes.
;;;   * The
;;; Contents:
;;;   A list of people at Grinnell with contact information and some
;;;   useful attributes.
(define grinnell-directory
  (list
    (list "Rebelsky" "Samuel" "messy-office" "4410")
    (list "Klinge" "Titus" "cauldron" "3271")
    (list "Weinman" "Jerod" "map-reader" "9812")
    (list "Osera" "PM" "type-snob" "4010")
    (list "Curtsinger" "Charlie" "systems-guy" "3127")
    (list "Vostinar" "Anya" "cache" "4306")
    (list "Dahlby-Albright" "Sarah" "cheery-coach" "4362")
    (list "Rodrigues" "Liz" "vivero" "3362")
    (list "Barks" "Sarah" "stem-careers" "4940")
    (list "Kington" "Raynard" "the-prez" "3000")))

Note that we are leaving out the other attributes temporarily for clarity.

While this is a simple representation, it is frequently used by Scheme programmers who want to describe tables. There’s even a name for this arrangement: In Scheme, a list of pairs or lists in which we use the value of the car to identify an entry is called an association list or alist.

As the email directory example suggests, a common application of association lists involves looking for a desired name or first component of an entry and then returning another part of the entry (or something computed from other parts of the entry). For example, we are likely to search the directory for a last name then build an email address by concatenating the username with "@grinnell.edu".

Because the first entry has a special role, we often refer to it as the key of the entry. We refer to the rest of the entry as the associated value. In the directory above, the last names are the keys (e.g., "Rebelsky", "Klinge", and "Kington") and the user names and phone numbers are the associated values.

assoc, Scheme’s built-in lookup procedure

Since applications in which we need to extract data from tables are very common, Scheme provides procedures to retrieve from an association list the entry containing a specified key. The most frequently used procedure of this kind is assoc. Given a key and association list, assoc returns the first entry with the given key. If the key does not occur in the association list, then assoc returns #f. For example,

> (assoc "Barks" grinnell-directory)
'("Barks" "Sarah" "stem-careers" "4940")
> (assoc "Reisch" grinnell-directory)
#f

Extracting information

Once we’ve used assoc to find an entry, what do we do next? Most typically, we then do something with the rest of the entry. To continue our color table example, we might want to convert the username to an email address.

So, what do we do? We use assoc to look up an entry with the given last name. If we find the entry, we then extract the username and turn it into an email address. If we don’t find the entry, we return some default value. In code, we might write something like the following.

;;; Procedure:
;;;   lookup-email-by-last-name
;;; Parameters:
;;;   last-name, a string
;;;   directory, a list of directory entries
;;; Purpose:
;;;   Look up an email address in the table.
;;; Produces:
;;;   email-address, a string
;;; Preconditions:
;;;   * Each entry in directory must be a list.
;;;   * Element 0 of each entry must be a string which represents the
;;;     last name.
;;;   * Element 2 of each entry must be a string which represents the
;;;     user name.
;;;   * Email addresses have the form "username@grinnell.edu".
;;; Postconditions:
;;;   * If an entry for the last name appears once in the table, 
;;;     email-address represents their email address.
;;;   * If multiple entries with a matching last name appear in the table, 
;;;     email-address represents the email address for one of them.
;;;   * If no entries with a matching last name appear in the table,
;;;     email-address is the empty string, ""
;;;   * Does not affect the table.
(define lookup-email-by-last-name
  (lambda (last-name directory)
    (if (assoc last-name directory)
        (string-append (list-ref (assoc last-name directory) 2)
                       "@grinnell.edu")
        "")))

For example,

> (lookup-email-by-last-name "Barks" grinnell-directory)
"stem-careers@grinnell.edu"
> (lookup-email-by-last-name "Reisch" grinnell-directory)
""
> (equal? "" (lookup-email-by-last-name "Barks" grinnell-directory))
#f
> (equal? "" (lookup-email-by-last-name "Reisch" grinnell-directory))
#t

Note that the result depends on the directory. For example,

> (lookup-email-by-last-name "Rebelsky" grinnell-directory)
"messy-office@grinnell.edu"
> (lookup-email-by-last-name "Rebelsky" null)
""
> (lookup-email-by-last-name "Rebelsky"
                             (list (list "Rebelsky" "SA" "lucifer" "0666")))
"lucifer@grinnell.edu"

Some of you may recall asking why if might take a value other than #t or #f as a parameter. This procedure is one example of why.

Looking up an email address, revisited

One problem with the lookup-email-by-last-name procedure given above is that it calls assoc two times with identical parameters, once to determine whether the last name is in the list and once to extract the user name. Rather than calling assoc twice, we might call it once, name that result, and use it each time.

(define lookup-email-by-last-name
  (lambda (last-name directory)
    (let ([assoc-result (assoc last-name directory)])
      (if assoc-result
          (string-append (list-ref assoc-result 2)
                         "@grinnell.edu")
          ""))))

Building more complex entries

If you recall our initial plans for the directory, we wanted to store more than just a name, user name, and phone number. For example, it would be useful to store descriptive information, which we might represent as strings.

(define grinnell-directory-annotated
  (list
   (list "Rebelsky"        "Samuel"  "messy-office" "4410" "CSC" "151prof" "DigHum" "Professor")
   (list "Klinge"          "Titus"   "cauldron"     "3271" "CSC" "Assistant" "151prof")
   (list "Weinman"         "Jerod"   "map-reader"   "9812" "CSC" "Chair" "Associate" "NSF Grant")
   (list "Osera"           "PM"      "type-snob"    "4010" "Assistant" "NSF Grant" "Tutorial" "CSC")
   (list "Curtsinger"      "Charlie" "systems-guy"  "3127" "CSC" "Assistant" "Tutorial" "ASFAC")
   (list "Vostinar"        "Anya"    "cache"        "4306" "CSC" "Assistant" "Interdisciplinary")
   (list "Dahlby-Albright" "Sarah"   "cheery-coach" "4362" "Statistics" "CSC" "Outreach")
   (list "Rodrigues"       "Liz"     "vivero"       "3362" "LIB" "DigHum" "Assistant")
   (list "Barks"           "Sarah"   "stem-careers" "4940" "CLS" "STEM" "Neuroscience")
   (list "Kington"         "Raynard" "the-prez"     "3000" "Bigwig")))

You should note a few things about this revised list. First, we’ve left the first name, user name, and phone number in the same place. That means that lookup-email-by-last-name and any other procedures we’ve written already will continue to work.

> (lookup-email-by-last-name "Barks" grinnell-directory)
"stem-careers@grinnell.edu"
> (lookup-email-by-last-name "Barks" grinnell-directory-annotated)
"stem-careers@grinnell.edu"
> (assoc "Barks" grinnell-directory)
'("Barks" "Sarah" "stem-careers" "4940")
> (assoc "Barks" grinnell-directory-annotated)
'("Barks" "Sarah" "stem-careers" "4940" "CLS" "STEM" "Neuroscience")

Second, we’ve made the association list easier to read by lining up corresponding values, as in a table. We’re taking advantage of the fact that Scheme ignores extra spaces.

Implementing assoc

As is the case with many of the built-in Scheme procedures, assoc is relatively easy to implement ourselves. In essence, assoc is much like other algorithms we’ve written to look for a value in a list. That is, assoc recursively steps through the list until it finds a match or runs out of elements.

;;; Procedure:
;;;   assoc
;;; Parameters:
;;;   key, a Scheme value
;;;   alist, an association list
;;; Purpose:
;;;   Find an entry with key key in alist.
;;; Produces:
;;;   entry, a Scheme value
;;; Preconditions:
;;;   No additional
;;; Postconditions:
;;;   If there is an index, i, such that
;;;     (equal? key (car (list-ref alist i)))
;;;   then entry is the first such entry
;;;   Otherwise, entry is false (#f)
(define assoc
  (lambda (key alist)
    (cond
      ; If there are no entries left in the association list,
      ; there are no entries with the given key.
      [(null? alist) 
       #f]
      ; If the key we're looking for is the key of the first
      ; entry, then use that entry.
      [(equal? key (car (car alist))) 
       (car alist)]
      ; Otherwise, look in the rest of the association list.
      [else 
       (assoc key (cdr alist))])))

“Improving” assoc

As you may have noted, there’s a difficulty with assoc in the context we are using it. In particular, it only does case-sensitive matching.

> (assoc "klinge" grinnell-directory-annotated)
#f
> (assoc "Klinge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")

Since we’re writing our own version of assoc, we can fix that problem. However, it may mean that we need to restrict our keys to strings.

;;; Procedure:
;;;   assoc-ci
;;; Parameters:
;;;   key, a string
;;;   alist, an association list
;;; Purpose:
;;;   Find an entry with a matching key in alist.
;;; Produces:
;;;   entry, a Scheme value
;;; Preconditions:
;;;   The car of each entry in alist is a string
;;; Postconditions:
;;;   * If there is an index, i, such that
;;;     (string-ci=? key (car (list-ref alist i)))
;;;     then entry is the first such entry
;;;   * Otherwise, entry is false (#f)
(define assoc-ci
  (lambda (key alist)
    (cond
      ; If there are no entries left in the association list,
      ; there are no entries with the given key.
      [(null? alist) 
       #f]
      ; If the key we're looking for is the key of the first
      ; entry, then use that entry.
      [(string-ci=? key (car (car alist))) 
       (car alist)]
      ; Otherwise, look in the rest of the association list.
      [else 
       (assoc-ci key (cdr alist))])))

Let’s see if that works.

> (assoc-ci "klinge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
> (assoc-ci "Klinge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
> (assoc-ci "Kringle" grinnell-directory-annotated)
#f
> (assoc-ci "klINge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")

It does. But what happens if we use something other than a string as our search key?

> (assoc 42 grinnell-directory-annotated)
#f
> (assoc-ci 42 grinnell-directory-annotated)
. . string-ci=?: contract violation
  expected: string?
  given: 42
  argument position: 1st
  other arguments...:

If we insist on handling multiple situations (strings, which we compare with string-ci=?, and everything else, which we compare with equal?), we’ll need a slightly more complex approach. In essence, we need to define a new procedure for comparing two elements that takes their type into account.

(define assoc-new
  (let ([match? (lambda (val1 val2)
                  (or (and (string? val1) 
                           (string? val2) 
                           (string-ci=? val1 val2))
                      (equal? val1 val2)))])
    (lambda (key alist)
      (cond
        ; If there are no entries left in the association list,
        ; there are no entries with the given key.
        [(null? alist) 
         #f]
        ; If the key we're looking for matches the key of the first
        ; entry, then use that entry.
        [(match? key (car (car alist))) 
         (car alist)]
        ; Otherwise, look in the rest of the association list.
        [else 
         (assoc-new key (cdr alist))]))))
> (assoc-new "curtSINGER" grinnell-directory-annotated)
'("Curtsinger" "Charlie" "systems-guy" "3127" "CSC" "Assistant" "Tutorial" "ASFAC")
> (assoc-new 42 grinnell-directory-annotated)
#f
> (assoc-new 42 (list (list 42 "the answer")
                      (list "Iverson" "Allen" "The Answer")))
                      
'(42 "the answer")
> (assoc-new "Iverson" (list (list 42 "the answer")
                             (list "Iverson" "Allen" "The Answer")))
'("Iverson" "Allen" "The Answer")
> (assoc-new null (list (list 42 "the answer")
                        (list "Iverson" "Allen" "The Answer")))
#f

Using other keys

The assoc procedure works fine if the key is the first element of a data item. But what if it’s the second (or third, or fourth, or …). For example, what if we want to find the last name of the person whose first name is “PM” or whose email address is "map-reader"? We can’t rely on assoc, because it only looks at the first element of each list. Instead, we need to write our own procedure, using the same structure as above.

For example, to find someone by phone number, we might write a procedure like the following.

;;; Procedure:
;;;   lookup-person-by-phone
;;; Parameters:
;;;   phone, a string
;;;   directory, a list of directory entries
;;; Purpose:
;;;   Find the name of the person who has a particular phone number.
;;; Produces:
;;;   name, a string (or the value #f)
;;; Preconditions:
;;;   * phone is a string with four digits.
;;;   * Each entry in directory must be a list.
;;;   * Element 0 of each entry must be a string which represents the
;;;     last name.
;;;   * Element 1 of each entry must be a string which represents the
;;;     first name.
;;;   * Element 3 of each entry must be a string which represents the
;;;     four-digit phone number.
;;;   * People have both first names and last names.
;;; Postconditions:
;;;   * If an entry for the phone number appears once in the table, 
;;;     name represents their name.
;;;   * If multiple entries with a matching phone number appear in the 
;;;     table, name represents the name for one of them.
;;;   * If no entries with a matching phone number appear in the table,
;;;     name is #f.
;;;   * Does not affect the table.
(define lookup-person-by-phone
  (lambda (phone directory)
    (cond
      [(null? directory)
       #f]
      [(equal? phone (list-ref (car directory) 3))
       (string-append (cadar directory) " " (caar directory))]
      [else
       (lookup-person-by-phone phone (cdr directory))])))
> (lookup-person-by-phone "1234" grinnell-directory)
#f
> (lookup-person-by-phone "3000" grinnell-directory)
"Raynard Kington"

It might even make sense to generalize this procedure, so that we can look up by any of the four key pieces of information.

;;; Procedure:
;;;   lookup-name-by
;;; Parameters:
;;;   key, a string
;;;   position, an integer between 0 and 3, inclusive
;;;   directory, a list of directory entries
;;; Purpose:
;;;   Find the full name of a person who has a matching string at
;;;   the designated position (0 for last name, 1 for first name,
;;;   2 for username, and 3 for four-digit phone number).
;;; Produces:
;;;   name, a string (or the value #f)
;;; Preconditions:
;;;   * phone is a string with four digits.
;;;   * Each entry in directory must be a list.
;;;   * Element 0 of each entry must be a string which represents the
;;;     last name.
;;;   * Element 1 of each entry must be a string which represents the
;;;     first name.
;;;   * Element 2 of each entry must be a string which represents the
;;;     user name.
;;;   * Element 3 of each entry must be a string which represents the
;;;     four-digit phone number.
;;;   * People have both first names and last names.
;;; Postconditions:
;;;   If position is 0 and an entry with the same last name appears 
;;;     somewhere in the table, name is the name associated with
;;;     one such entry.
;;;   If position is 1 and an entry with the same first name appears 
;;;     somewhere in the table, name is the name associated with
;;;     one such entry.
;;;   If position is 2 and an entry with the same user name appears 
;;;     somewhere in the table, name is the name associated with
;;;     one such entry.
;;;   If position is 3 and an entry with the same four-digit phone 
;;;     appears somewhere in the table, name is the name associated 
;;;     with one such entry.
;;;   If no matching entries appear, cname is #f.
;;;   Does not affect the table.
(define lookup-name-by
  (lambda (key position directory)
    (cond
      [(null? directory)
       #f]
      [(string-ci=? key (list-ref (car directory) position))
       (string-append (cadar directory) " " (caar directory))]
      [else
       (lookup-name-by key position (cdr directory))])))
> (lookup-name-by "Samuel" 0 grinnell-directory)
#f
> (lookup-name-by "Samuel" 1 grinnell-directory)
"Samuel Rebelsky"
> (lookup-name-by "3000" 3 grinnell-directory)
"Raynard Kington"
> (lookup-name-by "cheery-coach" 2 grinnell-directory)
"Sarah Dahlby-Albright"

If we want more specific lookup procedures, we can define them in terms of lookup-name-by.

(define lookup-name-by-username
  (lambda (uname directory)
    (lookup-name-by uname 2 directory)))

(define lookup-name-by-first-name
  (section lookup-name-by <> 1 <>))
> (lookup-name-by-username "map-reader" grinnell-directory-annotated)
"Jerod Weinman"
> (lookup-name-by-username "nobody" grinnell-directory-annotated)
#f
> (lookup-name-by-first-name "nobody" grinnell-directory-annotated)
#f
> (lookup-name-by-first-name "Anya" grinnell-directory-annotated)
"Anya Vostinar"

What if we want to return all the values with a particular key, or permit entries where the component is “close enough” to the desired key? We’ll explore those issues in lab.

Self checks

Check 1: Defining assoc

a. What is the definition of an association list?

b. Review the sample implementation of assoc above.

c. All of the example association lists are lists of lists. What do you think that assoc will do if it is given a list in which each element is a pair, rather than a list? For example, can we use assoc to search the following list to determine the first name of a faculty member whose last name you know?

(define math-cs-stats
  (list (cons "Walker" "Henry")
        (cons "Stone" "John")
        (cons "Osera" "Peter-Michael")
        (cons "Vostinar" "Anya")
        (cons "Weinman" "Jerod")
        (cons "Wolf" "Royce")
        (cons "Chamberland" "Marc")
        (cons "Shuman" "Karen")
        (cons "French" "Chris")
        (cons "Mileti" "Joseph")
        (cons "Paulhus" "Jennifer")
        (cons "Klinge" "Titus")
        (cons "Kuiper" "Shonda")
        (cons "Blanchard" "Jeffrey")
        (cons "Jonkman" "Jeffrey")))

d. Confirm or refute your answers by experimentation.

Check 2: Making your own lists

a. Define an association list, sidekicks, that associates the names of some famous cartoon protagonists (as strings) with the names of their sidekicks (again, as strings).

Here’s a table containing information for your association list:

Protagonist Sidekick
Peabody Sherman
Yogi Booboo
Secret Squirrel Morocco Mole
Tennessee Tuxedo Chumley
Quick Draw McGraw Baba Looey
Dick Dastardly Muttley
Bullwinkle Rocky
Bart Simpson Milhouse Van Houten
Asterix Obelix
Strong Bad The Cheat

b. Use the assoc procedure to search the sidekicks association list for someone who is on the list and for someone who is not on the list.

Citations

If we remember correctly, the table of cartoon characters is due to Ben Gum.