When the result of everyone's single-minded pursuit of his own interests is not satisfactory to anyone, the members of a community can often achieve a better outcome by cooperating with one another. Cooperation requires the members of the community to trust one another, at least to some extent; but they can often develop this mutual trust both by experiencing the benefits of cooperation and by recognizing the threat of retaliation for breaches of trust. Often the benefits of cooperation are so great that the threatened retaliation need not be extreme, enduring, or inevitable in order to be effective.
A mathematical game, traditionally called ``the prisoner's dilemma,'' can be used illustrate these claims. It is a game for two players; I'll call them Alice and Bob. The object of the game is to score as many points as possible. Each player has to make a decision between two alternatives, cooperation and defection. The players make their decisions independently, without consultation, and simultaneously; they then announce those decisions simultaneously. The outcome is then scored as follows:
It is clear to Alice that, no matter what Bob does, she will receive more points for choosing defection than for choosing cooperation. (If Bob chooses cooperation, Alice will get 5 points for defection and only 3 for cooperation; if Bob chooses defection, Alice will get 1 point for defection, none at all for cooperation.) So it is rational for Alice to choose defection.
However, for exactly the same reason, it is rational for Bob to choose defection; the game is symmetrical. Thus, if both players make rational choices, each will receive 1 point. Both Alice and Bob would prefer to score the 3 points that they would receive if they both chose cooperation, but because they cannot collaborate -- their choices must be made independently and simultaneously, remember -- there is no way for either player to make this happen. Alice can't trust Bob; Bob can't trust Alice.
At least, this appears to be the case if Alice and Bob are going to play only one round of this game. Suppose, however, that they go on to play a whole sequence of rounds -- an indefinitely long sequence, so that neither Alice nor Bob knows when the sequence will end. The object of the game is to score as many points as possible over the course of the entire match -- not to score more points than the other player, but simply to accumulate as large a total as one can. (Imagine that a point is a dollar. Bob would rather ``lose'' by a score of 328 to 350 than ``win'' by a score of 156 to 139, because he'd get $172 more.) We still assume that Alice and Bob make their decisions in each round independently and simultaneously; but we allow them to know the outcome of each round before proceeding to the next.
In such a match, it is sometimes rational to choose cooperation, if by doing so one can induce the other player to choose cooperation in subsequent rounds. Choosing cooperation in an early round is a tentative offer of trust, a way for Alice to send a signal to Bob, expressing an interest in cooperating for their mutual benefit. Defection can also be used as a signal -- for example, as a way of retaliating for an unprovoked defection by the other player in a preceding round.
Of course, such signals are pointless unless the other player picks them up and acts on them. One can imagine an unresponsive player who figures out in advance what he is going to do in each round of the match and sticks to that plan without paying any attention to what the other player does. (For instance, a player who always chooses defection, no matter what, is unresponsive; so is one who chooses cooperation in every odd-numbered round and defection in every even-numbered one.) If the other player is unresponsive, then the match is just the basic one-round game played over and over again, and the rational course of action is to defect every time.
If the other player is responsive, however, some more interesting strategies are possible, and no one of them is ``best'' under all circumstances. (Alice would get the best possible result if she could somehow anticipate Bob's decisions and always make the same choice he does. But there is no way for her to do this reliably, since their decisions are announced simultaneously.) One could, for instance, choose cooperation in every round so long as the other player continued to cooperate; if the other player ever chose defection, one could retaliate by choosing defection in the next round, or perhaps in the next two rounds, or perhaps even in all subsequent rounds. Or one could choose cooperation for several rounds and then try to sneak in a single defection without attracting retribution. One could choose cooperation for twenty rounds and then toss a coin to decide between cooperation and defection on subsequent rounds. One could try to deduce, from the other player's choices in early rounds, whether the other player is responsive, with the intention of choosing cooperation in later rounds if she is and choosing defection if she is not. And there are still other possibilities
Robert Axelrod, a political scientist at the University of Michigan School of Public Policy, devised a way to test such strategies against one another: He organized a ``prisoner's dilemma'' tournament. He invited a number of distinguished psychologists, economists, political scientists, mathematicians, and sociologists to suggest strategies, in the form of FORTRAN or BASIC subroutines. He fitted these into a program that paired off the suggested strategies in every possible way and ran each pair through a two-hundred-round match, collecting the scores. He then added up the scores that each strategy had run up in all of its pairwise matches and ranked the strategies by total score.
Axelrod's tournament did not, of course, prove which strategy is ``best'' -- different strategies are optimal in different matches. However, by studying the standings, Axelrod identified some characteristics shared by the most successful and robust strategies in this arena of shrewdly designed players. I'll describe some of his findings in tomorrow's handout.
Today's project, however, is to design and implement, in Scheme, a player
that could participate in a tournament like Axelrod's: a procedure that
returns either the symbol cooperation or the symbol
defection each time it is invoked. The arguments to such a
procedure give it all the available information about previous rounds of
the match: how many such rounds there have been and what plays were made
in those previous rounds by each player.
I have presented this game many times, to various groups of high-school and college students, and in most cases I've asked them to suggest strategies and to create players. Today's project file contains several such proposals:
The Just mean player defects in every round, regardless of what the other player does.
The Look back player cooperates in the first round. On every subsequent round, it does whatever the other player did in the previous round. (In other words, it repays an opponent's cooperation in the previous round by cooperating in the current round and punishes an opponent's defection in the previous round by defecting in the current round.)
The Point seven player cooperates in the first round and thereafter cooperates whenever the other player cooperated in the preceding round; after a defection by the other player, Point seven generates a random number in the range from 0 to 1 and defects if the random number is less than 0.7. (In other words, Point seven is like Look back, except that it retaliates only seventy percent of the time; it ignores the other thirty percent of the other player's defections.)
The Cheap player defects in the first round, but after that it plays the same strategy as Look back.
The Absentee player cooperates in the first two rounds and subsequently defects if the other player has defected in each of the two preceding rounds, but cooperates if the other player cooperated in either of those two rounds.
The Overtime player follows a pre-programmed sequence of acts of cooperation and defection, basing its choice entirely on the round number and not on the opponent's behavior. The pattern consists of alternating strings of cooperations and defections, growing longer and longer as the round number increases. After round 100, it always defects.
Golden rule cooperates in the first round, and thereafter always responds to a defection with a defection. In responding to cooperation after the first round, it looks to see how frequently its opponent has defected in all previous rounds and defects with the same probability.
Tester defects in the first round, cooperates in the second round, and continues to alternate defection with cooperation as long as the other player does not defect. Tester cooperates after the other player's first defection, but after that plays the same strategy as Look back.
The Zero/two player cooperates indefinitely as long as its opponent does not make the fatal mistake of defecting twice in a row; Zero/two responds to two defections in a row by defecting in every subsequent round.
Using these procedures as models, devise a strategy for a player in a prisoner's-dilemma tournament and implement it as a Scheme procedure. Send me the result by e-mail; I'll collect the results and report on them during tomorrow's lab.
This document is available on the World Wide Web as
http://www.math.grin.edu/courses/Scheme/fall-1997/dilemma-project.html
created July 17, 1997
last revised October 3, 1997