Project: The prisoner's dilemma

This is a continuation of the previous lab, which introduced the prisoner's dilemma game and described Robert Axelrod's approach to testing various strategies in that game.

Features of successful strategies

Axelrod's tournament did not, of course, prove which strategy is ``best'' -- different strategies are optimal in different matches. However, by studying the standings, Axelrod identified some characteristics shared by the most successful and robust strategies in this arena of shrewdly designed players:

Axelrod described a player as nice if she never chooses defection before the other player does. This implies that she chooses cooperation in the first round of every match and continues to choose cooperation unless the other player defects without provocation. In Axelrod's tournament, eight of the fourteen strategies were nice. These eight strategies took the first eight places in the final rankings. Axelrod concluded that, in an environment in which there are other nice players around, it pays to be nice.

A slightly subtler characteristic is provocability. The strategies that placed first and second in Axelrod's tournament shared this characteristic: When the other player chose defection in an early round, both of the successful strategies retaliated (by choosing defection) in the very next round. (Axelrod called players that have this characteristic retaliatory.) Players that chose to ignore a single defection or postponed retaliation for it were less successful.

Axelrod also observed that willingness to forgive the other player for unprovoked defections -- eventually -- helped the most successful strategies. The eighth-place finisher (the least successful of the nice players) always chose cooperation until the other player defected, but subsequently defected every time. Such a player is nice, but totally unforgiving; it ignores the other player's attempts to apologize. Not surprisingly, the other player often detected this unresponsiveness and continued to defect. As a result, neither player emerged from the match with a good score.

Strategies proposed by the class

Here are the strategies offered by members of this class from yesterday's lab:

Of the strategies submitted, only Zero/two revised, Easily irritable, Bob, and Player 2B are nice (in the sense that none of them is ever the first to defect). When nice strategies are paired against one another, the entire match is defection-free, so both players receive three points per round.

Most of the players submitted are provocable. Random player and Sabrina player are unresponsive -- they do not look at what the other player has done and would generate the same sequence of plays against any other player. Wants to win and Loser!!! respond to defections by cooperating, which means that like unresponsive players they are easily exploited by frequently-defecting opponents. Zero/two-revised and Bob are also technically not provocable, since neither one retaliates unless it receives two successive defections.

Easily irritable is the only player that is unforgiving, in the sense that it can reach a state in which it will never cooperate again no matter what the opponent does.

Here are the results of a tournament staged among these twelve players. Each match lasted 185 rounds. The first table gives the match scores (the entry in row m and column n shows how many points player m accumulated during its match with player n). The second table ranks the players by total score for the entire tournament.

                      1   2   3   4   5   6   7   8   9  10  11  12
 1 Fair-weather         490 457 261 559 529 173 463 559 372 559 245
 2 Wants to win     360     445 434 153 373   0 229  24 477 338 333
 3 Random player    302 395     409 226 425 507 294 114 492 267 387
 4 2B4              256 429 409     557 372 369 347 195 420 557 557
 5 Zero/two revised 549 718 521 552     279 696 355 555 397 555 555
 6 Sabrina player   154 368 405 372 739     365 368 109 484 109 466
 7 Loser!!!         233 925 327 369 171 375     251  26 469 335 335
 8 Whoever          328 624 489 352 310 438 601     161 401 198 234
 9 Easily irritable 549 899 534 200 555 549 896 341     293 555 555
10 Cautious         377 367 397 415 307 434 364 351 173     177 410
11 Bob              549 513 487 552 555 549 515 318 555 277     555
12 Player 2B        240 518 462 552 555 461 515 354 555 455 555    

Easily irritable      5926
Zero/two revised      5732
Bob                   5425
Player 2B             5222
Fair-weather          4667
2B4                   4468
Whoever               4136
Sabrina player        3939
Random player         3818
Loser!!!              3816
Cautious              3772
Wants to win          3166

A total of 12210 rounds of the prisoner's dilemma were played. The average combined score per round was 4.43; consistent cooperation would have produced an average combined score of 6. If every player had cooperated consistently, each of them would have scored 6105 points -- more than any of them actually received.

I observe that all four of the nice players ranked ahead of all eight of the others.

It is somewhat surprising that the unforgiving player Easily irritable placed first. Its success is due primarily to the high scores it ran up against Wants to win and Loser!!! by discovering that those opponents would respond to defection by cooperating; Easily irritable was able to switch into always-defect mode in order to exploit this unusual characteristic.

Axelrod's tournaments

The player that finished in first place in Axelrod's tournament was Look back, which was submitted by Anatol Rapoport of the University of Toronto:

(define look-back
  (lambda (rounds-played my-plays your-plays)
    (if (zero? rounds-played)
        'cooperation
        (car your-plays))))

Look back was the shortest, simplest player submitted! (The original implementation, in FORTRAN, was only four lines long.) Look back is nice, since it cannot choose to defect until the other player has done so. It is retaliatory, since it immediately replies to a defection by defecting right back. And it is forgiving; it retaliates for a defection immediately, but does not hold a grudge into subsequent rounds.

Axelrod published the results of this tournament, together with his thought-provoking analysis. He then announced a second tournament! Once again, he solicited entries from a large variety of thoughtful and competent people. Some of the participants in the first tournament also submitted entries in the second one, with improvements reflecting their study and consideration of the results of the first tournament. Rapoport submitted Look back again. Altogether, the second tournament attracted sixty-two entries.

As in the first tournament, these entries were paired off in every possible way. This time, the matches were halted after 153 rounds. (The number of rounds per match was not announced in advance, since one of the conventions of the game is that players do not know when the match will end.) The match scores for each player were summed, and the players were ranked by total score.

Astonishingly, Look back won again! Even though all of the participants in the second tournament knew that this simple strategy had won the first tournament, and several of them submitted players that would have scored even higher than Look back if they had been entered in that first tournament, none of those ingenious ``improvements'' was quite as successful in adapting to the rather different collection of players entered in the second tournament.

In his subsequent analysis, Axelrod explained the surprising robustness of Look back -- its success with a wide variety of players -- by noting three conditions that combined to make it non-exploitable: (1) It is a common enough strategy that other players expect to encounter it or some variation of it. (2) When other players encounter Look back, they can identify it quickly (or at least find out quickly that it is nice, retaliatory, and forgiving). (3) When other players have perceived these features of Look back, it is easy for them to infer that there is no way to exploit it.

Look back has another surprising characteristic: It won both of these tournaments without ever scoring more points than the other player in a match! Look back scores either exactly the same number of points as the other player, or else slightly fewer (if the other player has managed to get in one more defection than Look back). Its success in the overall tournament derives from the fact that it elicits cooperative behavior from its opponent, enabling them both to do well.

(If Look back had been submitted as an additional entry in our tournament, however, it would have placed fifth -- behind all of the other nice players, but still ahead of all the non-nice ones.)

You can find a fuller account of both tournaments, with many fascinating details, in Axelrod's excellent book The evolution of cooperation (New York: Basic Books, Inc., 1984). There's also a World Wide Web site, http://pscs.physics.lsa.umich.edu/Software/CC/ECHome.html, that includes the FORTRAN implementation of Axelrod's second tournament.

Today's exercise

The exercise for today, then, is to design and submit an improved strategy, taking account of the lessons learned from this first tournament. Send them to me by e-mail; I'll stage a second tournament and report on the results.


This document is available on the World Wide Web as

http://www.math.grin.edu/courses/Scheme/fall-1997/dilemma-project-2.html

created July 17, 1997
last revised October 3, 1997

John David Stone (stone@math.grin.edu)