This is a continuation of the previous lab, which introduced the prisoner's dilemma game and described Robert Axelrod's approach to testing various strategies in that game.
Axelrod's tournament did not, of course, prove which strategy is ``best'' -- different strategies are optimal in different matches. However, by studying the standings, Axelrod identified some characteristics shared by the most successful and robust strategies in this arena of shrewdly designed players:
Axelrod described a player as nice if she never chooses defection before the other player does. This implies that she chooses cooperation in the first round of every match and continues to choose cooperation unless the other player defects without provocation. In Axelrod's tournament, eight of the fourteen strategies were nice. These eight strategies took the first eight places in the final rankings. Axelrod concluded that, in an environment in which there are other nice players around, it pays to be nice.
A slightly subtler characteristic is provocability. The strategies that placed first and second in Axelrod's tournament shared this characteristic: When the other player chose defection in an early round, both of the successful strategies retaliated (by choosing defection) in the very next round. (Axelrod called players that have this characteristic retaliatory.) Players that chose to ignore a single defection or postponed retaliation for it were less successful.
Axelrod also observed that willingness to forgive the other player for unprovoked defections -- eventually -- helped the most successful strategies. The eighth-place finisher (the least successful of the nice players) always chose cooperation until the other player defected, but subsequently defected every time. Such a player is nice, but totally unforgiving; it ignores the other player's attempts to apologize. Not surprisingly, the other player often detected this unresponsiveness and continued to defect. As a result, neither player emerged from the match with a good score.
Here are the strategies offered by members of this class from yesterday's lab:
Fair-weather friend defects ninety percent of the time, unless the other player has cooperated in each of the two preceding rounds, in which case it always cooperates.
Wants to win cooperates on the first round, but thereafter does the opposite of whatever the other player did in the previous round.
Random player tosses a coin on each round to choose randomly between cooperation and defection.
2B4 defects on the first round, cooperates in the second round, and thereafter does whatever the other player did two rounds earlier.
Zero/two revised cooperates indefinitely as long as its opponent does not defect twice in a row. After such a double defection, however, Zero/two revised defects eighty percent of the time and cooperates only twenty percent of the time.
The Sabrina player cooperates on the first two rounds and alternates between cooperation and defection thereafter.
Loser!!! defects on the first round and thereafter does the opposite of whatever the other player did in the previous round.
The Whoever player cooperates on the first two rounds. After that point, it determines what the other player has done in the preceding two rounds; if the other player has cooperated twice or defected twice, Whoever responds by defecting eighty percent of the time and cooperating the other twenty percent. If the other player's previous two moves comprise one cooperation and one defection, however, Whoever chooses randomly and with equal probability between cooperation and defection.
Easily irritable cooperates for the first five rounds. If, at any point, an opponent defects twice in a row or more than forty percent of the time, Easily irritable never cooperates again.
Cautious and mistrustful defects on every round until the other player cooperates. Once the other player has cooperated, Cautious and mistrustful computes half of the number of times the other player has cooperated; it then adds that number to either 60 (if the other player cooperated in the preceding round) or 10 (if the other player has just defected) and treats the result as a percentage. It cooperates that percentage of the time and defects otherwise.
Bob cooperates until the other player has defected against him twice. After that, whenever the other player defects, Bob defects back in each of the two subsequent rounds.
Player 2B cooperates in the first two rounds and thereafter does whatever the other player did in the preceding round, except that if the other player has defected twice in a row and then cooperated, Player 2B punishes the other player by responding with an additional defection.
Of the strategies submitted, only Zero/two revised, Easily irritable, Bob, and Player 2B are nice (in the sense that none of them is ever the first to defect). When nice strategies are paired against one another, the entire match is defection-free, so both players receive three points per round.
Most of the players submitted are provocable. Random player and Sabrina player are unresponsive -- they do not look at what the other player has done and would generate the same sequence of plays against any other player. Wants to win and Loser!!! respond to defections by cooperating, which means that like unresponsive players they are easily exploited by frequently-defecting opponents. Zero/two-revised and Bob are also technically not provocable, since neither one retaliates unless it receives two successive defections.
Easily irritable is the only player that is unforgiving, in the sense that it can reach a state in which it will never cooperate again no matter what the opponent does.
Here are the results of a tournament staged among these twelve players. Each match lasted 185 rounds. The first table gives the match scores (the entry in row m and column n shows how many points player m accumulated during its match with player n). The second table ranks the players by total score for the entire tournament.
1 2 3 4 5 6 7 8 9 10 11 12
1 Fair-weather 490 457 261 559 529 173 463 559 372 559 245
2 Wants to win 360 445 434 153 373 0 229 24 477 338 333
3 Random player 302 395 409 226 425 507 294 114 492 267 387
4 2B4 256 429 409 557 372 369 347 195 420 557 557
5 Zero/two revised 549 718 521 552 279 696 355 555 397 555 555
6 Sabrina player 154 368 405 372 739 365 368 109 484 109 466
7 Loser!!! 233 925 327 369 171 375 251 26 469 335 335
8 Whoever 328 624 489 352 310 438 601 161 401 198 234
9 Easily irritable 549 899 534 200 555 549 896 341 293 555 555
10 Cautious 377 367 397 415 307 434 364 351 173 177 410
11 Bob 549 513 487 552 555 549 515 318 555 277 555
12 Player 2B 240 518 462 552 555 461 515 354 555 455 555
Easily irritable 5926
Zero/two revised 5732
Bob 5425
Player 2B 5222
Fair-weather 4667
2B4 4468
Whoever 4136
Sabrina player 3939
Random player 3818
Loser!!! 3816
Cautious 3772
Wants to win 3166
A total of 12210 rounds of the prisoner's dilemma were played. The average combined score per round was 4.43; consistent cooperation would have produced an average combined score of 6. If every player had cooperated consistently, each of them would have scored 6105 points -- more than any of them actually received.
I observe that all four of the nice players ranked ahead of all eight of the others.
It is somewhat surprising that the unforgiving player Easily irritable placed first. Its success is due primarily to the high scores it ran up against Wants to win and Loser!!! by discovering that those opponents would respond to defection by cooperating; Easily irritable was able to switch into always-defect mode in order to exploit this unusual characteristic.
The player that finished in first place in Axelrod's tournament was Look back, which was submitted by Anatol Rapoport of the University of Toronto:
(define look-back
(lambda (rounds-played my-plays your-plays)
(if (zero? rounds-played)
'cooperation
(car your-plays))))
Look back was the shortest, simplest player submitted! (The original implementation, in FORTRAN, was only four lines long.) Look back is nice, since it cannot choose to defect until the other player has done so. It is retaliatory, since it immediately replies to a defection by defecting right back. And it is forgiving; it retaliates for a defection immediately, but does not hold a grudge into subsequent rounds.
Axelrod published the results of this tournament, together with his thought-provoking analysis. He then announced a second tournament! Once again, he solicited entries from a large variety of thoughtful and competent people. Some of the participants in the first tournament also submitted entries in the second one, with improvements reflecting their study and consideration of the results of the first tournament. Rapoport submitted Look back again. Altogether, the second tournament attracted sixty-two entries.
As in the first tournament, these entries were paired off in every possible way. This time, the matches were halted after 153 rounds. (The number of rounds per match was not announced in advance, since one of the conventions of the game is that players do not know when the match will end.) The match scores for each player were summed, and the players were ranked by total score.
Astonishingly, Look back won again! Even though all of the participants in the second tournament knew that this simple strategy had won the first tournament, and several of them submitted players that would have scored even higher than Look back if they had been entered in that first tournament, none of those ingenious ``improvements'' was quite as successful in adapting to the rather different collection of players entered in the second tournament.
In his subsequent analysis, Axelrod explained the surprising robustness of Look back -- its success with a wide variety of players -- by noting three conditions that combined to make it non-exploitable: (1) It is a common enough strategy that other players expect to encounter it or some variation of it. (2) When other players encounter Look back, they can identify it quickly (or at least find out quickly that it is nice, retaliatory, and forgiving). (3) When other players have perceived these features of Look back, it is easy for them to infer that there is no way to exploit it.
Look back has another surprising characteristic: It won both of these tournaments without ever scoring more points than the other player in a match! Look back scores either exactly the same number of points as the other player, or else slightly fewer (if the other player has managed to get in one more defection than Look back). Its success in the overall tournament derives from the fact that it elicits cooperative behavior from its opponent, enabling them both to do well.
(If Look back had been submitted as an additional entry in our tournament, however, it would have placed fifth -- behind all of the other nice players, but still ahead of all the non-nice ones.)
You can find a fuller account of both tournaments, with many fascinating details, in Axelrod's excellent book The evolution of cooperation (New York: Basic Books, Inc., 1984). There's also a World Wide Web site, http://pscs.physics.lsa.umich.edu/Software/CC/ECHome.html, that includes the FORTRAN implementation of Axelrod's second tournament.
The exercise for today, then, is to design and submit an improved strategy, taking account of the lessons learned from this first tournament. Send them to me by e-mail; I'll stage a second tournament and report on the results.
This document is available on the World Wide Web as
http://www.math.grin.edu/courses/Scheme/fall-1997/dilemma-project-2.html
created July 17, 1997
last revised October 3, 1997