The Prisoner's Dilemma

03/07/2008

This week in my MBA course, the instructor posed a question about the classic Prisoner's Dilemma. In the Prisoner's Dilemma, two thieves are captured and separated so they cannot communicate. If neither prisoner confesses, each will be sentenced to only 3 years. If both confess, each will get 5 years. If one confesses and the other does not, the prisoner who confessed walks free and the other thief serves 10 years.

What is the best option for each thief?

We start by creating a payoff matrix:

	Prisoner A confesses	Prisoner A does not confess
Prisoner B confesses	A 5 years; B 5 years	A 10 years; B free
Prisoner B does not confess	A free; B 10 years	A 3 years; B 3 years

Although it is apparent that not confessing gives the best joint outcome, it is not what happens, because it is an unstable choice: each prisoner can do better for himself by confessing while the other stays silent. Prisoner A does not know what B will do. If A chooses not to confess and B ends up confessing, B goes free and A serves 10 years. To avoid the risk of serving 10 years, each prisoner must confess.
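The worst-case reasoning above can be checked with a few lines of Python. This is only a sketch: the names `YEARS`, `CHOICES`, and `worst_case_for_a` are my own, and the numbers are simply the sentences from the payoff matrix.

```python
# Years in prison, keyed by (A's choice, B's choice) -> (A's years, B's years).
# Values are copied from the payoff matrix; the names are illustrative only.
YEARS = {
    ("confess", "confess"): (5, 5),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
    ("silent", "silent"): (3, 3),
}
CHOICES = ("confess", "silent")


def worst_case_for_a(choice_a):
    """Longest sentence A could receive after committing to choice_a."""
    return max(YEARS[(choice_a, choice_b)][0] for choice_b in CHOICES)


worst = {choice: worst_case_for_a(choice) for choice in CHOICES}
safest = min(worst, key=worst.get)  # the choice with the smallest worst case

print(worst)   # {'confess': 5, 'silent': 10}
print(safest)  # confess
```

Confessing caps A's sentence at 5 years, while staying silent risks 10, which is why confessing is the "safe" move when A knows nothing about B.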

If they could communicate and arrange a strategy, the best option would be to agree to not confess and be sure they will get only 3 years.

The best-case scenario individually would be to confess and be sure that the other does not. But even if they could communicate and play a positive-sum game, it is very unlikely that either prisoner would agree to serve 10 years and let the other go free... unless there is a great amount of money hidden somewhere and the prisoner who goes free gets the money and hires a good lawyer to get the other one out of jail :-)

The interesting thing about the prisoner's dilemma as a strategy exercise is that no matter how you look at it, the best strategy is to betray the other prisoner and confess. In strategic scenarios, the decision maker will usually want to know what the other player will do, and this is not possible in the prisoner's dilemma because the two cannot communicate. One can trust his friend in the other room, but he cannot really predict what the other prisoner will do. Humans love freedom. If prisoner A believes that prisoner B will not confess (he trusts him), then the best option for A is to betray B and confess, so that A walks free. Prisoner B can reason the same way, so the best strategic move for him is also to confess. If instead prisoner A believes that B will betray him (he does not trust B), then the best strategy is still to betray and confess, otherwise he gets 10 years while B goes free. Again, B may think the same and betray. From either standpoint, the prisoner should choose to betray the other in order to preserve his own freedom, or at least part of it.
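The argument above says that confessing is A's best reply whatever B does (a "dominant strategy" in game theory terms). Here is a small sketch that checks it case by case; `YEARS_A` and `best_response_for_a` are hypothetical names of mine, and the numbers come from the payoff matrix.

```python
# Years prisoner A serves, keyed by (A's choice, B's choice).
# Values are copied from the payoff matrix; the names are illustrative only.
YEARS_A = {
    ("confess", "confess"): 5,
    ("confess", "silent"): 0,
    ("silent", "confess"): 10,
    ("silent", "silent"): 3,
}
CHOICES = ("confess", "silent")


def best_response_for_a(choice_b):
    """The choice that minimizes A's sentence, given what B does."""
    return min(CHOICES, key=lambda choice_a: YEARS_A[(choice_a, choice_b)])


responses = {choice_b: best_response_for_a(choice_b) for choice_b in CHOICES}
# A strategy is dominant if it is the best response to every choice B can make.
is_dominant = len(set(responses.values())) == 1

print(responses)    # {'confess': 'confess', 'silent': 'confess'}
print(is_dominant)  # True
```

Because the matrix is symmetric, the same check gives the same answer for prisoner B.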

As written in Wikipedia (https://en.wikipedia.org/wiki/Prisoner%27s_dilemma), "rational self-interested play results in each prisoner being worse off than if they had stayed silent."

Mathematically, this is called a Nash equilibrium (remember A Beautiful Mind?). Wikipedia covers it well at https://en.wikipedia.org/wiki/Nash_Equilibrium:

"Stated simply, Amy and Bill are in Nash equilibrium if Amy is making the best decision she can, taking into account Bill's decision, and Bill is making the best decision he can, taking into account Amy's decision. Likewise, many players are in Nash equilibrium if each one is making the best decision that they can, taking into account the decisions of the others. However, Nash equilibrium does not necessarily mean the best cumulative payoff for all the players involved; in many cases all the players might improve their payoffs if they could somehow agree on strategies different from the Nash equilibrium"

The Prisoner's Dilemma has a Nash equilibrium in which the best option strategically (and mathematically) is for both prisoners to "defect" instead of "cooperating" with each other. The option that gives equilibrium to the problem is not the best option globally: both prisoners remaining silent is better for the pair, but it is unstable, because either prisoner can shorten his own sentence by confessing.
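To make the equilibrium concrete, here is a brute-force sketch that tests every pair of choices and keeps those where neither prisoner can shorten his own sentence by changing his mind alone. The names `YEARS`, `is_nash`, `equilibria`, and `total_years` are my own; the sentences are the ones from the payoff matrix.

```python
from itertools import product

# (A's choice, B's choice) -> (A's years, B's years), from the payoff matrix.
YEARS = {
    ("confess", "confess"): (5, 5),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
    ("silent", "silent"): (3, 3),
}
CHOICES = ("confess", "silent")


def is_nash(choice_a, choice_b):
    """True if neither prisoner gains by changing only his own choice."""
    years_a, years_b = YEARS[(choice_a, choice_b)]
    a_stays = all(YEARS[(alt, choice_b)][0] >= years_a for alt in CHOICES)
    b_stays = all(YEARS[(choice_a, alt)][1] >= years_b for alt in CHOICES)
    return a_stays and b_stays


equilibria = [pair for pair in product(CHOICES, repeat=2) if is_nash(*pair)]
total_years = {pair: sum(years) for pair, years in YEARS.items()}
best_jointly = min(total_years, key=total_years.get)

print(equilibria)    # [('confess', 'confess')]
print(best_jointly)  # ('silent', 'silent')
```

The only stable pair costs the two prisoners 10 years in total, while the jointly best pair costs 6 but fails the stability test.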

In the prisoner's dilemma, the defect strategy only works better if the game is played a single time. If the game is repeated, cooperation can be the better strategy; in fact, there are studies suggesting that even alternating between cooperation and betrayal is a better strategy than always betraying in the long run. For two companies that expect to compete in the same market for a long time, the betrayal strategy does not pay off.

Again, I will refer to Wikipedia:

"If two players play Prisoner's Dilemma more than once in succession (that is, having memory of at least one previous game), it is called iterated Prisoner's Dilemma. Amongst results shown by Nobel Prize winner Robert Aumann in his 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome. Popular interest in the iterated prisoners dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984). In this he reports on a tournament he organized in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity; initial hostility; capacity for forgiveness; and so forth.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more altruistic strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behaviour from mechanisms that are initially purely selfish, by natural selection."
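A toy simulation gives a feel for the repeated game. This is not Axelrod's tournament: it is a minimal sketch of my own with three simple strategies, ten rounds, and the sentences from the payoff matrix reused every round. "Tit for tat" here means staying silent first and then copying the other prisoner's previous move; all function names are hypothetical.

```python
# (A's choice, B's choice) -> (A's years, B's years), from the payoff matrix.
YEARS = {
    ("confess", "confess"): (5, 5),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
    ("silent", "silent"): (3, 3),
}


def always_confess(own_history, other_history):
    return "confess"


def always_silent(own_history, other_history):
    return "silent"


def tit_for_tat(own_history, other_history):
    # Stay silent first, then repeat the other prisoner's last move.
    return other_history[-1] if other_history else "silent"


def play(strategy_a, strategy_b, rounds=10):
    """Total years each prisoner accumulates over the given number of rounds."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        years_a, years_b = YEARS[(move_a, move_b)]
        total_a += years_a
        total_b += years_b
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b


print(play(tit_for_tat, tit_for_tat))        # (30, 30)
print(play(always_confess, always_confess))  # (50, 50)
print(play(tit_for_tat, always_confess))     # (55, 45)
```

Two cooperative players end up with 30 years each, two betrayers with 50 each. The betrayer still edges out tit for tat head to head (45 against 55), so the benefit of cooperating only shows up when enough of the other players cooperate too.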
