Spontaneous Giving and Calculated Greed in Language Models

Yuxuan Li, Hirokazu Shirado

TL;DR

The paper investigates whether advanced reasoning in large language models extends to social intelligence in cooperative contexts. Using chain-of-thought prompting and reflection across six economic games with multiple model families, it assesses cooperation and norm enforcement under uncertainty. It finds that reasoning models consistently reduce cooperation and punishment, leading to lower group performance in iterated interactions. The work argues for AI architectures that integrate social intelligence with reasoning to avoid promoting selfish defection and to support effective collective action.

Abstract

Large language models demonstrate strong problem-solving abilities through reasoning techniques such as chain-of-thought prompting and reflection. However, it remains unclear whether these reasoning capabilities extend to a form of social intelligence: making effective decisions in cooperative contexts. We examine this question using economic games that simulate social dilemmas. First, we apply chain-of-thought and reflection prompting to GPT-4o in a Public Goods Game. We then evaluate multiple off-the-shelf models across six cooperation and punishment games, comparing those with and without explicit reasoning mechanisms. We find that reasoning models consistently reduce cooperation and norm enforcement, favoring individual rationality. In repeated interactions, groups with more reasoning agents exhibit lower collective gains. These behaviors mirror human patterns of "spontaneous giving and calculated greed." Our findings underscore the need for LLM architectures that incorporate social intelligence alongside reasoning, to help address--rather than reinforce--the challenges of collective action.
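To make the social dilemma concrete, the following is a minimal sketch of the standard linear Public Goods Game payoff rule. The endowment, multiplier, and group size are hypothetical illustration values, not the parameters used in the paper, and the function name `public_goods_payoffs` is introduced here only for the example.

```python
# Sketch of a linear Public Goods Game payoff, under assumed parameters.
# ASSUMPTION: endowment=100, multiplier=2.0, and a 4-player group are
# illustrative values only; the paper's actual settings are not given here.

def public_goods_payoffs(contributions, endowment=100, multiplier=2.0):
    """Return each player's payoff given their contributions to the pool."""
    n = len(contributions)
    # The pooled contributions are multiplied and split equally.
    share = multiplier * sum(contributions) / n
    # Each player keeps what they did not contribute, plus the equal share.
    return [endowment - c + share for c in contributions]

all_cooperate = public_goods_payoffs([100, 100, 100, 100])
one_defects = public_goods_payoffs([0, 100, 100, 100])
all_defect = public_goods_payoffs([0, 0, 0, 0])

print(all_cooperate)  # every player ends with 200.0
print(one_defects)    # the defector gets 250.0, each cooperator 150.0
print(all_defect)     # every player ends with 100.0
```

Under these assumed values, a lone defector earns more than any cooperator, yet the group earns most when everyone contributes. That tension between individual and collective payoff is what the abstract refers to as "individual rationality" versus collective gains.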

Paper Structure

This paper contains 39 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: Dual-process hypothesis for cooperation in humans and LLMs. Deliberative “System 2” reasoning may suppress cooperation that would otherwise arise from intuitive “System 1” processes.
  • Figure 2: Economic games used. Cooperation games ask players whether to incur a cost to benefit others, while punishment games ask whether to incur a cost to impose a cost on non-cooperators. In each scenario, the language model assumes the role of Player A.
  • Figure 3: Reasoning reduces cooperation in the Public Goods Game. Cooperation rate is defined as the fraction of trials (out of 100) in which GPT-4o chooses to cooperate. (a) Cooperation declines as the number of reasoning steps increases; the dashed line represents a fitted trend. The no-reasoning baseline corresponds to one reasoning step. (b) Cooperation also decreases when the model is prompted to reflect and revise its initial decision.
  • Figure 4: Comparison of cooperation and punishment outcomes between GPT-4o and o1. Horizontal lines in the Dictator Game and Ultimatum Game panels indicate the means of the respective distributions. These visualizations correspond to the results reported in Table 1.
  • Figure 5: Groups cooperate and earn less as the proportion of reasoning models increases. Changes in cooperation rate (a) and total earned points (b) across rounds in iterated Public Goods Games are shown (100 runs per condition). Error bars represent the mean $\pm$ s.e.m.
  • ...and 6 more figures
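
The captions for Figures 3 and 5 rely on two summary statistics: the cooperation rate (fraction of trials in which the model cooperates) and the mean ± s.e.m. across runs. A minimal sketch of both is shown below. The function names and the sample inputs are hypothetical and are not data from the paper; the s.e.m. uses the sample standard deviation (n − 1 denominator), which is an assumption about the convention.

```python
import math

def cooperation_rate(decisions):
    """Fraction of trials in which the choice was to cooperate.

    `decisions` is a sequence of booleans, True meaning "cooperate".
    """
    return sum(decisions) / len(decisions)

def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers.

    ASSUMPTION: uses the sample variance with an n - 1 denominator.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Made-up inputs for illustration only (not results from the paper).
rate = cooperation_rate([True] * 60 + [False] * 40)
mean, sem = mean_sem([1.0, 2.0, 3.0, 4.0])
print(rate)       # 0.6
print(mean, sem)  # mean is 2.5; sem is sqrt(5/12)
```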