Cheap Talking Algorithms

Daniele Condorelli, Massimiliano Furlan

TL;DR

This paper investigates how simple memoryless reinforcement-learning agents play the Crawford–Sobel cheap-talk game in a static, large-population setting. It shows that agents converge to Bayes–Nash equilibria with substantial information transmission when bias is low, and that informativeness declines as bias increases, with equilibria near Pareto-optimal or second-best predictions at intermediate bias. The results hold across a range of hyperparameters and game forms, and the equilibrium selection is largely governed by monotone partitional equilibria, transitioning from the most informative to less informative as $b$ grows. The work has implications for AI agents in strategic settings, suggesting that communication can persist and even shape market-like outcomes, and it outlines several avenues for extending the analysis to population dynamics, networks, and human–AI interactions.

Abstract

We simulate the behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game where a large population interacts anonymously. We show that sender and receiver converge to Nash equilibrium play. The informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate levels of the bias, matches the level predicted by the Pareto-optimal equilibrium or by the second-best one. Conclusions are robust to alternative specifications of the learning hyperparameters and of the game.
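
The setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The discretised type, message and action grids, the quadratic-loss payoffs, the softmax exploration with a decaying temperature, and every name and default value (`simulate`, `softmax`, `alpha`, `temp0`, `decay`) are assumptions made for the example. Each learner is memoryless in the sense that its value table is indexed only by what it observes in the current round (the type for the sender, the message for the receiver).

```python
import numpy as np

def softmax(q, temp):
    """Boltzmann choice probabilities for one row of values (assumed exploration rule)."""
    z = (q - q.max()) / temp          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def simulate(bias, n=6, episodes=20000, alpha=0.1, temp0=1.0, decay=0.9995, seed=0):
    """Two independent memoryless learners playing a discretised cheap-talk game.

    All defaults are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n)   # types and actions share one grid on [0, 1]
    q_s = np.zeros((n, n))            # sender values: type x message
    q_r = np.zeros((n, n))            # receiver values: message x action
    temp = temp0
    for _ in range(episodes):
        t = rng.integers(n)                              # type drawn uniformly
        m = rng.choice(n, p=softmax(q_s[t], temp))       # sender picks a message
        a = rng.choice(n, p=softmax(q_r[m], temp))       # receiver picks an action
        r_s = -(grid[a] - grid[t] - bias) ** 2           # sender prefers a = type + bias
        r_r = -(grid[a] - grid[t]) ** 2                  # receiver prefers a = type
        q_s[t, m] += alpha * (r_s - q_s[t, m])           # no next-state term: static game
        q_r[m, a] += alpha * (r_r - q_r[m, a])
        temp = max(temp * decay, 1e-3)                   # exploration shrinks over time
    return q_s, q_r

q_s, q_r = simulate(bias=0.1)
sender_policy = q_s.argmax(axis=1)    # greedy message for each type
receiver_policy = q_r.argmax(axis=1)  # greedy action for each message
```

Reading off the greedy policies at the end gives a candidate strategy pair that can then be checked against the equilibrium conditions. Whether a particular run ends close to an equilibrium depends on the assumed hyperparameters.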

Paper Structure (9 sections, 4 equations, 10 figures, 1 algorithm)

Figures (10)

  • Figure 1: Top: Maximum probability mass that the policy of the sender (receiver) places on suboptimal messages (actions) across all types (and messages). Bottom: Potential ex-ante gain by a unilateral deviation across all types (and messages). Averages over $1000$ simulations. In this and subsequent figures, the ex-ante optimal equilibrium entails perfect information transmission for biases in the shaded grey area to the left, while babbling is the unique equilibrium for biases in the shaded grey area to the right.
  • Figure 2: Frequency of simulations in which both agents place at most $0.01$ probability mass on suboptimal actions across all states, for different levels of bias in $[0,0.5]$. Each graph has bias on the horizontal axis and frequency on the vertical axis, and corresponds to a specific $(\lambda, \alpha)$ combination of hyperparameters. Horizontal dashed lines indicate frequencies of 0 and 1.
  • Figure 3: Heatmap of the modal policies of sender (top) and receiver (bottom) for different levels of bias over 1000 independent simulations. All vertical pairs of strategies correspond to exact equilibria. Randomisation over messages is with equal probability, as indicated by the same colour tone. Messages to the right of the dashed line are off the equilibrium path. To find the modal policy, we relabelled messages in each simulation, assigning a natural number to each message such that messages with smaller numbers are associated with smaller types.
  • Figure 4: Ex-ante expected reward for the sender (left) and receiver (right) for different levels of bias. The distribution of values of 1000 simulations is shown in shades of blue. The value associated with the ex-ante optimal equilibrium is in red and the one associated with the babbling equilibrium is dotted grey.
  • Figure 5: Normalised mutual information between the distribution of messages induced by the sender's policy and the distribution of sender's types. The distribution over 1000 simulations is shown in shades of blue. The value associated with the optimal equilibrium is in red and the one associated with the worst equilibrium is dotted grey.
  • ...and 5 more figures
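
The informativeness measure named in the Figure 5 caption can be illustrated with a short sketch. The caption does not state which normaliser is used, so dividing by the entropy of the type distribution is an assumption here, as are the function name `normalised_mutual_information` and the representation of the sender's policy as a row-stochastic matrix (rows are types, columns are messages).

```python
import numpy as np

def normalised_mutual_information(policy, prior=None):
    """Mutual information between types and messages, divided by the type entropy.

    policy[t, m] is the probability that type t sends message m.
    The normaliser H(types) is an assumption; the caption does not specify it.
    """
    policy = np.asarray(policy, dtype=float)
    n_types = policy.shape[0]
    if prior is None:
        prior = np.full(n_types, 1.0 / n_types)   # uniform types by default
    else:
        prior = np.asarray(prior, dtype=float)
    joint = prior[:, None] * policy               # P(type, message)
    p_msg = joint.sum(axis=0)                     # marginal P(message)
    indep = prior[:, None] * p_msg[None, :]       # product of the marginals
    mask = joint > 0                              # skip zero cells (0 * log 0 = 0)
    mi = np.sum(joint[mask] * np.log(joint[mask] / indep[mask]))
    h_types = -np.sum(prior[prior > 0] * np.log(prior[prior > 0]))
    return mi / h_types

fully_revealing = np.eye(4)                       # each type has its own message
babbling = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))  # every type sends the same message
two_step = np.repeat(np.eye(2), 2, axis=0)        # types {0,1} pool, types {2,3} pool
```

Under this normalisation and a uniform prior over four types, the fully revealing policy scores 1, babbling scores 0, and the two-step partition scores 0.5, so the measure orders partitional equilibria by how finely they separate types.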