Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents

Cristian Chica, Yinglong Guo, Gilad Lerman

TL;DR

This work provides a theoretical explanation for why Q-learning agents may learn supracompetitive pricing in infinitely repeated settings. It develops a stochastic-game model with bounded memory and a bounded-experimentation variant of Q-learning. It establishes the existence of one-memory subgame perfect equilibria (SPEs) via a fixed-point analysis of the V1 operator and gives conditions under which learned behavior is supported by naive collusion, grim-trigger policies, or increasing strategies, depending on the price structure and payoffs. The analysis centers on a dynamic Bertrand-like environment with a competitive price $p^*$ and a collusive-enabling price $p^C$, showing that grim-trigger policies can constitute SPEs while naive collusion generally cannot. The results offer a rigorous mechanism linking learning dynamics to sustained supracompetitive pricing, with implications for regulation and future research on algorithmic competition in dynamic, bounded-recall settings.

Abstract

There is growing experimental evidence that $Q$-learning agents may learn to charge supracompetitive prices. We provide the first theoretical explanation for this behavior in infinitely repeated games. Firms update their pricing policies based solely on observed profits, without computing equilibrium strategies. We show that when the game admits both a one-stage Nash equilibrium price and a collusive-enabling price, and when the $Q$-function satisfies certain inequalities at the end of experimentation, firms learn to consistently charge supracompetitive prices. We introduce a new class of one-memory subgame perfect equilibria (SPEs) and provide conditions under which learned behavior is supported by naive collusion, grim trigger policies, or increasing strategies. Naive collusion does not constitute an SPE unless the collusive-enabling price is a one-stage Nash equilibrium, whereas grim trigger policies can.
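
The contrast between naive collusion and grim trigger can be illustrated with a minimal sketch. This is not the paper's model: it is a hypothetical symmetric two-firm game with only two prices, `LOW` standing in for $p^*$ and `HIGH` for $p^C$, and the profit numbers in `PROFIT` are invented so that `LOW` is a one-stage Nash equilibrium and `HIGH` is not. Policies depend only on last period's price profile (one memory), and the code applies a standard one-shot deviation check in every such state.

```python
from itertools import product

LOW, HIGH = 0, 1  # hypothetical stand-ins for p^* and p^C

# PROFIT[(own_price, rival_price)]: assumed toy payoffs, not from the paper.
PROFIT = {
    (HIGH, HIGH): 10.0,  # both charge the collusive-enabling price
    (LOW, LOW): 6.0,     # both charge the one-stage Nash price
    (LOW, HIGH): 14.0,   # undercutting the rival
    (HIGH, LOW): 2.0,    # being undercut
}

# A state is last period's price profile (one-memory).
STATES = list(product((LOW, HIGH), repeat=2))


def naive_collusion(state):
    """Always charge HIGH, whatever happened last period."""
    return HIGH


def grim_trigger(state):
    """Charge HIGH only if both firms charged HIGH last period."""
    return HIGH if state == (HIGH, HIGH) else LOW


def policy_value(policy, delta, sweeps=2000):
    """Discounted value per state when both firms follow `policy`."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        new_v = {}
        for s in STATES:
            a = policy(s)  # symmetric policy: both firms pick the same price
            new_v[s] = PROFIT[(a, a)] + delta * v[(a, a)]
        v = new_v
    return v


def passes_one_shot_deviation(policy, delta, tol=1e-9):
    """True if no single-period deviation is profitable in any state."""
    v = policy_value(policy, delta)
    for s in STATES:
        rival = policy(s)
        for own in (LOW, HIGH):
            # Deviate once, then both firms return to the policy.
            deviation = PROFIT[(own, rival)] + delta * v[(own, rival)]
            if deviation > v[s] + tol:
                return False
    return True


if __name__ == "__main__":
    for name, pol in [("naive", naive_collusion), ("grim", grim_trigger)]:
        for delta in (0.3, 0.9):
            print(name, delta, passes_one_shot_deviation(pol, delta))
```

Under these assumed payoffs, naive collusion fails the check because undercutting is never punished, while grim trigger passes it only when the discount factor is large enough for the lost collusive profits to outweigh the one-period gain from undercutting.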

Paper Structure

This paper contains 20 sections, 16 theorems, 138 equations, 3 algorithms.

Key Result

Proposition 1

Let $i\in[n]$ and $\boldsymbol{\sigma}_1 = (\sigma^i_1, \boldsymbol{\sigma}^{-i}_1)\in \boldsymbol{\Sigma}_1$. For each $(s_1,\boldsymbol{p}_0)\in \mathcal{S}\times \mathcal{A}^n$, $\tilde{V}_{1}^i(s_1, \boldsymbol{p}_0, \sigma^i_1|\boldsymbol{\sigma}^{-i}_1)$ satisfies the following Bellman equation [...]. Moreover, the system of $rM$ equations given by this Bellman equation has a unique solution in the $rM$ [...]

Theorems & Definitions (19)

  • Proposition 1: Lemma 1 of Fink (1964)
  • Theorem 1: Existence of stationary points with special properties (Fink, 1964)
  • Theorem 2: Existence of Nash Equilibrium from time $t=1$
  • Theorem 3: Existence of the one-memory SPE
  • Proposition 2: The grim trigger strategy is a one-memory SPE
  • Proposition 3: $Q_{\!f}^{i}$ captures the value of the game at time $t=1$
  • Proposition 4: Sufficient condition for $\boldsymbol{Q}_{\!f}$ to induce a Nash equilibrium from time $t=1$
  • Theorem 4: $Q$-learning convergence to supracompetitive prices
  • Proposition 5: Naive Collusion
  • Proposition 6: Grim Trigger Collusion
  • ...and 9 more