Test-then-Punish: A Statistical Approach to Repeated Games

Aymeric Capitaine, Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan

Abstract

We study discounted infinitely repeated games in which players agree on a cooperative mixed action profile but, at each step, observe only the realized pure actions. This form of imperfect monitoring breaks classical trigger strategies, since deviations cannot be identified with certainty. To address this problem, we study how hypothesis testing can be used to sustain cooperation. First, we develop a framework that embeds statistical inference directly into strategic behavior. We introduce relaxed equilibrium notions that allow players to ignore vanishing-probability histories arising from rare but extreme realizations of the monitoring process. Within this framework, we formalize a generic test-then-punish strategy: players commit ex ante to a cooperative mixed action profile, continuously test whether observed play is consistent with this prescription, and permanently switch to punishment once sufficient statistical evidence of deviation accumulates. Under mild conditions on the testing procedure, this construction sustains any feasible and individually rational payoff for sufficiently patient players, yielding a Folk-theorem-type result under imperfect monitoring. We then propose two explicit implementations of this strategy. The first relies on anytime-valid sequential tests, providing uniform control of Type I error over an infinite horizon and a finite expected detection time for payoff-relevant deviations. However, this strategy only accounts for stationary deviations and yields a Nash equilibrium. The second uses testing over batches with a fixed size, accommodating arbitrary deviations and achieving a subgame-perfect Nash equilibrium, at the cost of losing global anytime guarantees on false punishments.
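
To make the test-then-punish idea concrete, here is a minimal Python sketch of one possible anytime-valid test. It is not the paper's construction. It assumes a single monitored player with two pure actions who has agreed to play action 1 with probability `p0`, and it uses a uniform-mixture likelihood ratio, a standard nonnegative martingale under the null, with rejection threshold `1/alpha` justified by Ville's inequality. The function names and parameters are hypothetical.

```python
import math


def log_mixture_lr(k, n, p0):
    """Log of the uniform-mixture likelihood ratio against Bernoulli(p0).

    k is the number of times action 1 was observed in n rounds.
    Numerator: integral of p^k (1-p)^(n-k) over p in [0, 1] = k!(n-k)!/(n+1)!.
    Denominator: likelihood of the same sequence under the agreed p0.
    """
    log_num = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    log_den = k * math.log(p0) + (n - k) * math.log(1 - p0)
    return log_num - log_den


def punishment_time(actions, p0, alpha):
    """Return the first round (1-indexed) at which the test rejects, else None.

    Under the null (i.i.d. Bernoulli(p0) play) the likelihood ratio is a
    nonnegative martingale starting at 1, so by Ville's inequality it ever
    reaches 1/alpha with probability at most alpha, over any horizon.
    """
    threshold = math.log(1.0 / alpha)
    k = 0
    for n, a in enumerate(actions, start=1):
        k += a  # a is 0 or 1: the realized pure action
        if log_mixture_lr(k, n, p0) >= threshold:
            return n  # hypothetical trigger: switch to punishment from here on
    return None


# A player who always plays action 1 instead of mixing 50/50 is flagged at
# round 8, since 2^n / (n + 1) first reaches 20 at n = 8.
print(punishment_time([1] * 20, p0=0.5, alpha=0.05))
# Perfectly balanced play never triggers punishment.
print(punishment_time([1, 0] * 50, p0=0.5, alpha=0.05))
```

Like the first implementation described in the abstract, this sketch only targets stationary deviations: a player who deviates for a few rounds and then compensates could keep the running frequency close to `p0`.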

Paper Structure

This paper contains 19 sections, 31 theorems, 238 equations, 2 figures, 1 table.

Key Result

Theorem 1

Assume perfect monitoring. Fix a payoff profile $\boldsymbol{v}=(v^1,\ldots, v^N)\in\mathcal{U}^1\times \ldots\times\mathcal{U}^N$ such that $v^i \geqslant \underline{u}^i$ for any $i\in[N]$. If $\beta \geqslant (\bar{u}^i - v^i)(\bar{u}^i - \underline{u}^i)^{-1}$ for any player $i$, then ${\bf s}$ ...
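
The patience condition in the statement above can be checked numerically. The following Python sketch computes the smallest $\beta$ satisfying it for all players. The excerpt does not define the symbols, so the code assumes that $\beta$ is the discount factor, $\bar{u}^i$ is player $i$'s largest stage payoff, and $\underline{u}^i$ is the payoff player $i$ receives under punishment; the function name and the example payoffs are hypothetical.

```python
def patience_threshold(u_bar, v, u_low):
    """Smallest beta with beta >= (u_bar_i - v_i) / (u_bar_i - u_low_i) for all i.

    Assumes u_bar_i > u_low_i for every player, so each ratio is well defined.
    """
    return max(
        (ub - vi) / (ub - ul)
        for ub, vi, ul in zip(u_bar, v, u_low)
    )


# Hypothetical prisoner's-dilemma-style payoffs for two symmetric players:
# best stage payoff 5, target cooperative payoff 3, punishment payoff 1.
print(patience_threshold(u_bar=[5, 5], v=[3, 3], u_low=[1, 1]))  # 0.5
```

With these numbers each ratio is $(5-3)/(5-1) = 0.5$, so under the stated assumptions the condition holds for any $\beta \geqslant 0.5$.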

Figures (2)

  • Figure 1: If one player $i\in[N]$ picks a strategy $s^i$ in the green subset, then ${\bf s}=(s^i, s^{-i})\in\mathfrak{S}({\bf w}_{\boldsymbol{v}},\varepsilon)$.
  • Figure 2: Timeline of the batch test--then--punish strategy. Dots represent ends of batches.

Theorems & Definitions (57)

  • Definition 1: $(\varepsilon, \mathfrak{S})-$NE
  • Definition 2: $(\varepsilon, \delta)$-HP-SPNE
  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Proposition 2
  • Corollary 1
  • Corollary 2
  • Theorem 3
  • Proposition 3
  • ...and 47 more