Learning a Game by Paying the Agents

Brian Hu Zhang, Tao Lin, Yiling Chen, Tuomas Sandholm

TL;DR

The paper studies how to learn agents’ utility functions from repeated, non-equilibrium play in normal-form games by introducing a principal who can send the agents signals and payments. It analyzes two behavioral models, rationalizable actions (iterated removal of dominated actions) and no-regret learners, and gives upper and lower bounds on the number of rounds needed to learn the utilities to precision ε, up to an additive offset for each agent. The bounds nearly match in the rationalizable model, and the paper shows how signaling enables learning in the no-regret model. It additionally develops a framework for steering learning agents to desirable equilibria via correlated equilibria with payments (CEP), achieving principal-optimal outcomes without prior knowledge of utilities. The results connect inverse game theory with decision-making under payments, and the learning algorithms need a number of rounds polynomial in the size of the game.

Abstract

We study the problem of learning the utility functions of agents in a normal-form game by observing the agents play the game repeatedly. Differing from most prior literature, we introduce a principal with the power to observe the agents playing the game, send the agents signals, and send the agents payments as a function of their actions. Under reasonable behavioral models for the agents such as iterated dominated action removal or a no-regret assumption, we show that the principal can, using a number of rounds polynomial in the size of the game, learn the utility functions of all agents to any desirable precision $\varepsilon > 0$. We also show lower bounds in both models, which nearly match the upper bounds in the former model and also strictly separate the two models: the principal can learn strictly faster in the iterated dominance model. Finally, we discuss implications for the problem of steering agents to a desired equilibrium: in particular, we introduce, using our utility-learning algorithm as a subroutine, the first algorithm for steering learning agents without prior knowledge of their utilities.

Paper Structure

This paper contains 44 sections, 23 theorems, 26 equations, 6 algorithms.

Key Result

Theorem 1.1

In the rationalizable model, there exists an algorithm for the principal that can learn a game to precision $\varepsilon$ in ${\mathcal{O}}(nM \log(1/\varepsilon))$ rounds. This is tight up to $\log(M)$ factors.
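To give some intuition for the $\log(1/\varepsilon)$ factor, here is a minimal sketch of a single-agent special case. It is not the paper's algorithm: it assumes one agent who best-responds to utility plus payment, utilities in $[0, 1]$, and ties broken toward the lowest action index. The names `make_agent` and `learn_offsets` are hypothetical. The principal binary-searches over the payment that makes the agent indifferent between a baseline action and each other action, which recovers the utilities only up to an additive offset.

```python
def make_agent(utilities):
    """Return a best-responding agent with hidden utilities (hypothetical helper)."""
    def respond(payments):
        # The agent maximizes utility plus payment; ties go to the lowest index.
        return max(range(len(utilities)), key=lambda a: utilities[a] + payments[a])
    return respond


def learn_offsets(respond, num_actions, eps):
    """Estimate u[a] - u[0] for every action a to within eps.

    Assumes utilities lie in [0, 1], so each difference lies in [-1, 1].
    Returns (estimates, rounds_used).
    """
    estimates = [0.0] * num_actions
    rounds = 0
    for a in range(1, num_actions):
        lo, hi = -1.0, 1.0  # invariant: u[0] - u[a] lies in [lo, hi]
        while hi - lo > eps:
            t = (lo + hi) / 2
            payments = [0.0] * num_actions
            payments[0] = 2.0      # a bonus of 2 rules out every other action
            payments[a] = 2.0 + t  # tilt the comparison between 0 and a by t
            rounds += 1
            if respond(payments) == a:
                hi = t  # a was strictly better, so u[0] - u[a] < t
            else:
                lo = t  # 0 was at least as good, so u[0] - u[a] >= t
        estimates[a] = -(lo + hi) / 2
    return estimates, rounds
```

Under these assumptions, each non-baseline action costs $\lceil \log_2(2/\varepsilon) \rceil$ rounds, so an agent with $M$ actions is learned in $(M-1)\lceil \log_2(2/\varepsilon) \rceil$ rounds, in line with the $n = 1$ case of the bound above.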

Theorems & Definitions (43)

  • Theorem 1.1: Informal summary of results in the rationalizable model
  • Theorem 1.2: Informal summary of results in the no-regret model
  • Proposition 1.3: Informal version of \ref{th:1p-payment}
  • Proposition 1.4: Informal version of \ref{th:mp-payment}
  • Theorem 1.5: Informal summary of steering results
  • Proposition 2.1
  • Proposition 3.1
  • proof
  • Theorem 4.1
  • Theorem 4.2
  • ...and 33 more