Learning a Game by Paying the Agents
Brian Hu Zhang, Tao Lin, Yiling Chen, Tuomas Sandholm
TL;DR
The paper studies how to learn agents' utility functions from repeated, non-equilibrium play of a normal-form game by introducing a principal who can send signals to the agents and pay them. It analyzes two behavioral models, rationalizable play (iterated removal of dominated actions) and no-regret learning, and gives upper and lower bounds on the number of rounds needed to learn the utilities to precision ε, up to agent-specific additive offsets. The bounds nearly match in the rationalizable model and strictly separate the two models, and signaling is what enables learning in the no-regret setting. The paper also develops a framework for steering learning agents to desirable equilibria via correlated equilibria with payments (CEP), achieving principal-optimal outcomes without prior knowledge of the utilities. The results connect inverse game theory with the design of payments, offering polynomial-time algorithms and a principled route to steering adaptive agents.
Abstract
We study the problem of learning the utility functions of agents in a normal-form game by observing the agents play the game repeatedly. Differing from most prior literature, we introduce a principal with the power to observe the agents playing the game, send the agents signals, and send the agents payments as a function of their actions. Under reasonable behavioral models for the agents such as iterated dominated action removal or a no-regret assumption, we show that the principal can, using a number of rounds polynomial in the size of the game, learn the utility functions of all agents to any desirable precision $\varepsilon > 0$. We also show lower bounds in both models, which nearly match the upper bounds in the former model and also strictly separate the two models: the principal can learn strictly faster in the iterated dominance model. Finally, we discuss implications for the problem of steering agents to a desired equilibrium: in particular, we introduce, using our utility-learning algorithm as a subroutine, the first algorithm for steering learning agents without prior knowledge of their utilities.
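To make the idea of learning utilities through payments concrete, here is a minimal sketch for a single agent. It is not the paper's algorithm. It assumes the agent has hidden utilities in [0, 1] and, in every round, plays an action maximizing utility plus payment. The names `make_agent` and `learn_gaps` and the constant `big` are hypothetical, introduced only for this illustration. Because adding a constant to all of an agent's utilities never changes its behavior, the principal can recover only utility differences, here u[a] - u[0], by binary search on the payments.

```python
from typing import Callable, List


def make_agent(utilities: List[float]) -> Callable[[List[float]], int]:
    """Simulated agent (assumption): best-responds to utility + payment."""
    def best_response(payments: List[float]) -> int:
        totals = [u + p for u, p in zip(utilities, payments)]
        return max(range(len(totals)), key=totals.__getitem__)
    return best_response


def learn_gaps(best_response: Callable[[List[float]], int],
               n_actions: int, eps: float) -> List[float]:
    """Estimate u[a] - u[0] for every action a, each to within eps."""
    big = 3.0  # bonus that makes every action other than 0 and a unattractive
    gaps = [0.0]
    for a in range(1, n_actions):
        lo, hi = -1.0, 1.0  # utilities in [0, 1], so the gap lies in [-1, 1]
        while hi - lo > eps:
            mid = (lo + hi) / 2
            payments = [0.0] * n_actions
            # Nonnegative payments under which the agent prefers a to 0
            # exactly when u[a] - u[0] > mid.
            payments[0] = big + max(mid, 0.0)
            payments[a] = big + max(-mid, 0.0)
            if best_response(payments) == a:
                lo = mid  # the gap is at least mid
            else:
                hi = mid  # the gap is at most mid
        gaps.append((lo + hi) / 2)
    return gaps


agent = make_agent([0.3, 0.9, 0.1])  # hidden from the principal
estimated = learn_gaps(agent, n_actions=3, eps=0.01)
```

Each binary search halves the interval containing the gap, so this toy procedure uses on the order of log(1/ε) rounds per action. The multi-agent setting of the paper, where agents are only assumed to play rationalizable actions or to have no regret, is substantially harder and is not captured by this sketch.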
