Contracting with a Learning Agent
Guru Guruganesh, Yoav Kolumbus, Jon Schneider, Inbal Talgam-Cohen, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Joshua R. Wang, S. Matthew Weinberg
TL;DR
The paper studies repeated principal–agent problems in which the agent chooses actions through learning dynamics, focusing on mean-based no-regret learners. In the canonical success/failure setting, it shows that the optimal dynamic contract against such a learner is a simple free-fall scheme: offer a linear contract with parameter $α > 0$ for an initial stretch of rounds, then switch to the zero contract. While the agent falls through its action space, the principal keeps collecting reward at zero cost, and the resulting dynamic contract can leave both players better off than the best static contract. The results extend to general contracts that the principal rescales dynamically along a single dimension. A continuous-time reduction underpins the discrete analysis and yields a polynomial-time computable strategy in the linear case. The paper also quantifies how partial knowledge of the time horizon erodes the principal's advantage. Overall, the work bridges algorithmic contract theory with learning in games, showing when simple dynamic contracts outperform static ones and how horizon uncertainty bounds such gains.
Abstract
Many real-life contractual relations differ completely from the clean, static model at the heart of principal-agent theory. Typically, they involve repeated strategic interactions of the principal and agent, taking place under uncertainty and over time. Although complex dynamic strategies are appealing in theory, players seldom use them in practice, often preferring to circumvent complexity and approach uncertainty through learning. We initiate the study of repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes. Optimizing against a no-regret agent is a known open problem in general games; we achieve an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success/failure. The solution has a surprisingly simple structure: for some $α > 0$, initially offer the agent a linear contract with scalar $α$, then switch to offering a linear contract with scalar $0$. This switch causes the agent to "free-fall" through their action space and during this time provides the principal with non-zero reward at zero cost. Despite apparent exploitation of the agent, this dynamic contract can leave both players better off compared to the best static contract. Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically. Finally, we quantify the dependence of our results on knowledge of the time horizon, and are the first to address this consideration in the study of strategizing against learning agents.
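To make the free-fall structure concrete, the following is a minimal sketch under stated assumptions, not the paper's construction. The three-action instance, the horizon `T`, the switch point `T1`, and the helper names (`leader`, `simulate`, `best_static_revenue`) are all hypothetical. The agent is modeled by a deterministic follow-the-leader rule, used only as a simplified stand-in for a mean-based learner, with ties broken toward the higher success probability. The principal's reward on success is normalized to 1, so a linear contract $α$ pays the agent $α$ on success.

```python
from fractions import Fraction as F

# Hypothetical instance (not from the paper): (success probability, cost) per action.
ACTIONS = [(F(0), F(0)), (F(1, 2), F(1, 6)), (F(1), F(1, 2))]


def leader(cum_utils):
    """Follow-the-leader: index with the highest cumulative utility.
    Ties go to the higher success probability (an assumption of this sketch)."""
    return max(range(len(ACTIONS)), key=lambda i: (cum_utils[i], ACTIONS[i][0]))


def simulate(alphas):
    """Play one round per entry of `alphas`; return (total expected revenue, actions)."""
    cum_utils = [F(0)] * len(ACTIONS)
    revenue, played = F(0), []
    for alpha in alphas:
        i = leader(cum_utils)
        p, _ = ACTIONS[i]
        revenue += (1 - alpha) * p  # expected reward minus expected payment
        played.append(i)
        # The learner tracks every action's counterfactual expected utility.
        cum_utils = [u + alpha * q - c for u, (q, c) in zip(cum_utils, ACTIONS)]
    return revenue, played


def best_static_revenue(grid=60):
    """Per-round revenue of the best fixed linear contract on a grid of alphas,
    assuming the agent best-responds with the same tie-breaking rule."""
    best = F(0)
    for k in range(grid + 1):
        alpha = F(k, grid)
        i = leader([alpha * p - c for p, c in ACTIONS])
        best = max(best, (1 - alpha) * ACTIONS[i][0])
    return best


T, T1, ALPHA = 100, 50, F(2, 3)  # horizon, switch point, initial scalar (all assumed)
free_fall = [ALPHA] * T1 + [F(0)] * (T - T1)
dynamic_total, played = simulate(free_fall)
dynamic_avg = dynamic_total / T
static_avg = best_static_revenue()
print(float(dynamic_avg), float(static_avg))
```

In this toy instance the best static linear contract earns 1/3 per round, while the free-fall schedule averages 253/600 (about 0.42): after the switch to $α = 0$, the simulated agent drops from the costliest action to the middle one and keeps producing reward that the principal no longer pays for. Exact `Fraction` arithmetic is used so that ties in cumulative utility resolve deterministically rather than through floating-point noise.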
