Generalized Principal-Agent Problem with a Learning Agent
Tao Lin, Yiling Chen
TL;DR
This work develops a unified framework for generalized principal-agent problems with learning agents, removing the commitment assumption and analyzing how learning dynamics affect the principal’s payoff relative to the classical Stackelberg benchmark $U^*$. By reducing the learning setting to an approximate best-response problem, it derives tight bounds for both no-regret and no-swap-regret learners: the principal can typically achieve $U^* - \Theta\left(\sqrt{\frac{\mathrm{Reg}(T)}{T}}\right)$ against no-regret learners and at most $U^* + O\left(\frac{\mathrm{SReg}(T)}{T}\right)$ against no-swap-regret learners, with an intrinsic asymmetry between these regimes. The paper also shows that mean-based learning can enable the principal to exceed $U^*$ in some setups, and it provides problem-specific instantiations for Bayesian persuasion, Stackelberg games, and contract design, including explicit constants that depend on Lipschitz constants, diameters, and distance-to-boundary terms. Overall, the results unify and refine prior findings, quantify the limits of exploiting learning agents, and illuminate the role of information structure in learning-driven principal-agent interactions.
Abstract
In classic principal-agent problems such as Stackelberg games, contract design, and Bayesian persuasion, the agent best responds to the principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot problem where the agent approximately best responds, and prove that: (1) If the agent uses contextual no-regret learning algorithms with regret $\mathrm{Reg}(T)$, then the principal can guarantee utility at least $U^* - \Theta\big(\sqrt{\tfrac{\mathrm{Reg}(T)}{T}}\big)$, where $U^*$ is the principal's optimal utility in the classic model with a best-responding agent. (2) If the agent uses contextual no-swap-regret learning algorithms with swap-regret $\mathrm{SReg}(T)$, then the principal cannot obtain utility more than $U^* + O\big(\tfrac{\mathrm{SReg}(T)}{T}\big)$. (3) In addition, if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can sometimes do significantly better than $U^*$. These results not only refine previous works on Stackelberg games and contract design, but also lead to new results for Bayesian persuasion with a learning agent and all generalized principal-agent problems where the agent does not have private information.
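As a rough illustration of how the rates in (1) and (2) compare, assume (this is an illustrative assumption, not a claim from the abstract) that the agent's algorithm has regret and swap-regret of order $\sqrt{T}$, ignoring any dependence on the number of actions and contexts. Substituting into the two bounds gives:

$$
\mathrm{Reg}(T) = \Theta\big(\sqrt{T}\big) \;\Rightarrow\; U^* - \Theta\Big(\sqrt{\tfrac{\sqrt{T}}{T}}\Big) = U^* - \Theta\big(T^{-1/4}\big),
\qquad
\mathrm{SReg}(T) = O\big(\sqrt{T}\big) \;\Rightarrow\; U^* + O\Big(\tfrac{\sqrt{T}}{T}\Big) = U^* + O\big(T^{-1/2}\big).
$$

Under this assumption, the principal's guaranteed loss against a no-regret learner vanishes more slowly than the largest possible gain against a no-swap-regret learner, which reflects the square-root gap between the two bounds.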
