Generalized Principal-Agent Problem with a Learning Agent

Tao Lin; Yiling Chen

TL;DR

This work develops a unified framework for generalized principal-agent problems with learning agents, removing the commitment assumption and analyzing how learning dynamics affect the principal’s payoff relative to the classical Stackelberg benchmark $U^*$. By reducing the learning setting to an approximate best-response problem, it derives tight bounds for both no-regret and no-swap-regret learners: the principal can typically achieve $U^* - \Theta\left(\sqrt{\frac{\mathrm{Reg}(T)}{T}}\right)$ against no-regret learners and at most $U^* + O\left(\frac{\mathrm{SReg}(T)}{T}\right)$ against no-swap-regret learners, with an intrinsic asymmetry between these regimes. The paper also shows that mean-based learning can enable the principal to exceed $U^*$ in some setups, and it provides problem-specific instantiations for Bayesian persuasion, Stackelberg games, and contract design, including explicit constants that depend on Lipschitz constants, diameters, and distance-to-boundary terms. Overall, the results unify and refine prior findings, quantify the limits of exploiting learning agents, and illuminate the role of information structure in learning-driven principal-agent interactions.

Abstract

In classic principal-agent problems such as Stackelberg games, contract design, and Bayesian persuasion, the agent best responds to the principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot problem where the agent approximately best responds, and prove that: (1) If the agent uses contextual no-regret learning algorithms with regret $\mathrm{Reg}(T)$, then the principal can guarantee utility at least $U^* - \Theta\big(\sqrt{\tfrac{\mathrm{Reg}(T)}{T}}\big)$, where $U^*$ is the principal's optimal utility in the classic model with a best-responding agent. (2) If the agent uses contextual no-swap-regret learning algorithms with swap-regret $\mathrm{SReg}(T)$, then the principal cannot obtain utility more than $U^* + O\big(\tfrac{\mathrm{SReg}(T)}{T}\big)$. (3) In addition, if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can sometimes do significantly better than $U^*$. These results not only refine previous work on Stackelberg games and contract design, but also lead to new results for Bayesian persuasion with a learning agent and for all generalized principal-agent problems where the agent does not have private information.
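
As a worked illustration (a sketch, not a statement taken from the paper), one can substitute the rates of Proposition 2.1, $\mathrm{CReg}(T) = O(\sqrt{|A||S|T})$ and $\mathrm{CSReg}(T) = O(|A|\sqrt{|S|T})$, into bounds (1) and (2). This assumes that $\mathrm{Reg}(T)$ and $\mathrm{SReg}(T)$ in the abstract coincide with those contextual quantities:

$$
\sqrt{\frac{\mathrm{Reg}(T)}{T}} = O\left(\sqrt{\frac{\sqrt{|A||S|T}}{T}}\right) = O\left(\left(\frac{|A||S|}{T}\right)^{1/4}\right),
\qquad
\frac{\mathrm{SReg}(T)}{T} = O\left(\frac{|A|\sqrt{|S|T}}{T}\right) = O\left(|A|\sqrt{\frac{|S|}{T}}\right).
$$

Under that assumption, the principal's guarantee against such a no-regret learner falls short of $U^*$ by a term of order $T^{-1/4}$, while the possible gain over $U^*$ against such a no-swap-regret learner is of order $T^{-1/2}$; both vanish as $T \to \infty$.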

Paper Structure (28 sections, 19 theorems, 89 equations, 1 table, 1 algorithm)

Introduction
Related Works
Generalized Principal-Agent Problem with a Learning Agent
Generalized Principal-Agent Problem
Learning Agent
Special Case: Bayesian Persuasion with a Learning Agent
Reduction from Learning to Approximate Best Response
Generalized Principal-Agent Problem with Approximate Best Response
Agent's No-Regret Learning: Lower Bound on Principal's Utility
Agent's No-Swap-Regret Learning: Upper Bound on Principal's Utility
Agent's Mean-Based Learning: Exploitable by the Principal
Generalized Principal-Agent Problems with Approximate Best Response
Applications to Specific Principal-Agent Problems
Bayesian Persuasion
Stackelberg Games
...and 13 more sections

Key Result

Proposition 2.1

There exist learning algorithms with contextual regret $\mathrm{CReg}(T) = O(\sqrt{|A| |S| T})$ and contextual swap-regret $\mathrm{CSReg}(T) = O(|A| \sqrt{|S| T})$. They can be constructed by running an ordinary no-(swap-)regret multi-armed bandit algorithm for each context independently.
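
The per-context construction described in Proposition 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the proposition does not name a specific bandit algorithm, so EXP3 is swapped in here as one standard no-regret choice, and the class names `Exp3` and `PerContextLearner`, the exploration rate `gamma`, and the toy environment in `simulate` are all hypothetical.

```python
import math
import random


class Exp3:
    """EXP3 bandit learner for rewards in [0, 1] (illustrative choice)."""

    def __init__(self, n_actions, gamma):
        self.n = n_actions
        self.gamma = gamma                # exploration rate in (0, 1]
        self.log_w = [0.0] * n_actions    # log-weights, for numerical stability

    def probs(self):
        # Mix the exponential-weights distribution with uniform exploration.
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n for wi in w]

    def act(self, rng):
        return rng.choices(range(self.n), weights=self.probs())[0]

    def update(self, action, reward):
        # Importance-weighted reward estimate for the action actually played.
        estimate = reward / self.probs()[action]
        self.log_w[action] += self.gamma * estimate / self.n


class PerContextLearner:
    """Runs one independent bandit learner per context (signal)."""

    def __init__(self, n_actions, gamma):
        self.n_actions = n_actions
        self.gamma = gamma
        self.learners = {}                # context -> Exp3 instance

    def _get(self, context):
        if context not in self.learners:
            self.learners[context] = Exp3(self.n_actions, self.gamma)
        return self.learners[context]

    def act(self, context, rng):
        return self._get(context).act(rng)

    def update(self, context, action, reward):
        # Only the learner for the observed context is updated.
        self._get(context).update(action, reward)


def simulate(T=2000, seed=0):
    """Toy environment (assumed): the best action equals the context index."""
    rng = random.Random(seed)
    agent = PerContextLearner(n_actions=2, gamma=0.1)
    optimal = 0
    for _ in range(T):
        context = rng.randrange(2)
        action = agent.act(context, rng)
        reward = 1.0 if action == context else 0.0
        agent.update(context, action, reward)
        optimal += int(action == context)
    return optimal / T
```

Calling `simulate()` returns the fraction of rounds in which the agent played the context-optimal action in this toy setting. Because each context has its own learner, feedback received under one context never changes the action distribution used under another, which is the independence the proposition relies on.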

Theorems & Definitions (40)

Definition 2.1
Proposition 2.1
Example 3.1
Theorem 3.1
Lemma 3.2
proof
proof: Proof of Theorem \ref{thm:no-regret-lower-bound}
Proposition 3.3
Theorem 3.4
proof
...and 30 more
