Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability

Natalie Collina, Varun Gupta, Aaron Roth

TL;DR

This work studies repeated Principal–Agent contracting with multiple non-myopic Agents by modeling the extensive-form game that the Principal's selection mechanism induces among the Agents, and by establishing counterfactual guarantees for the Principal. It proves that the Agent-selection game admits a pure, non-responsive equilibrium, and shows that a monotone no-regret bandit algorithm gives the Principal no regret to the best fixed Agent in hindsight, even relative to the counterfactual in which that Agent had been offered a guaranteed contract for all $T$ rounds. It further shows that combining monotone selection with a no-swap-regret guarantee lets the Principal offer only limited-liability contracts, via a tab/debt mechanism, while keeping no regret against a linear-contract benchmark. Finally, the paper introduces a monotone no-swap-regret bandit algorithm (MonoBandit) that instantiates these guarantees.

Abstract

We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive-form game amongst the Agents. We give several results aimed at understanding an under-explored aspect of contract theory -- the game induced when choosing an Agent to contract with. First, we show that this game admits a pure-strategy non-responsive equilibrium amongst the Agents -- informally, an equilibrium in which the Agents' actions depend on the history of realized states of nature, but not on the history of each other's actions, and which therefore avoids the complexities of collusion and threats. Next, we show that if the Principal selects Agents using a monotone bandit algorithm, then for any concave contract, in any such equilibrium, the Principal obtains no regret to contracting with the best Agent in hindsight -- not just given their realized actions, but also relative to the counterfactual world in which she had offered a guaranteed $T$-round contract to the best Agent in hindsight, which would have induced a different sequence of actions. Finally, we show that if the Principal selects Agents using a monotone bandit algorithm which guarantees no swap-regret, then the Principal can additionally offer only limited liability contracts (in which the Agent never needs to pay the Principal) while getting no regret to the counterfactual world in which she offered a linear contract to the best Agent in hindsight -- despite the fact that linear contracts are not limited liability. We instantiate this theorem by demonstrating the existence of a monotone no swap-regret bandit algorithm, which to our knowledge has not previously appeared in the literature.
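To make the weaker of the two benchmarks concrete, the following is a minimal sketch of regret to the best fixed Agent in hindsight, computed with the Agents' realized actions held fixed. It does not capture the counterfactual (policy-regret) benchmark described above, in which a guaranteed $T$-round contract would have induced different actions. The function name `external_regret` and the table layout `utilities[t][i]` are assumptions made for illustration, not notation from the paper.

```python
from typing import Sequence


def external_regret(
    utilities: Sequence[Sequence[float]],
    chosen: Sequence[int],
) -> float:
    """Regret to the best fixed Agent in hindsight, realized actions held fixed.

    utilities[t][i] is a hypothetical table: the Principal's utility in round t
    had she selected Agent i, given what each Agent actually did in that round.
    chosen[t] is the index of the Agent she actually selected in round t.
    """
    if len(utilities) != len(chosen):
        raise ValueError("utilities and chosen must cover the same rounds")
    if not utilities:
        return 0.0

    num_agents = len(utilities[0])

    # Utility the Principal actually collected over all rounds.
    realized = sum(row[i] for row, i in zip(utilities, chosen))

    # Utility from committing to the single best Agent for every round.
    best_fixed = max(
        sum(row[i] for row in utilities) for i in range(num_agents)
    )
    return best_fixed - realized


# Three rounds, two Agents; the Principal picks Agent 1, then Agent 0 twice.
example_utilities = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
example_chosen = [1, 0, 0]
example_regret = external_regret(example_utilities, example_chosen)
```

A no-regret guarantee in this sense asks that the quantity grow sublinearly in the number of rounds $T$; the paper's results concern the stronger benchmark in which the comparison Agent's actions may themselves change.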

Paper Structure

This paper contains 33 sections, 29 theorems, 63 equations, 5 algorithms.

Key Result

Theorem 1

There exists a pure non-responsive Nash equilibrium of the Agent Selection Game.

Theorems & Definitions (95)

  • Definition 1: Outcome $o$ and Return $r(o)$
  • Definition 2: Contract
  • Definition 3: Agent Action and Cost Function
  • Definition 4: Outcome Distribution $\mathcal{D}_{i, y, a}$
  • Remark 1
  • Definition 5: Expected Single Round Agent Utility $u_{i}$
  • Definition 6: Expected Single Round Principal Utility $u_{p}$
  • Definition 7: Principal Transcript
  • Definition 8: Principal Selection Mechanism $f$
  • Definition 9: Belief functions $B_i(\cdot)$
  • ...and 85 more