Contracting with a Learning Agent

Guru Guruganesh, Yoav Kolumbus, Jon Schneider, Inbal Talgam-Cohen, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Joshua R. Wang, S. Matthew Weinberg

TL;DR

The paper studies repeated principal–agent problems where the agent uses learning dynamics, focusing on mean-based no-regret learners. It shows that against such learners, the optimal dynamic contract is a simple free-fall scheme: offer a linear contract with parameter $\alpha > 0$ for a stretch of rounds, then switch to the zero contract, yielding revenue during the fall and potentially improving welfare over the best static contract. The results extend to general contracts with single-dimensional scaling and illuminate welfare implications, including win–win scenarios; they also analyze how partial knowledge of the time horizon degrades the principal’s advantage. A continuous-time reduction underpins the discrete analysis and yields a polynomial-time computable strategy in the linear case, with extensions and limitations clarified for unknown horizons. Overall, the work bridges algorithmic contract theory with learning in games, revealing when simple dynamic contracts outperform static ones and how horizon uncertainty bounds such gains.

Abstract

Many real-life contractual relations differ completely from the clean, static model at the heart of principal-agent theory. Typically, they involve repeated strategic interactions of the principal and agent, taking place under uncertainty and over time. While appealing in theory, players seldom use complex dynamic strategies in practice, often preferring to circumvent complexity and approach uncertainty through learning. We initiate the study of repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes. Optimizing against a no-regret agent is a known open problem in general games; we achieve an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success/failure. The solution has a surprisingly simple structure: for some $\alpha > 0$, initially offer the agent a linear contract with scalar $\alpha$, then switch to offering a linear contract with scalar $0$. This switch causes the agent to "free-fall" through their action space and during this time provides the principal with non-zero reward at zero cost. Despite apparent exploitation of the agent, this dynamic contract can leave both players better off compared to the best static contract. Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically. Finally, we quantify the dependence of our results on knowledge of the time horizon, and are the first to address this consideration in the study of strategizing against learning agents.

Paper Structure

This paper contains 29 sections, 23 theorems, 33 equations, 9 figures.

Key Result

Theorem 1

In success/failure settings, as well as in arbitrary contract settings where the principal restricts to linear contracts, the optimal dynamic contract against a mean-based agent is a free-fall contract. This optimal dynamic contract can be efficiently computed.
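To make the free-fall idea concrete, here is a minimal Python sketch. It is not the paper's algorithm for computing the optimal contract: it only simulates one hand-picked free-fall contract against an idealized mean-based agent that exactly best-responds to the cumulative history. The three-action instance (`REWARDS`, `COSTS`), the tie-breaking rule, and all function names are hypothetical and chosen for illustration; they are not taken from the paper's Figure 1.

```python
from fractions import Fraction

# Hypothetical instance (not from the paper): expected reward and cost per action.
REWARDS = [Fraction(0), Fraction(1, 2), Fraction(1)]
COSTS = [Fraction(0), Fraction(1, 10), Fraction(2, 5)]


def best_response(cum_alpha, rounds_seen):
    """Action maximizing cumulative utility cum_alpha * R_i - rounds_seen * c_i.

    Assumption: ties are broken toward the higher expected reward
    (in favor of the principal). With an empty history all actions tie.
    """
    def key(i):
        return (cum_alpha * REWARDS[i] - rounds_seen * COSTS[i], REWARDS[i])
    return max(range(len(REWARDS)), key=key)


def average_revenue(alphas):
    """Principal's average per-round revenue for a sequence of linear contracts."""
    cum_alpha = Fraction(0)
    total = Fraction(0)
    for rounds_seen, alpha in enumerate(alphas):
        # The idealized agent reacts only to contracts from earlier rounds.
        action = best_response(cum_alpha, rounds_seen)
        # Under a linear contract the principal keeps a (1 - alpha) share.
        total += (1 - alpha) * REWARDS[action]
        cum_alpha += alpha
    return total / len(alphas)


def free_fall(alpha, switch_round, horizon):
    """Offer alpha for switch_round rounds, then the zero contract."""
    return [alpha] * switch_round + [Fraction(0)] * (horizon - switch_round)


# Best static linear contract over a grid of alpha values k/100.
best_static = max(
    (1 - Fraction(k, 100)) * REWARDS[best_response(Fraction(k, 100), 1)]
    for k in range(101)
)

# One free-fall contract: alpha = 3/5 for the first third of 300 rounds.
dynamic = average_revenue(free_fall(Fraction(3, 5), 100, 300))

print("best static revenue:", best_static)
print("free-fall revenue:  ", dynamic)
```

With these made-up numbers, the best static linear contract earns 2/5 per round, while the free-fall contract averages 281/600 (about 0.468). The gain comes from the second phase, where the agent keeps playing the middle action while the principal pays nothing. Exact rational arithmetic is used so that ties at the indifference points are resolved deterministically rather than by floating-point noise.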

Figures (9)

  • Figure 1: A canonical contract setting in which a simple dynamic contract extracts higher expected revenue than the best static contract. The table entries show the outcome probabilities given the actions.
  • Figure 2: Two representations of the same dynamic contract, applied to the contract setting of Figure 1 and repeated for $T$ steps. The dotted red curve in Figure 2a shows the cumulative contract at time $t$ as a function of $t$, with both axes normalized by $T$. The shaded areas are the agent's mean-based best-response regions: when the cumulative contract lies in the purple, green, or gray region, the learning agent prefers action $3$, action $2$, or action $1$, respectively. The lines $\alpha_{1,2}$ and $\alpha_{2,3}$ are the indifference curves between these regions. Figure 2b shows the same dynamic contract, but here the dotted red curve describes the average contract at time $t$ as a function of the fraction of total time $t/T$. Pictorially, after steadily building the agent's incentives until time $T/2$, the principal's offered contract "flatlines", and this causes the agent to "free-fall" through the shaded regions during the remaining time.
  • Figure 3: An illustration of Lemma \ref{thm:rewriting-lemma-1}. The plot shows the cumulative contract over time for the contract game of Figure 1, repeated for $T$ steps, with both axes normalized by $T$. The lemma shows how an arbitrary dynamic strategy, such as the one shown by the blue curve, can be rewritten as a piecewise stationary strategy, depicted by the dotted red curve, that induces similar behavior by the agent and the same utilities.
  • Figure 4: Illustrations for the proof of Lemma \ref{lem:decreasing-linear}.
  • Figure 5: Illustrations for the proof of Lemma \ref{lem:decreasing-to-freefall}.
  • ...and 4 more figures
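
The indifference curves $\alpha_{1,2}$ and $\alpha_{2,3}$ mentioned in the Figure 2 caption can be made concrete for linear contracts. The following is a sketch under assumed notation that does not appear in the listing above: action $i$ has cost $c_i$ and expected reward $R_i$, and $\alpha_s$ is the linear contract offered in round $s$.

```latex
% Assumed notation: c_i = cost of action i, R_i = expected reward of action i,
% alpha_s = linear contract offered in round s.
\begin{align*}
u_A(i \mid \alpha) &= \alpha R_i - c_i,
  \qquad u_P(i \mid \alpha) = (1 - \alpha) R_i, \\
u_A(i \mid \alpha) = u_A(j \mid \alpha)
  &\iff \alpha = \alpha_{i,j} := \frac{c_j - c_i}{R_j - R_i}
  \quad (R_i \neq R_j), \\
\bar{\alpha}_t &= \frac{1}{t} \sum_{s \le t} \alpha_s .
\end{align*}
```

Under these assumptions, an idealized mean-based agent prefers action $j$ over action $i$ (with $R_j > R_i$) roughly when the average contract $\bar{\alpha}_t$ exceeds $\alpha_{i,j}$. In a plot of the cumulative contract against time, each indifference curve is therefore a ray through the origin with slope $\alpha_{i,j}$; in a plot of the average contract, it is a horizontal line at height $\alpha_{i,j}$.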

Theorems & Definitions (56)

  • Theorem: See Theorem \ref{thm:free-fall-linear} in Section \ref{sec:linear-free-fall-proof}, combined with the reduction in Theorem \ref{thm:discrete_to_continuous}
  • Theorem: See Theorem \ref{thm:unbounded-win-win} in Section \ref{sec:utility-implications-linear}
  • Theorem: See Theorem \ref{thm:p-linear} in Section \ref{sec:one-d-contracts}, combined with the reduction in Theorem \ref{thm:discrete_to_continuous}
  • Theorem: See Theorems \ref{thm:unknown-time-horizon}-\ref{thm:unknown-horizon-converse} in Section \ref{sec:unknown-horizon}
  • Definition 2.1
  • Definition 2.2 (Braverman, Mao, Schneider, and Weinberg, 2018)
  • Definition 2.3
  • Theorem 2.4
  • Theorem 3.1
  • Lemma 3.2
  • ...and 46 more