Cooperation and Control in Delegation Games

Oliver Sourbut, Lewis Hammond, Harriet Wood

TL;DR

This paper introduces delegation games as a multi-principal, multi-agent framework to study AI delegation problems, identifying two core failure modes: control and cooperation. It formalizes four measures—$IA$, $IC$, $CA$, and $CC$—to separately quantify alignment and capabilities at both individual and collective levels, and provides rigorous bounds linking these measures to principals' welfare. Theoretical results establish desiderata for the measures and bound principal welfare regret in terms of agent welfare and misalignment, while experiments demonstrate the bounds and show how the measures can be inferred from limited data. The work offers practical insight into designing safer and more beneficial AI systems by ensuring both good alignment and coordination among agents, and highlights challenges in estimating capabilities from observation alone.

Abstract

Many settings of interest involving humans and machines -- from virtual personal assistants to autonomous vehicles -- can naturally be modelled as principals (humans) delegating to agents (machines), which then interact with each other on their principals' behalf. We refer to these multi-principal, multi-agent scenarios as delegation games. In such games, there are two important failure modes: problems of control (where an agent fails to act in line with their principal's preferences) and problems of cooperation (where the agents fail to work well together). In this paper we formalise and analyse these problems, further breaking them down into issues of alignment (do the players have similar preferences?) and capabilities (how competent are the players at satisfying those preferences?). We show -- theoretically and empirically -- how these measures determine the principals' welfare, how they can be estimated using limited observations, and thus how they might be used to help us design more aligned and cooperative AI systems.

Paper Structure (36 sections, 31 theorems, 55 equations, 13 figures)

Key Result

Lemma 1

For any $u,u' \in U$, $u_\nu = u'_\nu$ if and only if $\preceq = \preceq'$.
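
As a rough illustration of Lemma 1, the sketch below checks numerically that two utility functions share a normal form exactly when they are positive affine transformations of one another. It rests on assumptions that do not appear in this summary: that $\preceq$ is the preference ordering over lotteries induced by $u$, and that normalisation means subtracting the mean and rescaling to unit Euclidean norm. The helper names `normalise` and `same_normal_form` are hypothetical, and the paper's actual definition of $u_\nu$ may differ.

```python
import math


def normalise(u):
    """Hypothetical normal form (an assumption, not the paper's definition):
    subtract the mean, then scale to unit Euclidean norm.
    Constant utilities map to the zero vector."""
    mean = sum(u) / len(u)
    centred = [x - mean for x in u]
    norm = math.sqrt(sum(x * x for x in centred))
    if norm == 0:
        return [0.0] * len(u)
    return [x / norm for x in centred]


def same_normal_form(u, v, tol=1e-9):
    """Compare two utility vectors after normalisation, up to a tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(normalise(u), normalise(v)))


# Utilities over three outcomes (illustrative numbers only).
u = [0.0, 1.0, 3.0]
u_affine = [2.0 * x + 5.0 for x in u]  # positive affine transformation of u
u_flipped = [-x for x in u]            # reversed preferences
u_monotone = [0.0, 1.0, 9.0]           # same ranking of outcomes, different lotteries

print(same_normal_form(u, u_affine))    # True
print(same_normal_form(u, u_flipped))   # False
print(same_normal_form(u, u_monotone))  # False
```

The third case is the interesting one: `u_monotone` ranks the pure outcomes exactly as `u` does, but it orders lotteries over them differently, so under these assumptions the two normal forms differ.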

Figures (13)

  • Figure 1: (a) The payoffs of the agents in Example \ref{ex:driving}; (b) the payoffs of the principals; and (c) a graphical representation, with vertical and horizontal arrows indicating control and cooperation, respectively.
  • Figure 2: The range of social welfares in a game $G$.
  • Figure 3: We report mean principal welfare (in red) normalised to $[\hat{w}_-,\hat{w}_+]$, with $\hat{w}_\bullet$ and $\hat{w}_\star$ in green. The lower bounds on welfare, given by Theorem \ref{thm:capabilities-bound}, and on $\hat{w}_\star$ (compared to $\hat{w}_+$), given by Proposition \ref{prop:ideal-welfare-bound}, are in orange and blue, respectively. Shaded areas show 90% confidence intervals.
  • Figure 4: We report the mean absolute error of estimates of the four measures. The red, orange, blue, and green lines represent games with $10^k$ outcomes for $k \in \{1,2,3,4\}$, respectively. Shaded areas show 90% confidence intervals.
  • Figure 5: (a) A Prisoner's Dilemma leading to arbitrarily high welfare regret for the principals, despite perfect control of each agent by its principal; (b) a game in which the welfare in the worst $\bm{\epsilon}\text{-NE}$ is much lower than in the worst NE; and (c) a Traveller's Dilemma leading to arbitrarily high welfare regret for the principals, despite perfect control of each agent by its principal, and near-perfect collective alignment.
  • ...and 8 more figures

Theorems & Definitions (60)

  • Example 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Lemma 1
  • Definition 7
  • Definition 8
  • ...and 50 more