
Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL

Andreas A. Haupt, Phillip J. K. Christoffersen, Mehul Damani, Dylan Hadfield-Menell

TL;DR

Social dilemmas in MARL arise when individual incentives misalign with collective welfare. The authors introduce Formal Contracting, augmenting Markov games with binding, state-dependent reward transfers governed by contractible observations, enabling agents to commit to socially beneficial outcomes. They prove that, with sufficiently expressive contract spaces, every subgame-perfect equilibrium attains socially optimal welfare in fully observable settings, and that greater expressiveness raises welfare under partial observability; they further propose MOCA to stabilize contract learning and demonstrate effectiveness across static and dynamic domains. The work also connects contracting to DEC-POMDPs under history-transparency and highlights the necessity of arbitrary unconditional transfers for welfare gains. Overall, it provides a principled mechanism to induce cooperation among selfish agents and offers scalable learning strategies for contract design in diverse MARL environments.

Abstract

Multi-agent Reinforcement Learning (MARL) is a powerful tool for training autonomous agents acting independently in a common environment. However, it can lead to sub-optimal behavior when individual incentives and group incentives diverge. Humans are remarkably capable at solving these social dilemmas. It is an open problem in MARL to replicate such cooperative behaviors in selfish agents. In this work, we draw upon the idea of formal contracting from economics to overcome diverging incentives between agents in MARL. We propose an augmentation to a Markov game where agents voluntarily agree to binding transfers of reward, under pre-specified conditions. Our contributions are theoretical and empirical. First, we show that this augmentation makes all subgame-perfect equilibria of all Fully Observable Markov Games exhibit socially optimal behavior, given a sufficiently rich space of contracts. Next, we show that for general contract spaces, and even under partial observability, richer contract spaces lead to higher welfare. Hence, contract space design solves an exploration-exploitation tradeoff, sidestepping incentive issues. We complement our theoretical analysis with experiments. Issues of exploration in the contracting augmentation are mitigated using a training methodology inspired by multi-objective reinforcement learning: Multi-Objective Contract Augmentation Learning (MOCA). We test our methodology in static, single-move games, as well as dynamic domains that simulate traffic, pollution management and common pool resource management.
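
As a rough illustration of the augmentation described in the abstract, the sketch below treats a contract as a function from a contractible observation to a zero-sum vector of reward transfers, applied only if every agent accepts. The names `apply_contract` and `cleaning_contract`, the observation field `cleaned_by_agent_1`, and the payment rate of 0.5 are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a contract maps a contractible observation
# to one transfer per agent.
Contract = Callable[[Dict[str, int]], List[float]]


def apply_contract(rewards: List[float], obs: Dict[str, int],
                   contract: Contract, accepted: List[bool]) -> List[float]:
    """Return rewards after transfers; rewards are unchanged if any agent declined."""
    if not all(accepted):
        return list(rewards)
    transfers = contract(obs)
    # Transfers only move reward between agents, so they must sum to zero.
    if abs(sum(transfers)) > 1e-9:
        raise ValueError("contract transfers must be zero-sum")
    return [r + t for r, t in zip(rewards, transfers)]


def cleaning_contract(obs: Dict[str, int]) -> List[float]:
    """Illustrative contract: agent 0 pays agent 1 0.5 reward per unit agent 1 cleans."""
    pay = 0.5 * obs["cleaned_by_agent_1"]
    return [-pay, pay]


base = [1.0, 0.0]
obs = {"cleaned_by_agent_1": 2}
signed = apply_contract(base, obs, cleaning_contract, accepted=[True, True])
declined = apply_contract(base, obs, cleaning_contract, accepted=[True, False])
```

Because the transfers are zero-sum, signing the contract redistributes reward between agents without changing total welfare in any single step; any welfare gain has to come from the change in behavior the transfers induce.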

Paper Structure (46 sections, 8 theorems, 41 equations, 12 figures, 1 algorithm)

Key Result

Theorem 1

Let $M = \langle S, s_0, \mathbf A, T, \mathbf R, \gamma \rangle$ be a Fully Observable Markov game. If all observations are contractible, $O_0 (s, \mathbf a) = (s, \mathbf a)$, then the contracting space

Figures (12)

  • Figure 1: We evaluate our method in the Cleanup domain (Hughes et al., 2018). Left: A screenshot of the environment. The different agents correspond to the pink, yellow, and purple tiles. Agents are rewarded for picking apples (green), but apples will only grow if the river (blue) is clean of pollution (brown). Agents can clean up pollution, but aren't directly rewarded for cleaning. This creates a social dilemma where no agents clean because they don't expect to benefit from cleaning directly. Right: An illustration of the solution that our contracting augmentation facilitates. In the Cleanup domain, one agent commits to "pay" the other to clean the river. As a result, the agents are able to coordinate on policies that maximize the total reward across both agents.
  • Figure 2: (a) Prisoner's Dilemma (b) Prisoner's Dilemma after signing a contract in which a defector transfers $1.5$ reward to a cooperator. With this contract in force, cooperating becomes a dominant action for both players.
  • Figure 3: The Contracting Augmentation. Top: Agents can propose contracts, which are state-dependent, zero-sum, additive augmentations to their reward functions. Agents can accept or decline contracts. Left: If the contract is declined, the interaction between agents proceeds as before. Right: If the contract is accepted, the agents' rewards are altered according to the rules of the contract.
  • Figure 4: A table summarizing the theoretical results proven in \ref{sec:theory} and \ref{sec:features}. The columns along the top list theorem types, which are (1) optimality when the contracting space is sufficiently rich, (2) monotonicity in the size of the contract space, and (3) monotonicity in the space of contractible features. The rows list the various problem setups considered in these sections.
  • Figure 5: Under detectable deviations, the welfare levels attainable at equilibrium improve to optimality as the complexity of $\Theta$ grows, provided the contract spaces allow arbitrary unconditional transfers (AUT).
  • ...and 7 more figures
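
The Prisoner's Dilemma caption for Figure 2 can be checked with a small worked example. The sketch below applies a contract in which a defector transfers $1.5$ reward to a cooperator and tests whether cooperating becomes strictly dominant. The base payoffs (2 for mutual cooperation, 3 for defecting against a cooperator, 1 for mutual defection, 0 for cooperating against a defector) are assumed for illustration and are not the values from the paper's figure; the helper names are likewise hypothetical.

```python
# Assumed payoffs (not from the paper): mutual cooperation 2, temptation 3,
# mutual defection 1, sucker's payoff 0.
C, D = 0, 1
BASE = {
    (C, C): (2.0, 2.0),
    (C, D): (0.0, 3.0),
    (D, C): (3.0, 0.0),
    (D, D): (1.0, 1.0),
}
TRANSFER = 1.5  # amount a defector pays a cooperator, as in the Figure 2 caption


def with_contract(payoffs, transfer):
    """Apply the zero-sum transfer wherever exactly one player defects."""
    out = {}
    for (a0, a1), (r0, r1) in payoffs.items():
        if a0 == D and a1 == C:
            r0, r1 = r0 - transfer, r1 + transfer
        elif a0 == C and a1 == D:
            r0, r1 = r0 + transfer, r1 - transfer
        out[(a0, a1)] = (r0, r1)
    return out


def strictly_dominant(payoffs, action):
    """True if `action` is strictly better for both players against every opponent action."""
    other = D if action == C else C
    row = all(payoffs[(action, b)][0] > payoffs[(other, b)][0] for b in (C, D))
    col = all(payoffs[(a, action)][1] > payoffs[(a, other)][1] for a in (C, D))
    return row and col


CONTRACTED = with_contract(BASE, TRANSFER)
print(strictly_dominant(BASE, D))        # defection dominates without a contract
print(strictly_dominant(CONTRACTED, C))  # cooperation dominates with the contract
```

Under these assumed payoffs, the transfer leaves the total reward in every cell unchanged; it only shifts who receives it, which is enough to reverse each player's best response.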

Theorems & Definitions (24)

  • Definition 1
  • Definition 2: Subgame-Perfect Equilibrium
  • Definition 3: Contract
  • Definition 4: Contracting Model
  • Definition 5: $\boldsymbol{\mathcal{C}}$-Augmented Game
  • Theorem 1: Optimality of Contracting
  • Proof: Proof Sketch
  • Theorem 2: Optimality of Contracting with Detectable Deviations
  • Definition 6: WCSPW and BCSPW
  • Proposition 3: $\mathcal{W}, \mathcal{B}$ bound equilibrium welfare
  • ...and 14 more