Learning to Negotiate via Voluntary Commitment

Shuhui Zhu, Baoxiang Wang, Sriram Ganapathi Subramanian, Pascal Poupart

TL;DR

This work tackles commitment failures in mixed-motive multi-agent settings by introducing Markov Commitment Games (MCGs) and a learnable Differentiable Commitment Learning (DCL) framework. DCL learns a tripartite policy per agent (proposal, commitment, and action) through unbiased policy gradients $\nabla V^i_{\bm{\phi,\psi,\pi}}(s)$ while differentiating through other agents' policies, guided by incentive-compatible constraints to favor mutually beneficial agreements. The approach yields faster convergence and higher social welfare than Independent PPO, Mediated MARL, and MOCA across Prisoner's Dilemma, Grid, and repeated/conflicting games, and scales to many players with robustness to irrational agents. These results demonstrate that self-interested agents can negotiate effective agreements without central altruism, with implications for scalable cooperative AI in dynamic environments.

Abstract

The partial alignment and conflict of autonomous agents lead to mixed-motive scenarios in many real-world applications. However, agents may fail to cooperate in practice even when cooperation yields a better outcome. One well-known reason for this failure is non-credible commitments. To facilitate commitments among agents for better cooperation, we define Markov Commitment Games (MCGs), a variant of commitment games, where agents can voluntarily commit to their proposed future plans. Based on MCGs, we propose a learnable commitment protocol via policy gradients. We further propose incentive-compatible learning to accelerate convergence to equilibria with better social welfare. Experimental results in challenging mixed-motive tasks demonstrate faster empirical convergence and higher returns for our method compared with its counterparts. Our code is available at https://github.com/shuhui-zhu/DCL.

Paper Structure

This paper contains 38 sections, 2 theorems, 35 equations, 6 figures, 8 tables, 3 algorithms.

Key Result

Proposition 4.1

Mutual cooperation is a Pareto-dominant Nash equilibrium in the MCG of the Prisoner's Dilemma.
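
The intuition behind this proposition can be illustrated with a small brute-force check. The sketch below is not the paper's proof: it assumes a one-shot game, hypothetical payoff values (R=3, T=5, S=0, P=1), and one specific opponent strategy (propose C, commit only if the other agent also proposes C, otherwise defect). It enumerates every pure unilateral deviation and compares the best one against the cooperative payoff.

```python
from itertools import product

# Hypothetical Prisoner's Dilemma payoffs to the first player (assumed values).
R, T, S, P = 3, 5, 0, 1
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def payoff_against_conditional_committer(my_proposal, my_commit, my_fallback):
    """Payoff to an agent facing an opponent who proposes C, commits only
    when the agent also proposes C, and defects if the agreement fails."""
    opp_proposal = "C"
    opp_commit = my_proposal == "C"
    opp_fallback = "D"
    if my_commit and opp_commit:
        # Unanimous commitment: both agents execute their proposals.
        return PAYOFF[(my_proposal, opp_proposal)]
    # No agreement: both agents act independently.
    return PAYOFF[(my_fallback, opp_fallback)]


# Payoff when the agent uses the same strategy as the opponent.
cooperative_payoff = payoff_against_conditional_committer("C", True, "D")

# Best payoff over all pure (proposal, commit, fallback) deviations.
best_deviation = max(
    payoff_against_conditional_committer(p, c, f)
    for p, c, f in product("CD", (True, False), "CD")
)

print(cooperative_payoff, best_deviation)
```

Under these assumptions no deviation beats the cooperative payoff, and that payoff exceeds the mutual-defection payoff P, which is consistent with the proposition's claim for this simplified setting.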

Figures (6)

  • Figure 1: Markov Commitment Game: A Markov commitment game consists of three stages. In the first stage, agents announce their proposed future actions. In the second stage, agents observe others' proposals and decide whether to commit to the joint plan. In the final stage, agents choose their actions: if all agents commit, they follow their proposals; if any agent does not commit, all agents independently select actions based on the current state. Afterward, agents observe the resulting rewards and transition to the next state.
  • Figure 2: Prisoner's Dilemma: DCL vs. Other Baselines
  • Figure 3: DCL Policies in Prisoner's Dilemma
  • Figure 4: Grid Game (Horizon=$16$): DCL vs. Other Baselines.
  • Figure 5: Repeated Purely Conflicting Game (Horizon=$16$): DCL vs. Other Baselines.
  • ...and 1 more figure
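
The three-stage protocol described in the Figure 1 caption can be sketched as a single step function. This is a minimal illustration, not the released DCL code: the function name `mcg_step`, the fixed payoff table, and the two-player Prisoner's Dilemma setting are assumptions made for the example, and the learned proposal, commitment, and action policies are replaced by plain inputs.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical Prisoner's Dilemma payoffs (agent 0, agent 1); assumed values.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def mcg_step(
    proposals: Sequence[str],
    commits: Sequence[bool],
    fallback_actions: Sequence[str],
) -> Tuple[Tuple[str, ...], Tuple[int, int]]:
    """One step of a two-player commitment game (illustrative sketch).

    proposals:        stage 1, each agent's announced action
    commits:          stage 2, each agent's decision after seeing proposals
    fallback_actions: stage 3, actions chosen independently if no agreement
    """
    if all(commits):
        # Unanimous commitment: every agent executes its own proposal.
        joint_action = tuple(proposals)
    else:
        # Any refusal voids the agreement for all agents.
        joint_action = tuple(fallback_actions)
    return joint_action, PAYOFFS[joint_action]


# Both agents propose cooperation and commit.
print(mcg_step(["C", "C"], [True, True], ["D", "D"]))
# One agent declines, so both fall back to independent actions.
print(mcg_step(["C", "C"], [True, False], ["D", "D"]))
```

In a full Markov Commitment Game the step would also return the next state; that transition is omitted here because the sketch uses a stateless matrix game.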

Theorems & Definitions (8)

  • Proposition 4.1
  • Lemma 5.1
  • proof
  • proof
  • proof
  • Definition C.1
  • Definition C.2
  • proof