Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

Robert Loftin; Mustafa Mert Çelikok; Herke van Hoof; Samuel Kaski; Frans A. Oliehoek

TL;DR

This work tackles learning in two-player differentiable games under ad hoc cooperation by introducing Hierarchical learning with Commitments (Hi-C), an uncoupled, gradient-free method that relies on perturbation-based updates and follower-response observations. By embedding a commitment mechanism that lets followers adapt to perturbed leader strategies, Hi-C achieves convergence to differential Stackelberg equilibria under conditions mirroring those of coupled methods, without requiring access to the follower’s payoff or learning updates. The paper provides theoretical convergence guarantees, explicit commitment schedules for strongly concave followers, and a practical online role-negotiation mechanism enabling symmetric learners to decide leader/follower roles on the fly. These contributions advance decentralized, cooperative multi-agent learning and have direct implications for ad hoc teamwork in reinforcement learning and related domains.

Abstract

In multi-agent problems requiring a high degree of cooperation, success often depends on the ability of the agents to adapt to each other's behavior. A natural solution concept in such settings is the Stackelberg equilibrium, in which the "leader" agent selects the strategy that maximizes its own payoff given that the "follower" agent will choose their best response to this strategy. Recent work has extended this solution concept to two-player differentiable games, such as those arising from multi-agent deep reinforcement learning, in the form of the differential Stackelberg equilibrium. While this previous work has presented learning dynamics which converge to such equilibria, these dynamics are "coupled" in the sense that the learning updates for the leader's strategy require some information about the follower's payoff function. As such, these methods cannot be applied to truly decentralised multi-agent settings, particularly ad hoc cooperation, where each agent only has access to its own payoff function. In this work we present "uncoupled" learning dynamics based on zeroth-order gradient estimators, in which each agent's strategy update depends only on their observations of the other's behavior. We analyze the convergence of these dynamics in general-sum games, and prove that they converge to differential Stackelberg equilibria under the same conditions as previous coupled methods. Furthermore, we present an online mechanism by which symmetric learners can negotiate leader-follower roles. We conclude with a discussion of the implications of our work for multi-agent reinforcement learning and ad hoc collaboration more generally.
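To make the idea of an uncoupled, zeroth-order leader update concrete, the following is a minimal sketch and not the authors' Hi-C algorithm. It assumes a toy Cournot duopoly with linear demand (the parameters `a` and `b`, the step sizes, and all function names are hypothetical choices for illustration), a follower that runs plain gradient ascent on its own payoff, and a deterministic two-point perturbation in place of whatever estimator the paper actually uses. The leader holds each perturbed quantity fixed for `commit_steps` follower updates (the "commitment"), then estimates its gradient from its own observed payoffs only.

```python
def payoff(q_own, q_other, a=1.0, b=1.0):
    """Cournot profit; `a` is the demand intercept net of unit cost (assumed)."""
    return q_own * (a - b * (q_own + q_other))


def follower_adapt(x, y, steps, eta=0.2, a=1.0, b=1.0):
    """Follower runs gradient ascent on its own payoff while the leader holds x."""
    for _ in range(steps):
        # Derivative of y * (a - b * (x + y)) with respect to y.
        y += eta * (a - b * x - 2.0 * b * y)
    return y


def run_leader(x=0.1, y=0.1, iters=200, commit_steps=30, delta=0.05, lr=0.2):
    """Leader update that uses only its own payoffs at committed, perturbed strategies."""
    for _ in range(iters):
        # Commit to x + delta and let the follower adapt before observing the payoff.
        y = follower_adapt(x + delta, y, commit_steps)
        f_plus = payoff(x + delta, y)
        # Commit to x - delta and repeat.
        y = follower_adapt(x - delta, y, commit_steps)
        f_minus = payoff(x - delta, y)
        # Two-point (central difference) estimate of the leader's total derivative.
        grad_est = (f_plus - f_minus) / (2.0 * delta)
        x += lr * grad_est
    # Final commitment at the unperturbed strategy so the follower settles on it.
    y = follower_adapt(x, y, commit_steps)
    return x, y


if __name__ == "__main__":
    leader_q, follower_q = run_leader()
    print(leader_q, follower_q)
```

Under these assumed settings the leader's quantity approaches 0.5 and the follower's approaches 0.25, the Stackelberg outcome of this particular duopoly, rather than the Nash outcome of 1/3 each. The leader never evaluates the follower's payoff or gradient; a shorter commitment period would leave the follower further from its best response and bias the estimate.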

Paper Structure (20 sections, 5 theorems, 22 equations, 1 figure, 2 algorithms)

Introduction
Background
Simultaneous Gradient Ascent and Differential Nash Equilibria
Hierarchical Model and Differential Stackelberg Equilibria
Hierarchical Gradient Update
Limitations of Coupled Learning
Uncoupled Learning with Commitments
Estimating $r(\tilde{x}_n)$.
Convergence Analysis
Choosing the Commitment Schedule
Numerical Experiments
Role Negotiation
Discussion
Future work.
Related Work
...and 5 more sections

Key Result

Proposition 2.4

Differential Stackelberg equilibria and differential Nash equilibria are equivalent in fully-cooperative games where $f_1 = f_2$.

Figures (1)

Figure 1: Hi-C paired with gradient ascent in the Cournot duopoly. Averaged over 32 runs (shaded regions show ranges).

Theorems & Definitions (8)

Definition 2.1: Differential Nash Equilibrium [differential_nash]
Definition 2.2: Stackelberg Equilibrium (SE) [simaan1973stackelberg]
Definition 2.3: Differential Stackelberg Equilibrium [fiez2020implicit]
Proposition 2.4: Fully-cooperative Multi-agent RL and DSE
Theorem 3.6
Corollary 3.7
Proposition 3.9 [nesterov2018convex]
Corollary 3.11
