Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning

Laixi Shi; Jingchu Gai; Eric Mazumdar; Yuejie Chi; Adam Wierman

TL;DR

The work tackles robustness in multi-agent reinforcement learning by formulating distributionally robust Markov games (RMGs) with fictitious uncertainty sets that integrate environment dynamics and others’ behavior, motivated by behavioral economics. It proves the existence of robust equilibria (robust NE and robust CCE) for this class and introduces Robust-Q-FTRL, a sample-efficient algorithm that learns an $\varepsilon$-robust CCE under a generative model. The main theoretical result shows a polynomial, scalable sample complexity of $\tilde{O}\left(\frac{S H^6 \sum_i A_i}{\varepsilon^4} \min\left\{H, \frac{1}{\min_i \sigma_i}\right\}\right)$, breaking the curse of multiagency for RMGs across uncertainty-set definitions. This advances robust MARL by enabling practical, data-efficient learning in settings with realistic uncertainty about both the environment and other agents’ intentions, and it opens avenues for uncertainty-set design, equilibrium refinement, and broader applicability in risk-aware multi-agent systems.
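To give a feel for what the stated bound means, the sketch below evaluates its leading term and compares the $\sum_i A_i$ factor with the size of the joint action space $\prod_i A_i$, which grows exponentially with the number of agents. It is only an illustration: constants and logarithmic factors hidden by $\tilde{O}$ are dropped, and the function names and the example values for $S$, $H$, $A_i$, $\varepsilon$, and $\sigma_i$ are assumptions chosen for the example, not values from the paper.

```python
import math


def robust_q_ftrl_bound(S, H, action_sizes, eps, sigmas):
    """Leading term of the stated bound; constants and log factors are dropped.

    S: number of states, H: horizon, action_sizes: [A_1, ..., A_n],
    eps: target accuracy, sigmas: per-agent uncertainty levels (all > 0).
    """
    return S * H**6 * sum(action_sizes) / eps**4 * min(H, 1.0 / min(sigmas))


def joint_action_count(action_sizes):
    """Size of the joint action space, prod_i A_i."""
    return math.prod(action_sizes)


if __name__ == "__main__":
    # Hypothetical setting: every agent has 4 actions.
    for n in (2, 5, 10):
        sizes = [4] * n
        print(
            f"n={n:2d}  sum A_i={sum(sizes):3d}  prod A_i={joint_action_count(sizes):8d}  "
            f"bound~{robust_q_ftrl_bound(10, 5, sizes, 0.1, [0.5] * n):.3g}"
        )
```

In this toy setting the bound's dependence on the agents grows linearly with $n$, while the joint action count grows as $4^n$.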

Abstract

Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. RMGs remain under-explored, from reasonable problem formulation to the development of sample-efficient algorithms. Two notorious and open challenges are the formulation of the uncertainty set and whether the corresponding RMGs can overcome the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of other agents. We first establish the well-posedness of this class of RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs, regardless of the uncertainty set formulation.
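The idea of "worst-case performance when game dynamics shift within a prescribed uncertainty set" can be made concrete with a small sketch. The code below computes the worst-case expected next-state value over a total-variation ball of radius `sigma` around a nominal transition distribution. The choice of a total-variation ball, the function name, and the toy numbers are assumptions made for illustration; the abstract does not specify the divergence, and this is not the paper's algorithm.

```python
import numpy as np


def worst_case_expectation_tv(p0, v, sigma):
    """Minimum of p @ v over distributions p with 0.5 * ||p - p0||_1 <= sigma.

    Assumption for illustration: the uncertainty set is a total-variation ball.
    The adversary may move up to `sigma` probability mass, and the minimizing
    move shifts mass from the highest-value states to the lowest-value state.
    """
    p = np.asarray(p0, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    target = int(np.argmin(v))      # state that receives the shifted mass
    budget = float(sigma)
    for s in np.argsort(-v):        # visit states from highest to lowest value
        if budget <= 0:
            break
        if s == target:
            continue
        moved = min(p[s], budget)
        p[s] -= moved
        p[target] += moved
        budget -= moved
    return float(p @ v)


if __name__ == "__main__":
    nominal = [0.5, 0.5]            # hypothetical nominal next-state distribution
    values = [0.0, 1.0]             # hypothetical next-state values
    for sigma in (0.0, 0.2, 1.0):
        print(sigma, worst_case_expectation_tv(nominal, values, sigma))
```

With `sigma = 0` the result is the ordinary expectation under the nominal distribution, and as `sigma` grows it decreases toward the smallest value in `v`.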

Paper Structure (81 sections, 17 theorems, 166 equations, 2 figures, 1 table, 2 algorithms)

Introduction
Open questions of robust MARL
Construction of realistic uncertainty sets.
The curse of multiagency.
Contributions
Notation.
Related works
Breaking curse of multiagency for standard Markov games.
Finite-sample analysis for distributionally robust Markov games.
Robust MARL.
Preliminaries
Standard Markov games
Markov policies and value functions.
Distributionally robust Markov games
Robust value functions and best-response policies.
...and 66 more sections

Key Result

Lemma 1

For any $i\in[n]$, given $\pi_{-i}: {\mathcal{S}} \times [H] \rightarrow \Delta(\mathcal{A}_{-i})$, there exists at least one policy $\widetilde{\pi}_i: {\mathcal{S}} \times [H] \rightarrow \Delta(\mathcal{A}_i)$ for the $i$-th agent that can simultaneously attain $V_{i,h}^{\widetilde{\pi}_i \times \pi_{-i} \ldots}$ ...

Figures (2)

Figure 1: $N$-sample estimation $(\pi_h = \{\pi_{j,h}\}_{j\in[n]}, i, h)$.
Figure 2: Robust-Q-FTRL

Theorems & Definitions (31)

Definition 1
Lemma 1
Theorem 1: Existence of robust NE
Theorem 2: Upper bound
Lemma 2
proof
Theorem 3: Theorem 3 of li2022minimax
Lemma 3: from li2023minimax
Lemma 4
proof
...and 21 more
