Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Yingjie Fei; Ruitu Xu

TL;DR

This work addresses risk-sensitive multi-agent reinforcement learning in general-sum Markov games where agents optimize the entropic risk measure $V_m = \frac{1}{\beta_m}\log \mathbb{E}[e^{\beta_m R_m}]$ and may have heterogeneous risk preferences. It shows that naive regret definitions induce equilibrium bias toward the most risk-sensitive agents, and proposes risk-balanced regret to symmetrize performance across agents, along with a lower-bound analysis. A self-play algorithm, MARS-VI, combines risk-sensitive value iteration with optimistic exploration and an equilibrium solver to learn NE, CE, and CCE, achieving near-optimal guarantees with respect to risk-balanced regret. The results recover classical risk-neutral and single-agent regimes as special cases and provide the first finite-sample guarantees in risk-sensitive MARL, with practical implications for balanced policy design in finance and competitive environments.
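The entropic risk measure above can be estimated from sampled returns. The following is a minimal sketch, not the paper's implementation: the function name `entropic_risk`, the sample returns, and the convention of treating `beta == 0` as the plain mean are assumptions made for illustration. It subtracts the maximum exponent before exponentiating to avoid overflow.

```python
import numpy as np

def entropic_risk(returns, beta):
    """Empirical (1/beta) * log E[exp(beta * R)] over sampled returns.

    `returns` and `beta` are illustrative inputs, not names from the paper.
    """
    returns = np.asarray(returns, dtype=float)
    if beta == 0.0:
        # Assumed convention: the risk-neutral limit is the sample mean.
        return float(returns.mean())
    z = beta * returns
    z_max = z.max()
    # log-mean-exp with the max subtracted for numerical stability
    log_mean_exp = z_max + np.log(np.mean(np.exp(z - z_max)))
    return float(log_mean_exp / beta)

returns = [0.0, 1.0, 2.0]          # hypothetical sampled returns
risk_seeking = entropic_risk(returns, beta=1.0)
risk_neutral = entropic_risk(returns, beta=0.0)
risk_averse = entropic_risk(returns, beta=-1.0)
```

By Jensen's inequality, a positive `beta` yields a value at or above the mean return and a negative `beta` yields a value at or below it, so agents with different $\beta_m$ can rank the same return distribution differently.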

Abstract

We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from the existing literature as a performance metric can induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the others. To address this deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.
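As a worked illustration of how the risk parameter shapes the objective, the standard small-$\beta_m$ expansion of the entropic risk measure for a bounded return $R_m$ is sketched below. This is a textbook identity rather than a result quoted from the paper. It shows why the risk-neutral case is recovered as $\beta_m \to 0$.

```latex
V_m \;=\; \frac{1}{\beta_m}\log \mathbb{E}\!\left[e^{\beta_m R_m}\right]
    \;=\; \mathbb{E}[R_m] \;+\; \frac{\beta_m}{2}\,\mathrm{Var}(R_m) \;+\; O(\beta_m^2)
    \qquad \text{as } \beta_m \to 0 .
```

The sign of $\beta_m$ therefore decides whether variance in the return is rewarded or penalized to first order.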

Paper Structure (24 sections, 10 theorems, 107 equations, 3 algorithms)

Introduction
Related Work
Preliminaries
Notation
Problem Setup
Policy and Value Functions
Equilibrium
Regret and Equilibrium Bias
A Naive Definition of Regret and Its Pitfalls
Theoretical pitfalls.
Practical pitfalls.
Risk-Balanced Regret
Algorithm
Main Results
Comparison with the lower bound.
...and 9 more sections

Key Result

Theorem 4.1

For $H \geq 8$, $K \geq \max\{16e^{|\beta_*|(H-1)}, 16H\}$, and $\log\log K \gtrsim |\beta_*|(H-1)$, there exists an MG such that any algorithm obeys […]. The same bound holds for $\mathbb{E}[\overline{\mathrm{Regret}}_{\mathsf{CE}}(K)]$ and $\mathbb{E}[\overline{\mathrm{Regret}}_{\mathsf{CCE}}(K)]$.

Theorems & Definitions (19)

Theorem 4.1
Definition 4.2
Definition 4.3
Theorem 4.4
Theorem 6.1
Lemma A.1
proof
Lemma B.1
proof
Lemma B.2
...and 9 more
