Refined Sample Complexity for Markov Games with Independent Linear Function Approximation

Yan Dai; Qiwen Cui; Simon S. Du

TL;DR

This work addresses the sample-efficiency challenge in multi-agent Markov Games with independent linear function approximation by refining the AVLPR framework with data-dependent pessimistic gap estimators. It introduces action-dependent bonuses and Magnitude-Reduced Estimators, combined with Adaptive Freedman inequalities and covariance-matrix concentration, to achieve the optimal $O(T^{-1/2})$ convergence rate while avoiding polynomial dependence on the action size $A_{\max}$. The resulting algorithm attains a high-probability CCE guarantee with a final sample complexity of $\tilde{O}(m^4 d^5 H^6 \epsilon^{-2})$, making it the first approach under independent linear function approximation to avoid the curse of multi-agents, attain the optimal rate, and avoid $\text{poly}(A_{\max})$ dependence simultaneously. The framework blends ideas from adversarial contextual bandits, linear MDP theory, and stochastic matrix concentration. This has potential impact for large-scale, multi-agent decision-making systems where joint state-action spaces are intractable and function approximation is essential.

Abstract

Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the "curse of multi-agents" (i.e., the algorithmic performance drops exponentially with the number of agents) was unavoidable, until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ -- which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with the insight of designing *data-dependent* (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel *action-dependent bonuses* to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.
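To make the gap between the two convergence rates concrete, the following minimal sketch computes how many iterations $T$ are needed to push a bound of the form $C \cdot T^{-\alpha}$ below a target $\epsilon$. The function name `iterations_for_gap`, the constant `C = 1`, and the omission of all problem-dependent and logarithmic factors are illustrative assumptions, not quantities taken from the paper.

```python
import math


def iterations_for_gap(eps: float, exponent: float, constant: float = 1.0) -> int:
    """Smallest integer T >= 1 with constant * T**(-exponent) <= eps.

    Hypothetical helper: `constant` stands in for every problem-dependent
    factor (horizon, dimension, number of agents), which are ignored here.
    """
    if eps <= 0 or exponent <= 0 or constant <= 0:
        raise ValueError("eps, exponent and constant must be positive")
    # Solve constant * T**(-exponent) <= eps  <=>  T >= (constant / eps)**(1 / exponent)
    return max(1, math.ceil((constant / eps) ** (1.0 / exponent)))


if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.125):
        slow = iterations_for_gap(eps, 0.25)  # O(T^{-1/4}) rate
        fast = iterations_for_gap(eps, 0.5)   # O(T^{-1/2}) rate
        print(f"eps={eps}: T^(-1/4) needs T={slow}, T^(-1/2) needs T={fast}")
```

Under these simplifications the $T^{-1/4}$ rate needs on the order of $\epsilon^{-4}$ iterations, while the $T^{-1/2}$ rate needs on the order of $\epsilon^{-2}$, so halving $\epsilon$ multiplies the cost by 16 in the first case and by 4 in the second.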

Paper Structure (49 sections, 37 theorems, 153 equations, 3 algorithms)

This paper contains 49 sections, 37 theorems, 153 equations, 3 algorithms.

Introduction
Key Insights and Technical Overview of This Paper
Related Work
Tabular Markov Games.
Markov Games with Function Approximation.
Markov Decision Processes with Linear Function Approximation.
Concurrent Work by Fan et al. (2024).
Preliminaries
Markov Games.
Policies and Value Functions.
Coarse Correlated Equilibrium.
MGs with Independent Linear Function Approximation.
Improved AVLPR Framework
Overview of the AVLPR Framework by Wang et al. (2023).
Loosened High-Probability Bound Requirement.
...and 34 more sections

Key Result

Theorem 3.1

Suppose that […]. Then, by setting $T=\widetilde{\mathcal{O}}(H^2 L \epsilon^{-2})$, an $\epsilon$-CCE can be obtained within $\widetilde{\mathcal{O}}(H^3 L d_{\text{replay}} \epsilon^{-2})$ samples.
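As a quick consistency check on the two bounds in Theorem 3.1, one can treat them as equalities with logarithmic factors suppressed and read off what they imply. This is arithmetic on the stated bounds only; the symbols $N$ (total samples) and $n$ (samples per iteration) are introduced here for illustration and are not the paper's notation, and nothing below describes how the algorithm actually allocates its samples.

```latex
% Assumption: bounds treated as equalities, log factors dropped.
% N = total samples, n = average samples per iteration (illustrative symbols).
\begin{align*}
  T &= H^2 L \epsilon^{-2}
    &&\Longrightarrow\quad \epsilon = H \sqrt{L / T}, \\
  N &= H^3 L \, d_{\text{replay}} \, \epsilon^{-2} = T \cdot n
    &&\Longrightarrow\quad n = \frac{N}{T} = H \, d_{\text{replay}}.
\end{align*}
```

The first line shows that an iteration count scaling as $\epsilon^{-2}$ corresponds to a gap decaying as $T^{-1/2}$; the second shows that the sample bound exceeds the iteration count by a factor of $H \, d_{\text{replay}}$.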

Theorems & Definitions (65)

Theorem 3.1: Main Theorem of AVLPR (Wang et al., 2023); Informal
Theorem 3.2: Main Theorem of Improved AVLPR; Informal
Proof: Proof Sketch of Theorem 3.2
Theorem 4.1: Gap is w.h.p. Pessimistic; Informal
Theorem 4.2: The Linear-Case Algorithm Allows a Potential Function; Informal
Theorem 4.3: Main Theorem of the Overall Algorithm
Theorem A.1: Main Theorem of Improved AVLPR; Restatement of Theorem 3.2
proof
Lemma A.2
proof
...and 55 more
