Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL

Jiawei Huang; Niao He; Andreas Krause

TL;DR

The paper analyzes the sample efficiency of model-based RL in Mean-Field Games (MFGs). It introduces the Partial Model-Based Eluder Dimension (P-MBED), a complexity measure for the single-agent model class obtained from a mean-field model class by fixing the population density. Under realizability and Lipschitz continuity assumptions, it shows that learning a Nash Equilibrium (NE) in MFGs is statistically no harder than solving a logarithmic number of single-agent RL problems, with sample complexity polynomial in P-MBED and a polylogarithmic dependence on the model class size. The authors design a model elimination algorithm that uses a Bridge Policy to exploit local model alignment, and prove that the resulting sample complexity scales with $\dim_{\rm PE}(\mathcal{M}, \varepsilon')$ rather than with MBED, which makes learning tractable in tabular and linear MFGs. They extend the framework to Multi-Type MFGs (MT-MFGs) via a lifted policy-aware model formulation (MT-PAM), which implies statistical tractability for a broader class of Markov Games, and they present a heuristic variant with improved computational efficiency that is validated empirically.

Abstract

We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and potentially, can be exponentially lower than the MBED proposed by \citet{huang2023statistical}. We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial w.r.t.~P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, \emph{learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems}. We further extend our results to Multi-Type MFGs, generalizing from conventional MFGs and involving multiple types of agents. This extension implies statistical tractability of a broader class of Markov Games through the efficacy of mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.
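To make the phrase "single-agent model class converted from the given mean-field model class" concrete, here is a minimal Python sketch of the conversion step: freezing the population density turns a density-dependent transition function into an ordinary single-agent one. The toy dynamics and the names `mean_field_transition` and `fix_density` are hypothetical illustrations, not the paper's construction.

```python
import math

def mean_field_transition(s, a, mu):
    """Hypothetical toy mean-field model: P(s' | s, a, mu) as a list.

    The agent prefers to move to state (s + a) mod n, but crowded
    states (large mu[s']) are made less attractive.
    """
    n = len(mu)
    logits = [0.0] * n
    logits[(s + a) % n] = 2.0                          # preferred target state
    logits = [l - m for l, m in zip(logits, mu)]       # congestion penalty
    z = max(logits)
    weights = [math.exp(l - z) for l in logits]        # stable softmax
    total = sum(weights)
    return [w / total for w in weights]

def fix_density(model, mu):
    """Freeze the density at mu, yielding a single-agent model P_mu(s' | s, a)."""
    def single_agent_transition(s, a):
        return model(s, a, mu)
    return single_agent_transition

# Assumed example density over three states.
mu = [0.5, 0.3, 0.2]
P_mu = fix_density(mean_field_transition, mu)
next_dist = P_mu(0, 1)  # next-state distribution from state 0 under action 1
print(next_dist)
```

Applying `fix_density` to every model in a mean-field class with the same `mu` gives a class of single-agent models; as the abstract describes it, P-MBED measures the complexity of a class of this kind.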

Paper Structure

This paper contains 85 sections, 52 theorems, 193 equations, 2 figures, 1 table, 4 algorithms.

Introduction
Related Work
Background
Mean-Field Games
Multi-Type Mean-Field Games
Partial Model-Based Eluder Dimension
Sample Efficiency of Learning in MFGs
Main Algorithm and Highlight of Main Results
Algorithm Design and Proof Sketch
ModelElim: The Model Elimination Step
Fast Elimination with Bridge Policy
Proof Sketch of Thm. \ref{lem:exists_bridge_policy}
Learning in Multi-Type MFGs
A Heuristic Algorithm with Improved Computational Efficiency
Conclusion
...and 70 more sections

Key Result

Proposition 3.3

(Tabular Setting) For any $\mathcal{M}$ and $\varepsilon > 0$, $\dim_{\rm PE}(\mathcal{M},\varepsilon) \leq |\mathcal{S}||\mathcal{A}|$, while there exists a concrete example of $\mathcal{M}$ such that $\dim_{\rm E}(\mathcal{M},\varepsilon) = \Omega(\exp(|\mathcal{S}|))$.
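As a rough numerical illustration of the gap in Proposition 3.3, the Python sketch below compares the tabular upper bound $|\mathcal{S}||\mathcal{A}|$ on P-MBED with the growth rate $\exp(|\mathcal{S}|)$. The function names are hypothetical, and the constant hidden by $\Omega(\cdot)$ is assumed to be 1 purely for illustration, so the second column shows a growth rate rather than an exact value of $\dim_{\rm E}$.

```python
import math

def tabular_pmbed_upper_bound(n_states, n_actions):
    """Upper bound |S||A| on dim_PE in the tabular setting (Proposition 3.3)."""
    return n_states * n_actions

def exp_growth_reference(n_states):
    """exp(|S|) with the Omega(.) constant assumed to be 1 (illustrative only)."""
    return math.exp(n_states)

# Assumed example sizes; the number of actions is fixed at 4.
for n_states in (5, 10, 20):
    bound = tabular_pmbed_upper_bound(n_states, 4)
    reference = exp_growth_reference(n_states)
    print(f"|S|={n_states:2d}  |S||A|={bound:3d}  exp(|S|)={reference:.3e}")
```

The first column of values grows linearly in the number of states while the second grows exponentially, which is the separation the proposition states.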

Figures (2)

Figure 1: Construction of Lower Bound
Figure 2: Experiment results in linear style MFG. We report the number of remaining models and the normalized maximal NE Gap by the NE policies of remaining models during the model elimination process. Error bars correspond to 95% confidence intervals.

Theorems & Definitions (109)

Definition 2.1
Definition 3.1: $\varepsilon$-Independence \citep{huang2023statistical}
Definition 3.2: Partial $\varepsilon$-Independence
Definition 3.3: Partial MBED
Proposition 3.3
Proposition 3.4: Linear MFGs; Informal version of Prop. \ref{prop:MBED_Linear_MFMDP_formal}
Theorem 4.1
Theorem 4.2
Lemma 4.2
Theorem 4.3
...and 99 more
