Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL
Jiawei Huang, Niao He, Andreas Krause
TL;DR
The paper analyzes the sample efficiency of model-based RL in Mean-Field Games (MFGs) by introducing the Partial Model-Based Eluder Dimension (P-MBED), a complexity measure for the single-agent model class obtained from a mean-field model class by fixing the population density. It shows that, under realizability and Lipschitz continuity, learning a Nash Equilibrium (NE) in an MFG is statistically no harder than solving a logarithmic number of single-agent RL problems, with sample complexity polynomial in P-MBED and polylogarithmic in the model class size. The authors design a ModelElimination algorithm with a Bridge Policy that exploits local model alignment, and prove that the resulting sample complexity scales with dim_PE(M, eps') rather than with MBED, enabling tractable learning in tabular and linear Multi-Type MFGs (MT-MFGs). They extend the framework to MT-MFGs via a lifted policy-aware model formulation (MT-PAM) and provide a heuristic, computationally efficient variant with empirical validation, illustrating applicability to broader MARL settings. Overall, the work clarifies the statistical tractability of NE learning in MFGs and extends it to heterogeneous agent types, offering both theoretical guarantees and a more scalable heuristic algorithm.
Abstract
We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation, a setting that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion for characterizing model class complexity. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and it can potentially be exponentially lower than the MBED proposed by Huang et al. (2023). We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial in P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems. We further extend our results to Multi-Type MFGs, which generalize conventional MFGs by involving multiple types of agents. This extension implies statistical tractability of a broader class of Markov Games through the efficacy of mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.
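
To make the phrase "single-agent model class converted from the given mean-field model class" concrete, the following is a minimal Python sketch, not the paper's algorithm. It fixes a population density to turn each mean-field transition model into a single-agent one, then runs a generic likelihood-based elimination step over a finite model class. The two-state model family, the parameter `theta`, the helper names, and the threshold rule are all hypothetical choices made for illustration; the paper's method additionally relies on a Bridge Policy for exploration, which is not modeled here.

```python
import math

def fix_density(mf_model, mu):
    """Turn a mean-field model P(. | s, a, mu) into a single-agent model P(. | s, a)."""
    return lambda s, a: mf_model(s, a, mu)

def log_likelihood(model, data):
    """Sum of log P(s_next | s, a) over the observed transitions."""
    return sum(math.log(model(s, a)[s_next]) for s, a, s_next in data)

def eliminate(models, data, threshold):
    """Return indices of models whose log-likelihood is within `threshold` of the best."""
    scores = [log_likelihood(m, data) for m in models]
    best = max(scores)
    return [i for i, score in enumerate(scores) if score >= best - threshold]

def make_mf_model(theta):
    """Hypothetical two-state family: the chance of moving to state 1
    grows with the population mass already in state 1."""
    def mf_model(s, a, mu):
        p1 = min(0.95, max(0.05, theta * mu[1]))
        return [1.0 - p1, p1]
    return mf_model

# Fix the population density; each mean-field model becomes a single-agent model.
mu = [0.2, 0.8]
thetas = [1.0, 0.625, 0.25]  # induce P(s_next = 1) of about 0.8, 0.5, 0.2
single_agent_models = [fix_density(make_mf_model(t), mu) for t in thetas]

# Hand-built transitions (state, action, next_state): 8 of 10 land in state 1.
data = [(0, 0, 1)] * 8 + [(0, 0, 0)] * 2

survivors = eliminate(single_agent_models, data, threshold=1.0)
print([thetas[i] for i in survivors])  # prints [1.0]
```

With a looser threshold of 2.0, the model with `theta = 0.625` also survives, which shows how the threshold trades off the risk of discarding the true model against the speed at which the version space shrinks.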
