Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Chenyu Zhang; Xu Chen; Xuan Di

TL;DR

The paper addresses online learning of mean field games (MFGs) by treating the policy and the population as a single unified parameter and proposing SemiSGD, which updates both simultaneously and asynchronously and so avoids the unstable forward-backward procedure of fixed-point iteration (FPI). It extends this framework to population-aware linear function approximation (PA-LFA), enabling online, model-free learning on continuous state-action spaces while preserving computational efficiency. The theoretical contributions are finite-time convergence to the equilibrium for linear MFGs under contractivity, convergence to a neighborhood of the equilibrium without contractivity, and explicit error bounds for non-linear MFGs when using PA-LFA. In six experiments on three MFGs, SemiSGD with PA-LFA outperforms FPI-based methods in stability, efficiency, and accuracy.

Abstract

Mean field games (MFGs) model interactions in large-population multi-agent systems through population distributions. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), where policy updates and induced population distributions are computed separately and sequentially. However, FPI-type methods may suffer from inefficiency and instability due to potential oscillations caused by this forward-backward procedure. In this work, we propose a novel perspective that treats the policy and population as a unified parameter controlling the game dynamics. By applying stochastic parameter approximation to this unified parameter, we develop SemiSGD, a simple stochastic gradient descent (SGD)-type method, where an agent updates its policy and population estimates simultaneously and fully asynchronously. Building on this perspective, we further apply linear function approximation (LFA) to the unified parameter, resulting in the first population-aware LFA (PA-LFA) for learning MFGs on continuous state-action spaces. A comprehensive finite-time convergence analysis is provided for SemiSGD with PA-LFA, including its convergence to the equilibrium for linear MFGs -- a class of MFGs with a linear structure concerning the population -- under the standard contractivity condition, and to a neighborhood of the equilibrium under a more practical condition. We also characterize the approximation error for non-linear MFGs. We validate our theoretical findings with six experiments on three MFGs.
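The simultaneous update described in the abstract can be illustrated with a minimal sketch on a finite state-action space. Everything concrete here is an assumption made for illustration and is not taken from the paper: the ring-shaped toy environment, the congestion-style reward, the softmax policy operator, the shared step size `lr`, and the function names `softmax_policy` and `semisgd`. The sketch only shows the general shape of the idea, in which a single agent takes one SARSA-type semi-gradient step on its value estimate `Q` and one stochastic-approximation step on its population estimate `M` per observed transition.

```python
import numpy as np


def softmax_policy(q_row, temperature=1.0):
    """Hypothetical policy operator: softmax over the action values of one state."""
    z = (q_row - q_row.max()) / temperature
    p = np.exp(z)
    return p / p.sum()


def semisgd(n_states=5, n_steps=20000, gamma=0.9, lr=0.05, seed=0):
    """Toy joint update of (Q, M) along a single online trajectory (illustrative only)."""
    rng = np.random.default_rng(seed)
    moves = np.array([-1, 0, 1])               # assumed actions: left, stay, right
    Q = np.zeros((n_states, len(moves)))       # value estimate
    M = np.full(n_states, 1.0 / n_states)      # population estimate (a distribution)

    s = rng.integers(n_states)
    a = rng.choice(len(moves), p=softmax_policy(Q[s]))
    for _ in range(n_steps):
        # Assumed reward: penalize crowded states and movement cost.
        r = -M[s] - 0.1 * abs(moves[a])
        s_next = (s + moves[a]) % n_states     # deterministic move on a ring
        a_next = rng.choice(len(moves), p=softmax_policy(Q[s_next]))

        # Semi-gradient TD step on Q: the bootstrapped target is treated as fixed.
        td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
        Q[s, a] += lr * td_error

        # Population step: move M toward the indicator of the observed next state.
        onehot = np.zeros(n_states)
        onehot[s_next] = 1.0
        M += lr * (onehot - M)

        s, a = s_next, a_next
    return Q, M


if __name__ == "__main__":
    Q, M = semisgd()
    print("population estimate:", np.round(M, 3))
```

Because both estimates are refreshed from the same transition, there is no inner loop that fully solves for a best response or an induced population before the other quantity is updated, which is the contrast with FPI that the abstract draws. Note that the population step is a convex combination, so `M` remains a probability vector throughout.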

Paper Structure (53 sections, 24 theorems, 204 equations, 14 figures, 5 tables, 2 algorithms)

Introduction
Stochastic semi-gradient descent for MFGs on finite state-action spaces
Revisit online learning for MFGs on finite state-action spaces
Stochastic semi-gradient descent
Linear mean field games
SemiSGD with population-aware linear function approximation
Sample complexity analysis
Approximation error for non-linear MFGs
Numerical experiments
Conclusion
Appendix
Extended literature review
More discussions on motivations
Definitions of MFE and policy operators
Standard definition of MFE using Bellman optimality equation
...and 38 more sections

Key Result

Lemma 1

Suppose $(Q_{*};M_{*})$ is an MFE. Suppose the reward function, transition kernel, and policy operator are Lipschitz continuous with Lipschitz constant $L$. For any value function $Q$ and population measure $M$, with $\Delta Q\coloneqq Q-Q_{*}$ and $\Delta M\coloneqq M-M_{*}$, we have […] Due to the coupling between $Q$ and $M$, neither $-\bar{\mathfrak{g}}_{(Q,M)}(M)$ nor $-\bar{\mathfrak{g}}_{(Q\ldots}$ …

Figures (14)

Figure 1: Learning dynamics.
Figure 2: Convergence performance of SemiSGD and FPI.
Figure 3: On # inner iterations.
Figure 4: LFA & Discretization.
Figure 5: Exploitability of model-based FPI with FP.
...and 9 more figures

Theorems & Definitions (50)

Definition 1: Mean field equilibrium
Lemma 1: Descent direction; informal
Definition 2: Linear mean field games
Proposition 1
Example 1: Linear MDP plus population-independent transition kernel
Example 2: Finite state-action space
Remark 1: Operation complexity
Proposition 2: MFE as a stationary point
Remark 2
Theorem 1: One-step progress
...and 40 more
