Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Chenyu Zhang, Xu Chen, Xuan Di

TL;DR

The paper tackles online learning of mean field games (MFGs) by treating the policy and the population as a single unified parameter and proposing SemiSGD, which updates both simultaneously and asynchronously. This avoids the forward-backward fixed-point iteration (FPI), which can oscillate and become unstable. The framework is extended to population-aware linear function approximation (PA-LFA), enabling online, model-free learning on continuous state-action spaces. The theoretical contributions are finite-time results: convergence to the equilibrium for linear MFGs under the standard contractivity condition, convergence to a neighborhood of the equilibrium under a more practical condition, and a characterization of the approximation error for non-linear MFGs. Empirically, in six experiments on three MFGs, SemiSGD with PA-LFA outperforms FPI-based methods in stability, efficiency, and accuracy.

Abstract

Mean field games (MFGs) model interactions in large-population multi-agent systems through population distributions. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), where policy updates and induced population distributions are computed separately and sequentially. However, FPI-type methods may suffer from inefficiency and instability due to potential oscillations caused by this forward-backward procedure. In this work, we propose a novel perspective that treats the policy and population as a unified parameter controlling the game dynamics. By applying stochastic parameter approximation to this unified parameter, we develop SemiSGD, a simple stochastic gradient descent (SGD)-type method, where an agent updates its policy and population estimates simultaneously and fully asynchronously. Building on this perspective, we further apply linear function approximation (LFA) to the unified parameter, resulting in the first population-aware LFA (PA-LFA) for learning MFGs on continuous state-action spaces. A comprehensive finite-time convergence analysis is provided for SemiSGD with PA-LFA, including its convergence to the equilibrium for linear MFGs -- a class of MFGs with a linear structure concerning the population -- under the standard contractivity condition, and to a neighborhood of the equilibrium under a more practical condition. We also characterize the approximation error for non-linear MFGs. We validate our theoretical findings with six experiments on three MFGs.
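The simultaneous update described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the ring-shaped state space, the crowd-averse reward, the softmax policy, the SARSA-style temporal-difference step, and every name below (`semi_sgd_sketch`, `softmax_policy`, `moves`) are assumptions chosen only to show one sample driving both a value update and a population update in the same step, with no inner fixed-point loop.

```python
import math
import random


def softmax_policy(q_row, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature)."""
    top = max(q_row)
    weights = [math.exp((q - top) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def semi_sgd_sketch(n_states=5, n_steps=20000, alpha=0.01, gamma=0.9, seed=0):
    """Toy simultaneous update of a Q-table and a population estimate M."""
    rng = random.Random(seed)
    moves = [-1, 0, 1]  # hypothetical actions: step left, stay, step right
    q = [[0.0] * len(moves) for _ in range(n_states)]
    m = [1.0 / n_states] * n_states  # population estimate, starts uniform

    state = 0
    action = rng.choices(range(len(moves)), softmax_policy(q[state]))[0]
    for _ in range(n_steps):
        # Hypothetical crowd-averse reward: being in a crowded state is penalized.
        reward = -m[state]
        # Hypothetical deterministic dynamics on a ring of states.
        next_state = (state + moves[action]) % n_states
        next_action = rng.choices(
            range(len(moves)), softmax_policy(q[next_state])
        )[0]

        # Semi-gradient TD (SARSA-style) step on the value estimate.
        td_error = reward + gamma * q[next_state][next_action] - q[state][action]
        q[state][action] += alpha * td_error

        # Same-sample step on the population estimate: move M toward the
        # indicator of the observed next state.
        for s in range(n_states):
            target = 1.0 if s == next_state else 0.0
            m[s] += alpha * (target - m[s])

        state, action = next_state, next_action
    return q, m


if __name__ == "__main__":
    q_table, population = semi_sgd_sketch()
    print([round(p, 3) for p in population])
```

Because each population step is a convex combination of the current estimate and an indicator vector, `m` remains a probability distribution throughout, so no separate projection or normalization is needed in this toy setting.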

Paper Structure (53 sections, 24 theorems, 204 equations, 14 figures, 5 tables, 2 algorithms)

Key Result

Lemma 1

Suppose $(Q_{*};M_{*})$ is an MFE. Suppose the reward function, transition kernel, and policy operator are Lipschitz continuous with Lipschitz constant $L$. For any value function $Q$ and population measure $M$, with $\Delta Q\coloneqq Q-Q_{*}$ and $\Delta M\coloneqq M-M_{*}$, we have … Due to the coupling between $Q$ and $M$, neither $-\bar{\mathfrak{g}}_{{(Q,M)}}(M)$ nor $-\bar{\mathfrak{g}}_{{(Q …

Figures (14)

  • Figure 1: Learning dynamics.
  • Figure 2: Convergence performance of SemiSGD and FPI.
  • Figure 3: On # inner iterations.
  • Figure 4: LFA & Discretization.
  • Figure 5: Exploitability of model-based FPI with FP.
  • ...and 9 more figures

Theorems & Definitions (50)

  • Definition 1: Mean field equilibrium
  • Lemma 1: Descent direction; informal
  • Definition 2: Linear mean field games
  • Proposition 1
  • Example 1: Linear MDP plus population-independent transition kernel
  • Example 2: Finite state-action space
  • Remark 1: Operation complexity
  • Proposition 2: MFE as a stationary point
  • Remark 2
  • Theorem 1: One-step progress
  • ...and 40 more