Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation
Chenyu Zhang, Xu Chen, Xuan Di
TL;DR
The paper addresses online learning of mean field games (MFGs) by treating the policy and population as a single unified parameter and proposing SemiSGD, which updates both simultaneously and asynchronously, avoiding the potentially unstable forward-backward procedure of fixed-point iteration (FPI). It extends this framework to population-aware linear function approximation (PA-LFA), enabling online, model-free learning in continuous state-action spaces while preserving computational efficiency. Theoretical contributions include finite-time convergence to the equilibrium for linear MFGs under the standard contractivity condition, convergence to a neighborhood of the equilibrium under a more practical condition, and a characterization of the approximation error for non-linear MFGs when using PA-LFA. Empirically, across six experiments on three MFGs, SemiSGD with PA-LFA outperforms FPI-based methods in stability, efficiency, and accuracy.
Abstract
Mean field games (MFGs) model interactions in large-population multi-agent systems through population distributions. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), where policy updates and induced population distributions are computed separately and sequentially. However, FPI-type methods may suffer from inefficiency and instability due to potential oscillations caused by this forward-backward procedure. In this work, we propose a novel perspective that treats the policy and population as a unified parameter controlling the game dynamics. By applying stochastic parameter approximation to this unified parameter, we develop SemiSGD, a simple stochastic gradient descent (SGD)-type method, where an agent updates its policy and population estimates simultaneously and fully asynchronously. Building on this perspective, we further apply linear function approximation (LFA) to the unified parameter, resulting in the first population-aware LFA (PA-LFA) for learning MFGs on continuous state-action spaces. A comprehensive finite-time convergence analysis is provided for SemiSGD with PA-LFA, including its convergence to the equilibrium for linear MFGs -- a class of MFGs with a linear structure concerning the population -- under the standard contractivity condition, and to a neighborhood of the equilibrium under a more practical condition. We also characterize the approximation error for non-linear MFGs. We validate our theoretical findings with six experiments on three MFGs.
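To make the unified-parameter idea concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes a SARSA-style semi-gradient step for the value table `Q` and a simple stochastic-approximation step for the population estimate `M`, both applied on every observed transition. The ring dynamics, the crowd-aversion reward, the softmax policy, the step sizes, and all function names are hypothetical choices made only for illustration, and no function approximation is used.

```python
import numpy as np

def semisgd_step(Q, M, s, a, r, s_next, a_next, alpha=0.1, beta=0.1, gamma=0.9):
    """One joint update of the value table Q and the population estimate M (illustrative)."""
    # Semi-gradient TD step: the bootstrap target is treated as a constant.
    td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * td_error
    # Move the population estimate toward the observed next state.
    onehot = np.zeros_like(M)
    onehot[s_next] = 1.0
    M += beta * (onehot - M)
    return Q, M

def softmax_policy(Q, s, rng, temperature=1.0):
    """Sample an action from a softmax over Q[s] (an assumed policy class)."""
    logits = Q[s] / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

def run(n_states=5, n_actions=3, n_steps=2000, seed=0):
    """Run the joint updates along a single trajectory of a toy game."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    M = np.full(n_states, 1.0 / n_states)  # start from a uniform population
    s = 0
    a = softmax_policy(Q, s, rng)
    for _ in range(n_steps):
        # Hypothetical ring dynamics: action 0 moves left, 1 stays, 2 moves right.
        s_next = (s + a - 1) % n_states
        # Hypothetical crowd-aversion reward: crowded states are penalized.
        r = -M[s_next]
        a_next = softmax_policy(Q, s_next, rng)
        # Policy (through Q) and population (through M) are updated together,
        # with no inner loop that solves for either one separately.
        Q, M = semisgd_step(Q, M, s, a, r, s_next, a_next)
        s, a = s_next, a_next
    return Q, M

if __name__ == "__main__":
    Q, M = run()
    print("population estimate:", np.round(M, 3))
```

The contrast with an FPI-style loop is that the sketch never fixes the population to compute a best response, nor fixes the policy to compute the induced population. Each transition updates both estimates once.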
