Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis
Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh
TL;DR
This paper addresses discrete-time stationary mean-field games with unknown dynamics by introducing the fully herding MFG class, which can have multiple equilibria and is not restricted by contraction or strict monotonicity. It proposes ASAC-MFG, a direct policy-optimization algorithm that operates in a single loop with a single trajectory of Markovian samples and uses a three-time-scale stochastic approximation to guarantee finite-time convergence to a mean-field equilibrium, achieving a rate of $\widetilde{O}(k^{-1/4})$ for fully herding MFGs and a $\widetilde{O}(\sqrt{\kappa})$-approximation for general herding MFGs. The analysis introduces new Lyapunov-based techniques and multi-time-scale convergence results, and shows that a degenerate MFG reduces to an online actor-critic method for average-reward MDPs with rates matching the state-of-the-art without contraction assumptions. Numerical experiments on synthetic MFGs and a beach-bar problem demonstrate fast convergence, stability, and reduced variance of ASAC-MFG relative to prior approaches, even when the problem lies outside the fully herding subclass. Overall, the work broadens the set of solvable MFGs, provides a practical, implementable algorithm, and delivers finite-time guarantees that advance the understanding of learning in mean-field games with unknown dynamics.
Abstract
We consider discrete-time stationary mean field games (MFG) with unknown dynamics and design algorithms for finding the equilibrium with finite-time complexity guarantees. Prior solutions to the problem assume either the contraction of a mean field optimality-consistency operator or strict weak monotonicity, which may be overly restrictive. In this work, we introduce a new class of solvable MFGs, named the "fully herding class", which expands the known solvable class of MFGs and for the first time includes problems with multiple equilibria. We propose a direct policy optimization method, Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), that provably finds a global equilibrium for MFGs within this class, given access to a single trajectory of Markovian samples. Unlike prior methods, ASAC-MFG is single-loop and single-sample-path. We establish the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium via new techniques that we develop for multi-time-scale stochastic approximation. We support the theoretical results with illustrative numerical simulations. When the mean field does not affect the transition and reward, an MFG reduces to a Markov decision process (MDP) and ASAC-MFG becomes an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a sample complexity matching the state-of-the-art. Previous works derive this complexity by assuming a contraction of the Bellman operator, which does not hold for average-reward MDPs. We match the rate while removing this untenable assumption through an improved Lyapunov function.
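To make the "single-loop, single-sample-path, multi-time-scale" structure concrete, the following is a minimal illustrative sketch and not the ASAC-MFG algorithm itself. Everything in it is an assumption made for illustration: the toy transition kernel and reward, the tabular softmax policy, the function name `single_loop_actor_critic_mfg`, the step-size exponents, and the ordering of the time scales (critic fastest, actor intermediate, mean field slowest). It only shows how a critic, an actor, and a mean field estimate can all be updated once per sample along one Markovian trajectory, with no inner loop.

```python
import numpy as np


def softmax(x):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def single_loop_actor_critic_mfg(num_states=3, num_actions=2,
                                 num_iters=5000, seed=0):
    rng = np.random.default_rng(seed)

    # Hypothetical toy model (not from the paper): a random transition kernel
    # that ignores the mean field, and a reward that penalises crowded states.
    P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    base_reward = rng.uniform(size=(num_states, num_actions))

    theta = np.zeros((num_states, num_actions))   # actor (policy logits)
    V = np.zeros(num_states)                      # critic (differential values)
    J = 0.0                                       # average-reward estimate
    mu = np.full(num_states, 1.0 / num_states)    # mean field estimate
    s = 0                                         # current state on the single trajectory

    for k in range(num_iters):
        # Assumed step sizes on three time scales: beta >= alpha >= xi.
        beta = (k + 2) ** -0.5    # critic (fastest)
        alpha = (k + 2) ** -0.7   # actor
        xi = (k + 2) ** -0.9      # mean field (slowest)

        pi = softmax(theta)
        a = rng.choice(num_actions, p=pi[s])
        r = base_reward[s, a] - mu[s]
        s_next = rng.choice(num_states, p=P[s, a])

        # Average-reward temporal-difference error.
        delta = r - J + V[s_next] - V[s]

        # Critic update.
        V[s] += beta * delta
        J += beta * (r - J)

        # Actor update: score function of the softmax policy at (s, a).
        score = -pi[s].copy()
        score[a] += 1.0
        theta[s] += alpha * delta * score

        # Mean field update: running average of visited states.
        one_hot = np.zeros(num_states)
        one_hot[s_next] = 1.0
        mu += xi * (one_hot - mu)

        s = s_next  # continue along the same trajectory, no reset

    return softmax(theta), mu
```

Calling `single_loop_actor_critic_mfg()` returns the final policy (one probability row per state) and the mean field estimate (a probability vector over states). Because each mean field update is a convex combination of the previous estimate and a one-hot vector, the estimate stays on the probability simplex throughout.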
