Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh

TL;DR

This paper addresses discrete-time stationary mean-field games with unknown dynamics by introducing the fully herding MFG class, which can have multiple equilibria and is not restricted by contraction or strict monotonicity. It proposes ASAC-MFG, a direct policy-optimization algorithm that operates in a single loop with a single trajectory of Markovian samples and uses a three-time-scale stochastic approximation to guarantee finite-time convergence to a mean-field equilibrium, achieving a rate of $\widetilde{O}(k^{-1/4})$ for fully herding MFGs and a $\widetilde{O}(\sqrt{\kappa})$-approximation for general herding MFGs. The analysis introduces new Lyapunov-based techniques and multi-time-scale convergence results, and shows that a degenerate MFG reduces to an online actor-critic method for average-reward MDPs with rates matching the state-of-the-art without contraction assumptions. Numerical experiments on synthetic MFGs and a beach-bar problem demonstrate fast convergence, stability, and reduced variance of ASAC-MFG relative to prior approaches, even when the problem lies outside the fully herding subclass. Overall, the work broadens the set of solvable MFGs, provides a practical, implementable algorithm, and delivers finite-time guarantees that advance the understanding of learning in mean-field games with unknown dynamics.
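To read the stated rate as an iteration budget, here is a short worked calculation. It is a back-of-the-envelope sketch that ignores the constants and logarithmic factors hidden by $\widetilde{O}$, so the numbers indicate scaling only and are not figures reported in the paper.

```latex
% Assumption: the error after k iterations is bounded by k^{-1/4},
% with constants and log factors dropped.
\[
  k^{-1/4} \le \epsilon
  \quad\Longleftrightarrow\quad
  k \ge \epsilon^{-4},
  \qquad\text{e.g.}\quad
  \epsilon = 0.1 \;\Rightarrow\; k \ge 10^{4}.
\]
```

Halving the target accuracy $\epsilon$ therefore multiplies the required number of iterations, and hence of samples in a single-trajectory method, by roughly $2^4 = 16$.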

Abstract

We consider discrete-time stationary mean field games (MFG) with unknown dynamics and design algorithms for finding the equilibrium with finite-time complexity guarantees. Prior solutions to the problem assume either the contraction of a mean field optimality-consistency operator or strict weak monotonicity, which may be overly restrictive. In this work, we introduce a new class of solvable MFGs, named the "fully herding class", which expands the known solvable class of MFGs and for the first time includes problems with multiple equilibria. We propose a direct policy optimization method, Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), that provably finds a global equilibrium for MFGs within this class, under suitable access to a single trajectory of Markovian samples. Unlike prior methods, ASAC-MFG is single-loop and single-sample-path. We establish the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium via new techniques that we develop for multi-time-scale stochastic approximation. We support the theoretical results with illustrative numerical simulations. When the mean field does not affect the transition and reward, an MFG reduces to a Markov decision process (MDP) and ASAC-MFG becomes an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a sample complexity matching the state-of-the-art. Previous works derive the complexity assuming a contraction on the Bellman operator, which is invalid for average-reward MDPs. We match the rate while removing the untenable assumption through an improved Lyapunov function.
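To make "single-loop and single-sample-path" concrete, the following is a minimal sketch of a generic tabular actor-critic that updates a critic, an actor, and a mean field estimate once per sample along one trajectory. It is not the authors' ASAC-MFG: the two-state environment, the `reward` and `step` functions, the step-size constants and exponents, and the ordering of the time scales are all hypothetical choices made for illustration.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2

def reward(s, a, mu):
    # Hypothetical herding-style reward: crowded states pay more,
    # and action 1 carries a small cost.
    return mu[s] - 0.1 * a

def step(s, a, rng):
    # Hypothetical dynamics: action a targets state a, succeeding w.p. 0.9.
    return a if rng.random() < 0.9 else 1 - a

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def run(num_iters=5000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    value = [0.0] * N_STATES                              # differential value estimate
    avg_reward = 0.0                                      # average-reward estimate
    mu = [1.0 / N_STATES] * N_STATES                      # mean field estimate
    s = 0
    for k in range(num_iters):
        # Three decaying step sizes (placeholder exponents, not from the paper).
        beta = 0.5 / (k + 1) ** 0.5    # critic
        alpha = 0.5 / (k + 1) ** 0.75  # actor
        xi = 1.0 / (k + 1)             # mean field

        pi = softmax(theta[s])
        a = rng.choices(range(N_ACTIONS), weights=pi)[0]
        r = reward(s, a, mu)
        s_next = step(s, a, rng)

        # Average-reward temporal-difference error.
        td = r - avg_reward + value[s_next] - value[s]
        avg_reward += beta * (r - avg_reward)
        value[s] += beta * td

        # Softmax policy-gradient step using the TD error as the advantage.
        for b in range(N_ACTIONS):
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += alpha * td * grad

        # Convex-combination update keeps mu on the probability simplex.
        mu = [(1 - xi) * m + xi * (1.0 if j == s_next else 0.0)
              for j, m in enumerate(mu)]
        s = s_next
    policy = [softmax(row) for row in theta]
    return policy, mu, avg_reward
```

Calling `run()` returns the final policy, mean field estimate, and average-reward estimate. Every update uses only the latest transition, so there is no inner loop that solves for a best response or a stationary distribution before the next outer step.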

Paper Structure

This paper contains 45 sections, 20 theorems, 185 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Consider the iterates generated by Algorithm alg:main on a $\kappa$-herding MFG, with the step sizes satisfying [...], where the constants $\lambda_0,\alpha_0,\beta_0,\xi_0$ are specified later in Appendix sec:proof_thm:proof. Under Assumptions assump:ergodic-assump:nu, we have for all $k\geq\tau_k$ [...], where $\tau_k$ denotes the mixing time, which is an affine function of $\log(k+1)$ defined in Appendix se...

Figures (5)

  • Figure 1: $\mathcal{M}$ denotes the class of all MFGs. $\mathcal{M}_{\text{cont.}}$ and $\mathcal{M}_{\text{mono.}}$ are the MFG classes satisfying contraction and strict weak monotonicity, and they are subsets of $\mathcal{M}_{\text{unique}}$, which is the class of MFGs having a unique equilibrium. The proposed algorithm, ASAC-MFG, solves MFGs in $\mathcal{M}_{\text{fully herding}}$ [cf. Definition 2].
  • Figure 2: Illustration of the two-state example
  • Figure 3: Algorithm performance in synthetic mean field games. First row shows sub-optimality gap of policy under latest mean field estimate. Second row shows convergence of mean field estimate to mean field induced by latest policy iterate. First column: Environment 1. Second column: Environment 2. Third column: Environment 3.
  • Figure 4: Algorithm performance in the beach bar problem.
  • Figure 5: Example Mean Field Game Transition
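The second-row metric in Figure 3, the gap between a mean field estimate and the mean field induced by the latest policy, can be sketched for a small tabular problem as follows. This is an illustrative sketch only: the `transition` callback signature, the `toy_transition` example, the use of power iteration, and the choice of the L1 distance are assumptions rather than details taken from the paper, and power iteration presumes the induced chain is ergodic.

```python
def induced_mean_field(policy, transition, mu, iters=1000):
    """Stationary state distribution of the chain induced by `policy`
    when the population distribution is held fixed at `mu`."""
    n = len(policy)
    # State-to-state kernel: K[s][s2] = sum_a pi(a|s) * P(s2 | s, a, mu).
    kernel = [
        [sum(policy[s][a] * transition(s, a, mu)[s2]
             for a in range(len(policy[s])))
         for s2 in range(n)]
        for s in range(n)
    ]
    dist = [1.0 / n] * n
    for _ in range(iters):  # power iteration (assumes ergodicity)
        dist = [sum(dist[s] * kernel[s][s2] for s in range(n))
                for s2 in range(n)]
    return dist

def consistency_gap(policy, transition, mu):
    """L1 distance between `mu` and the mean field induced by `policy`."""
    induced = induced_mean_field(policy, transition, mu)
    return sum(abs(x - y) for x, y in zip(induced, mu))

def toy_transition(s, a, mu):
    # Hypothetical two-state dynamics: action a targets state a w.p. 0.9.
    return [0.9, 0.1] if a == 0 else [0.1, 0.9]
```

A gap of zero means the estimate is consistent with the policy. Consistency alone does not make the pair an equilibrium, since the policy must also be optimal under that mean field, which is what the first row of Figure 3 tracks.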

Theorems & Definitions (26)

  • Definition 1: $\epsilon$-MFE
  • Definition 2: Herding MFG
  • Example 1
  • Example 2
  • Remark 1
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Definition 3
  • Lemma 1
  • ...and 16 more