Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning

Batuhan Yardim; Niao He

TL;DR

The paper tackles learning in large-scale MARL under approximate symmetry when a known MFG model is unavailable. It introduces an induced mean-field game constructed from any finite $N$-player dynamic game (DG) via Kirszbraun Lipschitz extensions and defines $\alpha,\beta$-symmetric DGs to quantify heterogeneity. It proves that a Nash equilibrium of the induced MFG is an approximate equilibrium of the original game with explicit bounds, establishes finite-sample TD-learning guarantees, and, for games satisfying monotonicity, gives a PMD-based method that reaches an $\varepsilon$-NE (up to symmetrization bias) with a sample complexity of $\widetilde{\mathcal{O}}(\varepsilon^{-6})$ trajectories. Empirical validation on benchmarks with thousands of agents demonstrates scalability and efficiency gains from symmetrized neural policies and end-to-end learning without explicit MFG models.

Abstract

Mean-field games (MFG) have become significant tools for solving large-scale multi-agent reinforcement learning problems under symmetry. However, the assumption of exact symmetry limits the applicability of MFGs, as real-world scenarios often feature inherent heterogeneity. Furthermore, most works on MFG assume access to a known MFG model, which might not be readily available for real-world finite-agent games. In this work, we broaden the applicability of MFGs by providing a methodology to extend any finite-player, possibly asymmetric, game to an "induced MFG". First, we prove that $N$-player dynamic games can be symmetrized and smoothly extended to the infinite-player continuum via explicit Kirszbraun extensions. Next, we propose the notion of $α,β$-symmetric games, a new class of dynamic population games that incorporate approximate permutation invariance. For $α,β$-symmetric games, we establish explicit approximation bounds, demonstrating that a Nash policy of the induced MFG is an approximate Nash of the $N$-player dynamic game. We show that TD learning converges up to a small bias using trajectories of the $N$-player game with finite-sample guarantees, permitting symmetrized learning without building an explicit MFG model. Finally, for certain games satisfying monotonicity, we prove a sample complexity of $\widetilde{\mathcal{O}}(\varepsilon^{-6})$ for the $N$-agent game to learn an $\varepsilon$-Nash up to symmetrization bias. Our theory is supported by evaluations on MARL benchmarks with thousands of agents.
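As a rough illustration of the symmetrization idea in the abstract, the sketch below averages a function of the joint state over all orderings of the agents, so that the result depends only on the population's empirical distribution. It assumes symmetrization means a plain average over permutations; the paper's own definition may differ in detail. The names `empirical_distribution`, `symmetrize`, and `toy_reward` are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def empirical_distribution(states, num_states):
    """Fraction of agents occupying each state 0..num_states-1."""
    counts = Counter(states)
    n = len(states)
    return [counts.get(s, 0) / n for s in range(num_states)]


def symmetrize(f):
    """Return a permutation-invariant version of f by averaging over all orderings.

    Assumption: symmetrization is a uniform average over permutations.
    The cost is factorial in the number of agents, so this is for tiny examples only.
    """
    def f_sym(states):
        perms = list(permutations(states))
        return sum(f(p) for p in perms) / len(perms)
    return f_sym


def toy_reward(states):
    """Hypothetical asymmetric reward: agent i's state is weighted by (i + 1)."""
    return sum((i + 1) * s for i, s in enumerate(states))


if __name__ == "__main__":
    f_sym = symmetrize(toy_reward)
    # The raw reward changes when agents are relabelled; the symmetrized one does not.
    print(toy_reward((0, 1, 2)), toy_reward((2, 0, 1)))
    print(f_sym((0, 1, 2)), f_sym((2, 0, 1)))
    print(empirical_distribution([0, 1, 1, 2], 3))
```

Because the symmetrized function is invariant to relabelling, it can be viewed as a function of the empirical distribution alone, which is the object a mean-field model works with.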

Paper Structure

This paper contains 27 sections, 18 theorems, 162 equations, 2 figures, 2 tables, 2 algorithms.

Introduction
Related Work
Our Contributions
Main Results
Finite-Horizon Dynamic Games
Symmetrization and Lipschitz Extension
Mean-field Games and $\alpha,\beta$-Symmetric Games
Approximation of NE under Approximate Symmetry
Policy Evaluation with $\alpha,\beta$-Symmetry
Learning NE under $\alpha,\beta$-Symmetry
Experimental Results
Discussion and Conclusion
Preliminaries
Extended Proofs on Approximation
Proof of Lemma \ref{lemma:extension_bound}
...and 12 more sections

Key Result

Lemma 1

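Let $d_1, d_2 \in \mathbb{N}_{> 0}$, and $U\subset\mathbb{R}^{d_1}$. Let $f: U \rightarrow \mathbb{R}^{d_2}$ be an $L$-Lipschitz function with respect to the Euclidean norm $\|\cdot\|_2$. Then, there exists $\operatorname{Ext}\left(f\right): \mathbb{R}^{d_1} \rightarrow \mathbb{R}^{d_2}$ such that $\operatorname{Ext}\left(f\right)(x) = f(x)$ for all $x \in U$ and $\operatorname{Ext}\left(f\right)$ is also $L$-Lipschitz with respect to $\|\cdot\|_2$.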
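The Kirszbraun extension of a vector-valued map has no simple closed form in general, so the sketch below swaps in a different and simpler technique: the McShane-Whitney extension, which applies to scalar-valued functions ($d_2 = 1$) and is given by $x \mapsto \min_{u \in U} f(u) + L\|x - u\|_2$. It assumes $U$ is a finite set of sample points and that the supplied values are $L$-Lipschitz on $U$. The function name `mcshane_extension` is hypothetical.

```python
import math


def mcshane_extension(points, values, lipschitz_const):
    """Extend a scalar L-Lipschitz function from finitely many points to all of R^d.

    points: list of coordinate tuples (the finite set U)
    values: list of floats, values[i] = f(points[i])
    lipschitz_const: a constant L for which f is L-Lipschitz on U (assumed, not checked)

    This is the McShane-Whitney formula, not the Kirszbraun construction.
    """
    def ext(x):
        # Each term is an L-Lipschitz cone anchored at a known point;
        # the pointwise minimum of L-Lipschitz functions is L-Lipschitz.
        return min(v + lipschitz_const * math.dist(x, u)
                   for u, v in zip(points, values))
    return ext


if __name__ == "__main__":
    ext = mcshane_extension([(0.0, 0.0), (1.0, 0.0)], [0.0, 1.0], 1.0)
    print(ext((0.0, 0.0)), ext((1.0, 0.0)))  # agrees with f on U
    print(ext((0.5, 0.0)), ext((3.0, 4.0)))  # values off U
```

Applying this formula componentwise to a vector-valued map still yields an extension, but the Euclidean Lipschitz constant can grow by a factor of up to $\sqrt{d_2}$. Kirszbraun's theorem avoids that inflation, which is one reason to prefer it when the constant enters an approximation bound.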

Figures (2)

Figure 1: (a) The mean rewards throughout training of symmetric policies (Sym-NN), policies with one-hot encoding for $i$ (OH-NN), policies with numerical encoding for $i$ (Ind-NN) and independent policies (Ind-NN) in A-Taxi. (b, c) The exploitability throughout multiple epochs of Symm-PMD (Algorithm \ref{alg:pmd}) and IPMD, for A-RPS with $\beta=0.1$ in (b) and A-SIS with $\alpha=\beta=0.1$ in (c).
Figure 2: (a) The sensitivity of the MFG-NE to heterogeneity parameters $\alpha,\beta$ in the A-SIS environment, in terms of exploitability. (b) Percentage of vehicles in Zone 1 in the A-Taxi environment throughout training epochs for 4 benchmark algorithms.

Theorems & Definitions (44)

Definition 1: N-player FH-DG
Definition 2: FH-DG Nash equilibrium
Definition 3: Symmetric function, symmetrization
Lemma 1: Kirszbraun-Valentine (Kirszbraun, 1934; Valentine, 1945)
Definition 4: Finite-horizon mean-field game
Definition 5: Induced population, MFG-NE
Definition 6: Induced FH-MFG
Remark 1
Definition 7: $\alpha,\beta$-Symmetric DG
Definition 8: $\kappa$-sparse dynamics/rewards
...and 34 more
