A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Zun Li; Michael P. Wellman

TL;DR

This paper addresses the challenge of evaluating deep multiagent reinforcement learning (MARL) in general-sum environments, where stochastic training and inter-agent dynamics confound performance comparisons. It introduces a meta-game evaluation framework that treats multiagent training algorithms (MATAs) as meta-strategies, builds an empirical meta-game over the policies they produce under different random seeds, and uses bootstrapped statistics and max-entropy Nash analysis to compare methods. A novel Gumbel IS-MCTS meta-strategy operator is proposed to study run-time search as a general policy improver. Extensive experiments on Deal-or-No-Deal negotiation tasks show that search-based MATAs often outperform purely learned policies, and that self-play and population-based methods exhibit distinct strategic patterns. The framework supports distributional assessments of MARL methods that account for seed variance and opponent dependence, and can guide the design of training and evaluation protocols.

Abstract

Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL that frames each MARL algorithm as a meta-strategy and repeatedly samples normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
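The sampling-and-bootstrapping procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the payoff tensor layout payoffs[a, s, b, t], the function names, the seed-resampling scheme, and the example statistic (average payoff against a uniform mix of opponent meta-strategies) are all hypothetical choices made for the example.

```python
import numpy as np

def sample_empirical_game(payoffs, rng):
    """Resample seeds and return one K x K empirical meta-game.

    Assumption: payoffs[a, s, b, t] holds the mean payoff to the policy
    trained by algorithm a with seed s when paired with the policy
    trained by algorithm b with seed t.
    """
    n_algs, n_seeds = payoffs.shape[0], payoffs.shape[1]
    # Hypothetical scheme: draw seeds with replacement for each algorithm.
    seeds = rng.integers(0, n_seeds, size=(n_algs, n_seeds))
    game = np.empty((n_algs, n_algs))
    for a in range(n_algs):
        for b in range(n_algs):
            # Payoffs for every resampled (seed of a, seed of b) pair.
            # When a == b this mixes same-seed (self-play) and
            # different-seed (cross-play) pairings.
            block = payoffs[a][seeds[a]][:, b][:, seeds[b]]
            game[a, b] = block.mean()
    return game

def uniform_score(game):
    """Example statistic: payoff of each meta-strategy vs. a uniform opponent mix."""
    return game.mean(axis=1)

def bootstrap_statistic(payoffs, statistic, n_boot=1000, seed=0):
    """Evaluate `statistic` on n_boot resampled empirical games."""
    rng = np.random.default_rng(seed)
    return np.stack(
        [statistic(sample_empirical_game(payoffs, rng)) for _ in range(n_boot)]
    )

# Usage with synthetic payoffs: 3 algorithms, 5 seeds each.
rng = np.random.default_rng(1)
payoffs = rng.normal(size=(3, 5, 3, 5))
samples = bootstrap_statistic(payoffs, uniform_score, n_boot=200)
low, high = np.percentile(samples, [2.5, 97.5], axis=0)  # percentile intervals
```

Any other game-analysis statistic (for example, social welfare or regret with respect to an equilibrium) could be passed in place of uniform_score, which is what makes the result a sampling distribution over statistics rather than a single point estimate.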

Paper Structure (46 sections, 1 theorem, 9 equations, 6 figures, 8 tables, 4 algorithms)

Introduction
Related Work
Game Theory Preliminaries
Multiagent Training Algorithms
Meta-Game Evaluation Framework
Empirical Game-Theoretic Analysis
Meta-Game Evaluation Procedure
Max-Entropy Nash Equilibrium
Search as a Meta-Strategy Operator
Evaluation Study
Domain: Alternating Negotiation
Benchmark Algorithms
Independent/Multiagent PPO (IDPPO/MAPPO)
Regularized Nash Dynamics (R-NaD) and NFSP
Policy Space Response Oracles (PSRO) and FCP
...and 31 more sections

Key Result

Theorem 1

Given $\epsilon>0$, an $\epsilon$-maximum-entropy symmetric Nash equilibrium can be computed by a mixed-integer linear program that extends the paper's base program (label prog:mip) with an additional $O(\lvert \hat{\mathit{\Pi}}\rvert^2 / \epsilon)$ linear constraints.
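To make the two quantities in the theorem concrete, the sketch below computes the regret and the Shannon entropy of a symmetric mixed strategy. It does not implement the mixed-integer program from the theorem. It assumes a two-player symmetric game given by a payoff matrix A, where A[i, j] is the payoff for playing strategy i against strategy j; the function names and the example matrix are hypothetical.

```python
import numpy as np

def symmetric_regret(A, sigma):
    """Gain from the best pure deviation when both players use sigma.

    A symmetric profile is a Nash equilibrium exactly when this is zero.
    """
    deviation_payoffs = A @ sigma           # payoff of each pure strategy vs. sigma
    return float(deviation_payoffs.max() - sigma @ deviation_payoffs)

def entropy(sigma):
    """Shannon entropy (in nats), ignoring zero-probability strategies."""
    p = sigma[sigma > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical game: strategies 0 and 1 are payoff-equivalent, 2 is dominated.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

pure = np.array([1.0, 0.0, 0.0])
mixed = np.array([0.5, 0.5, 0.0])

# Both profiles have zero regret, so both are symmetric equilibria;
# the max-entropy criterion prefers the one that spreads probability.
candidates = [s for s in (pure, mixed) if symmetric_regret(A, s) < 1e-9]
selected = max(candidates, key=entropy)
```

In this toy game the equilibrium set is not a single point, so a selection rule is needed; choosing the highest-entropy equilibrium gives a unique, least-committal answer among the candidates.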

Figures (6)

Figure 1: Example start of sequential bargaining game instance.
Figure 2: Empirical best-response graphs for Barg(10, 0, 1) and Barg(30, 0.125, 0.935).
Figure 3: SumRegret of Barg(10, 0, 1)
Figure 4: SumRegret of Barg(30, 0.125, 0.935)
Figure 5: Empirical Distribution of NE-Regret of Barg(10, 0, 1)
...and 1 more figures

Theorems & Definitions (2)

Theorem 1
proof
