
A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Zun Li, Michael P. Wellman

TL;DR

This paper addresses the challenge of evaluating deep multiagent reinforcement learning (MARL) in general-sum environments, where stochastic training and inter-agent dynamics confound performance comparisons. It introduces a meta-game evaluation framework that treats multiagent training algorithms (MATAs) as meta-strategies, builds an empirical meta-game over the policies produced under different random seeds, and uses bootstrapped statistics and maximum-entropy Nash analysis to compare methods. A novel Gumbel IS-MCTS (information-set Monte Carlo tree search) meta-strategy operator is proposed to study run-time search as a general policy improver. Extensive experiments on Deal-or-No-Deal negotiation tasks show that search-based MATAs often outperform purely learned policies and that self-play and population-based methods exhibit distinct strategic patterns. The framework enables robust, distributional assessments of MARL methods and can guide the design of more effective training and evaluation protocols, with practical implications for real-world multiagent systems.
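The TL;DR mentions a Gumbel IS-MCTS meta-strategy operator without spelling out its mechanics. Gumbel AlphaZero/MuZero-style search picks candidate root actions with the Gumbel-Top-k trick, and the sketch below illustrates only that trick under the assumption that the paper's operator builds on it; it is not the paper's operator, and the function name `gumbel_top_k` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gumbel_top_k(logits: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Sample k distinct action indices without replacement via the Gumbel-Top-k trick.

    Adding i.i.d. Gumbel(0, 1) noise to each logit and keeping the k largest
    perturbed values is equivalent to sampling k actions without replacement
    from softmax(logits). (Hypothetical helper, for illustration only.)
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be in [1, len(logits)]")
    perturbed = []
    for action, logit in enumerate(logits):
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # a Gumbel(0, 1) sample
        perturbed.append((logit + gumbel_noise, action))
    # Keep the k actions with the largest perturbed logits.
    perturbed.sort(reverse=True)
    return [action for _, action in perturbed[:k]]
```

Calling `gumbel_top_k(policy_logits, k, random.Random(seed))` on a policy network's logits would yield k distinct candidate actions for the search to evaluate, biased toward actions the learned policy already favors.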

Abstract

Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and by the sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL that frames each MARL algorithm as a meta-strategy and repeatedly samples normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
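The bootstrapping step can be sketched as follows, assuming a symmetric two-player setting in which `U[i][si][j][sj]` (a hypothetical data layout) holds the estimated payoff to a policy trained by algorithm `i` with seed `si` against a policy trained by algorithm `j` with seed `sj`. Each replicate resamples every algorithm's seeds with replacement, averages payoffs over the resampled seed pairs (self-play pairs included when `i == j`), and evaluates a caller-supplied statistic. The plain averaging, the percentile interval, and the names `sample_meta_game` and `bootstrap` are assumptions, not the paper's exact procedure.

```python
import random
from statistics import mean
from typing import Callable, List

# U[i][si][j][sj]: estimated payoff to the row player when the policy from
# algorithm i (seed si) plays the policy from algorithm j (seed sj). Hypothetical layout.
Payoffs = List[List[List[List[float]]]]
Matrix = List[List[float]]


def sample_meta_game(U: Payoffs, rng: random.Random) -> Matrix:
    """Build one empirical meta-game by resampling each algorithm's seeds with replacement."""
    n_algs = len(U)
    seeds = [[rng.randrange(len(U[i])) for _ in range(len(U[i]))] for i in range(n_algs)]
    # Entry (i, j) averages over all resampled seed pairs of algorithms i and j.
    return [
        [mean(U[i][si][j][sj] for si in seeds[i] for sj in seeds[j]) for j in range(n_algs)]
        for i in range(n_algs)
    ]


def bootstrap(U: Payoffs, statistic: Callable[[Matrix], float], n_boot: int = 1000,
              alpha: float = 0.05, seed: int = 0):
    """Return (sorted samples, lower, upper) of a percentile bootstrap interval for `statistic`."""
    rng = random.Random(seed)
    samples = sorted(statistic(sample_meta_game(U, rng)) for _ in range(n_boot))
    lower = samples[int((alpha / 2) * (n_boot - 1))]
    upper = samples[int((1 - alpha / 2) * (n_boot - 1))]
    return samples, lower, upper
```

For example, `bootstrap(U, lambda M: mean(M[0]))` would give the sampling distribution of algorithm 0's average payoff against a uniform mix of all algorithms; any other game-analysis statistic (welfare, regret) can be passed in the same way.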

Paper Structure (46 sections, 1 theorem, 9 equations, 6 figures, 8 tables, 4 algorithms)

Key Result

Theorem 1

Given $\epsilon>0$, an $\epsilon$-maximum-entropy symmetric Nash equilibrium can be computed by a mixed-integer linear program that extends the program labeled prog:mip with $O(\lvert \hat{\mathit{\Pi}}\rvert^2 / \epsilon)$ additional linear constraints, where $\hat{\mathit{\Pi}}$ is the strategy set of the empirical game.
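Theorem 1 concerns selecting, among symmetric Nash equilibria of the empirical game, one whose mixture has approximately maximum entropy. The mixed-integer program itself is not reproduced here. As a hedged illustration of the two quantities involved, the sketch computes the regret of a symmetric mixed profile $(\sigma, \sigma)$ in a symmetric game with row payoff matrix $M$, namely $\max_i (M\sigma)_i - \sigma^\top M \sigma$, which is zero exactly at a symmetric Nash equilibrium, and the Shannon entropy $-\sum_i \sigma_i \log \sigma_i$. The helper names are hypothetical, and this summary does not confirm that the paper's NE-Regret or SumRegret statistics use exactly this regret definition.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def expected_payoffs(M: Matrix, sigma: Sequence[float]) -> List[float]:
    """(M sigma)_i: payoff of pure strategy i against an opponent playing mixture sigma."""
    return [sum(M[i][j] * sigma[j] for j in range(len(sigma))) for i in range(len(M))]


def symmetric_regret(M: Matrix, sigma: Sequence[float]) -> float:
    """Largest gain from a unilateral pure deviation when both players play sigma.

    In a symmetric two-player game with row payoff matrix M, (sigma, sigma) is a
    Nash equilibrium exactly when this value is 0 (up to numerical tolerance).
    """
    v = expected_payoffs(M, sigma)
    value = sum(s * x for s, x in zip(sigma, v))  # sigma^T M sigma
    return max(v) - value


def entropy(sigma: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a mixture; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in sigma if p > 0)
```

In the coordination game $M = [[1, 0], [0, 1]]$, the mixtures $(1, 0)$, $(0, 1)$, and $(0.5, 0.5)$ all have zero regret, and only the uniform mixture attains the largest entropy, $\ln 2$; choosing it is the kind of tie-breaking a maximum-entropy Nash equilibrium performs.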

Figures (6)

  • Figure 1: Example start of a sequential bargaining game instance.
  • Figure 2: Empirical best-response graphs for $\texttt{Barg}(10, 0, 1)$ and $\texttt{Barg}(30, 0.125, 0.935)$.
  • Figure 3: $\textsc{SumRegret}$ of $\texttt{Barg}(10, 0, 1)$
  • Figure 4: $\textsc{SumRegret}$ of $\texttt{Barg}(30, 0.125, 0.935)$
  • Figure 5: Empirical distribution of NE-Regret of $\texttt{Barg}(10, 0, 1)$
  • ...and 1 more figure
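Figure 2 reports empirical best-response graphs. One common reading for a symmetric two-player meta-game, assumed here rather than taken from the paper, draws an edge from each opponent meta-strategy $j$ to every meta-strategy $i$ that maximizes $M[i][j]$. The function `best_response_graph` and its tie tolerance `tol` are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def best_response_graph(M: Matrix, names: List[str], tol: float = 1e-9) -> Dict[str, List[str]]:
    """Map each opponent meta-strategy to its best response(s) in a symmetric meta-game.

    M[i][j] is the payoff to a player using meta-strategy i against an opponent
    using meta-strategy j. Strategies within `tol` of the column maximum are all
    kept, so ties produce multiple outgoing edges.
    """
    n = len(M)
    graph: Dict[str, List[str]] = {}
    for j in range(n):
        column = [M[i][j] for i in range(n)]  # payoffs of every strategy against j
        best = max(column)
        graph[names[j]] = [names[i] for i in range(n) if column[i] >= best - tol]
    return graph
```

With $M = [[3, 0], [5, 1]]$ and names `["A", "B"]`, both nodes point to `B`; `B` has a self-loop, so $(B, B)$ is a pure symmetric equilibrium of that meta-game.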

Theorems & Definitions (2)

  • Theorem 1
  • Proof of Theorem 1