A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning
Zun Li, Michael P. Wellman
TL;DR
This paper addresses the challenge of evaluating deep multiagent reinforcement learning (MARL) in general-sum environments, where stochastic training and inter-agent dynamics confound performance comparisons. It introduces a meta-game evaluation framework that treats multiagent training algorithms (MATAs) as meta-strategies, builds empirical meta-games over the policies produced by different random seeds, and uses bootstrapped statistics and max-entropy Nash analysis to compare methods. A novel Gumbel IS-MCTS meta-strategy operator is proposed to study run-time search as a general policy improver. Extensive experiments on Deal-or-No-Deal negotiation tasks show that search-based MATAs often outperform purely learned policies, and that self-play and population-based methods exhibit distinct strategic patterns. The framework enables robust, distributional assessments of MARL methods and can guide the design of more effective training and evaluation protocols, with practical implications for real-world multiagent systems.
Abstract
Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and by the sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL that frames each MARL algorithm as a meta-strategy and repeatedly samples normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing a sampling distribution, using bootstrapping, over a variety of game analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
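To make the sampling procedure concrete, the following is a minimal Python sketch of bootstrapping a statistic over seed-sampled empirical games. It is an illustration under stated assumptions, not the authors' implementation. The payoff tensor layout `payoffs[a, s, b, t]`, the function names `sample_meta_game` and `bootstrap_uniform_scores`, and the choice of statistic (mean payoff against a uniform mixture of opponent meta-strategies) are all hypothetical; the paper itself reports other statistics, including ones based on max-entropy Nash analysis.

```python
import numpy as np


def sample_meta_game(payoffs, rng):
    """Draw one seed per algorithm and return the induced A x A payoff matrix.

    Assumed layout (hypothetical): payoffs[a, s, b, t] is the payoff to
    algorithm a trained with seed s when playing algorithm b trained with seed t.
    """
    num_algos, num_seeds = payoffs.shape[0], payoffs.shape[1]
    seeds = rng.integers(num_seeds, size=num_algos)
    algos = np.arange(num_algos)
    # game[i, j] = payoffs[i, seeds[i], j, seeds[j]]
    return payoffs[algos[:, None], seeds[:, None], algos[None, :], seeds[None, :]]


def bootstrap_uniform_scores(payoffs, num_samples=1000, seed=0):
    """Bootstrap each algorithm's mean payoff against a uniform opponent mixture.

    Returns the bootstrap mean and a 95% percentile interval per algorithm.
    """
    rng = np.random.default_rng(seed)
    scores = np.stack(
        [sample_meta_game(payoffs, rng).mean(axis=1) for _ in range(num_samples)]
    )
    lower, upper = np.percentile(scores, [2.5, 97.5], axis=0)
    return scores.mean(axis=0), lower, upper


if __name__ == "__main__":
    # Synthetic payoffs for 3 algorithms with 5 seeds each (illustrative data only).
    demo_rng = np.random.default_rng(42)
    demo_payoffs = demo_rng.normal(size=(3, 5, 3, 5))
    mean, lower, upper = bootstrap_uniform_scores(demo_payoffs, num_samples=200)
    for a in range(3):
        print(f"algorithm {a}: mean={mean[a]:.3f}, 95% interval=[{lower[a]:.3f}, {upper[a]:.3f}]")
```

Because every sampled game fixes one seed per algorithm, the diagonal entries correspond to self-play and the off-diagonal entries to cross-play. The spread of the bootstrap distribution then reflects how much a statistic depends on which seeds happened to be drawn.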
