Massively Multiagent Minigames for Training Generalist Agents

Kyoung Whan Choe, Ryan Sullivan, Joseph Suárez

TL;DR

Meta MMO extends Neural MMO with a configurable suite of minigames that uses domain randomization and adaptive difficulty to study generalization in environments with many agents. It supports training a single generalist policy across multiple minigames with PPO/IPPO in a decentralized setting, and shows that the generalist can match or exceed specialist performance given the same amount of target-task data, while offering up to a threefold increase in training speed. Additional contributions include new minigames, team-oriented training wrappers, and an open-source release of the environment, baselines, and training code under the MIT license. Together, these provide a practical, scalable benchmark for research on generalization, coordination, curriculum learning, and cross-task transfer in large-scale multi-agent RL.
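As a rough illustration of what "domain randomization and adaptive difficulty" can look like in a training loop, here is a minimal Python sketch. It is not the Meta MMO or Neural MMO API: the `AdaptiveDifficulty` class, the config keys (`resource_enabled`, `map_size`, `npc_multiplier`), and the thresholds are all hypothetical, chosen only to show the pattern of drawing randomized settings per episode while a curriculum knob rises or falls with the agents' recent success rate.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AdaptiveDifficulty:
    """Hypothetical curriculum knob, e.g. an NPC strength multiplier.

    Difficulty rises when the recent success rate is high and falls when it
    is low; in between it stays put (simple hysteresis).
    """
    level: float = 1.0
    min_level: float = 1.0
    max_level: float = 10.0
    step: float = 0.5
    raise_above: float = 0.6   # success rate that triggers a harder setting
    lower_below: float = 0.3   # success rate that triggers an easier setting
    window: int = 20           # episodes between adjustments
    _results: list = field(default_factory=list, repr=False)

    def record(self, success: bool) -> None:
        """Log one episode outcome and adjust the level every `window` episodes."""
        self._results.append(success)
        if len(self._results) < self.window:
            return
        rate = sum(self._results) / len(self._results)
        if rate > self.raise_above:
            self.level = min(self.max_level, self.level + self.step)
        elif rate < self.lower_below:
            self.level = max(self.min_level, self.level - self.step)
        self._results.clear()


def sample_episode_config(rng: random.Random, curriculum: AdaptiveDifficulty) -> dict:
    """Domain randomization: draw per-episode settings (keys are illustrative)."""
    return {
        "resource_enabled": rng.random() < 0.5,   # toggle a whole subsystem
        "map_size": rng.choice([64, 96, 128]),    # hypothetical map sizes
        "npc_multiplier": curriculum.level,       # difficulty from the curriculum
    }


if __name__ == "__main__":
    rng = random.Random(0)
    curriculum = AdaptiveDifficulty()
    for _ in range(20):          # agents win every episode in the window
        curriculum.record(True)
    print(curriculum.level)      # 1.5
    print(sample_episode_config(rng, curriculum))
```

The hysteresis band (raise above 0.6, lower below 0.3) is one simple way to keep the difficulty from oscillating every window; the paper's own adaptive-difficulty rules may differ.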

Abstract

We present Meta MMO, a collection of many-agent minigames for use as a reinforcement learning benchmark. Meta MMO is built on top of Neural MMO, a massively multiagent environment that has been the subject of two previous NeurIPS competitions. Our work expands Neural MMO with several computationally efficient minigames. We explore generalization across Meta MMO by learning to play several minigames with a single set of weights. We release the environment, baselines, and training code under the MIT license. We hope that Meta MMO will spur additional progress on Neural MMO and, more generally, will serve as a useful benchmark for many-agent generalization.
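Playing several minigames "with a single set of weights" means one policy is trained on episodes drawn from multiple minigames. The Python sketch below assumes per-episode sampling with fixed ratios; the real sampler and ratios are described in the paper's task-sampling appendix and are not reproduced here. It shows how episodes might be allocated across minigames and how a generalist's target-task data could be estimated from the sampling ratio, in the spirit of the adjustment mentioned in the Figure 3 and Figure 4 captions. The function names are hypothetical, and the uniform ratios over the five minigames named in the Figure 5 caption are an assumption.

```python
import random
from collections import Counter


def sample_minigames(ratios: dict, n_episodes: int, seed: int = 0) -> Counter:
    """Pick the minigame for each training episode according to sampling ratios."""
    rng = random.Random(seed)
    names = list(ratios)
    weights = [ratios[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_episodes))


def target_task_steps(total_steps: int, ratios: dict, target: str) -> float:
    """Expected environment steps a generalist spent on `target`.

    Comparing a generalist checkpoint with a specialist at equal target-task
    data means using this value, not `total_steps`, on the x-axis.
    """
    return total_steps * ratios[target] / sum(ratios.values())


if __name__ == "__main__":
    # Uniform ratios over five minigames (an assumption, not the paper's values).
    ratios = {
        "Team Battle": 1.0,
        "Protect the King": 1.0,
        "Race to the Center": 1.0,
        "King of the Hill": 1.0,
        "Sandwich": 1.0,
    }
    print(sample_minigames(ratios, n_episodes=10))
    print(target_task_steps(100_000_000, ratios, "Sandwich"))  # 20000000.0
```

Under these assumptions, a generalist trained for 100M total steps has seen about 20M Sandwich steps, so it should be compared with a Sandwich specialist at 20M steps.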

Paper Structure (25 sections, 11 figures, 3 tables)

Figures (11)

  • Figure 1: Meta MMO's minigame framework enables fine-grained control over game objectives, agent spawning, team assignments, and other game elements. Subsystems manage resource generation, combat rules, NPC behavior, item supply, and market dynamics, and each can be customized through configurable attributes (see the appendix on subsystems for details). These settings make it straightforward to create adaptive difficulty, which supports curriculum learning: agents are gradually introduced to more challenging tasks during training.
  • Figure 2: Snapshots of King of the Hill (A) and Sandwich (B), showcasing the same policy's adaptability to different game settings. (A) When the resource subsystem is enabled, team members spread out to forage for food and water. (B) When the resource subsystem is disabled, each team groups together to maximize their offensive and defensive capabilities.
  • Figure 3: Training curves for the Full Config experiment. For the generalist policy, only samples from the target minigame were counted. As training progresses, agents learn to survive longer, engage with more game subsystems (see the extended Full Config results in the appendix), and encounter more diverse events, as reflected in the unique event count.
  • Figure 4: Evaluations for the Full Config experiment. See the appendix on evaluation metrics for methods. An Elo rating of 1000 is the initial anchor value. Training samples for the generalist checkpoints were adjusted by the minigame sampling ratio used during training (see the appendix on task sampling).
  • Figure 5: Training curves for the Mini Config experiment, showing metrics specific to each minigame. In Team Battle and Protect the King, agent lifespan increases with training. In Race to the Center and King of the Hill, agents learn to navigate the map and hold the center within 25M steps. In Sandwich, the generalist policy had not reached the maximum NPC multiplier after 100M steps.
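Figure 4 reports Elo ratings with 1000 as the initial anchor. The exact evaluation protocol is in the paper's evaluation-metrics appendix and is not reproduced here. As a hedged illustration only, the Python sketch below applies the standard Elo update, starts every policy at 1000, assumes K = 32, and scores a multi-team episode by splitting its final ranking into pairwise wins. All three choices, and the function names, are assumptions.

```python
from itertools import combinations


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_from_ranking(ratings: dict, ranking: list, k: float = 32.0) -> None:
    """Update ratings in place from one episode.

    `ranking` lists policy names from best to worst; each pair is treated as a
    win for the higher-ranked policy (an assumed decomposition). Deltas are
    computed from pre-episode ratings, then applied together.
    """
    deltas = {name: 0.0 for name in ranking}
    for better, worse in combinations(ranking, 2):  # preserves best-to-worst order
        e = expected_score(ratings[better], ratings[worse])
        deltas[better] += k * (1.0 - e)
        deltas[worse] -= k * (1.0 - e)
    for name, delta in deltas.items():
        ratings[name] += delta


if __name__ == "__main__":
    ANCHOR = 1000.0  # every policy starts at the anchor rating
    ratings = {"generalist": ANCHOR, "specialist": ANCHOR}
    update_from_ranking(ratings, ["generalist", "specialist"])
    print(ratings)  # {'generalist': 1016.0, 'specialist': 984.0}
```

Because every pairwise update is zero-sum, the mean rating stays at the 1000 anchor, which keeps ratings from different evaluation rounds on a common scale.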