MatrixWorld: A pursuit-evasion platform for safe multi-agent coordination and autocurricula

Lijun Sun; Yu-Cheng Chang; Chao Lyu; Chin-Teng Lin; Yuhui Shi

TL;DR

MatrixWorld introduces a safety-constrained multi-agent pursuit-evasion platform to address safety gaps in typical MARL benchmarks. It couples a formal safety-oriented action-execution model with diverse collision outcomes and resolutions, enabling unbiased feedback for safe learning. The work also frames MatrixWorld as a lightweight co-evolution and autocurriculum research environment, detailing nine pursuit-evasion variants and surveying key co-evolution concepts. Experimental results illustrate arms-race dynamics and safety considerations in adversarial learning setups, underscoring the platform's potential for studying safety, autocurricula, and coordination in multi-agent systems. Overall, MatrixWorld offers a flexible, open environment to rapidly validate ideas in safe MARL and autocurriculum research.

Abstract

Multi-agent reinforcement learning (MARL) has achieved encouraging performance on complex tasks. However, the safety of MARL policies is a critical concern that impedes their real-world application. Popular multi-agent benchmarks cover diverse tasks yet provide limited safety support. This work therefore proposes MatrixWorld, a safety-constrained multi-agent environment based on the general pursuit-evasion game. In particular, it proposes a safety-constrained multi-agent action execution model for the software implementation of safe multi-agent environments under diverse safety definitions. The model (1) extends the vertex conflict among homogeneous / cooperative agents to heterogeneous / adversarial settings, and (2) proposes three types of resolutions for each type of conflict, aiming to provide rational and unbiased feedback for safe MARL. MatrixWorld is also a lightweight co-evolution framework for learning pursuit tasks, evasion tasks, or both, in which further pursuit-evasion variants can be designed based on different practical meanings of safety. As a brief survey, we review and analyze the co-evolution mechanism in the multi-agent setting and clarify its relationships with autocurricula, self-play, arms races, and adversarial learning. MatrixWorld can thus also serve as the first environment for autocurricula research, where ideas can be quickly verified and well understood.
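
To make the notion of a vertex conflict concrete, the following is a minimal sketch, assuming a 4-connected grid and simultaneous moves, of how such conflicts could be detected before actions are executed. It is not MatrixWorld's API: the move table, the agent names, and the function `find_vertex_conflicts` are hypothetical and chosen only for illustration. The resolution step is deliberately left out, because the abstract does not spell out the three resolution types.

```python
from collections import defaultdict

# Hypothetical move table for a 4-connected grid (illustrative, not MatrixWorld's).
MOVES = {
    "stay": (0, 0),
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def find_vertex_conflicts(positions, actions):
    """Return {cell: [agent ids]} for every cell targeted by more than one agent."""
    targets = defaultdict(list)
    for agent, (row, col) in positions.items():
        d_row, d_col = MOVES[actions[agent]]
        targets[(row + d_row, col + d_col)].append(agent)
    # A vertex conflict exists wherever two or more agents claim the same cell.
    return {cell: agents for cell, agents in targets.items() if len(agents) > 1}


# Two pursuers and one evader all try to enter cell (0, 1) in the same step.
positions = {"pursuer_0": (0, 0), "pursuer_1": (0, 2), "evader_0": (1, 1)}
actions = {"pursuer_0": "right", "pursuer_1": "left", "evader_0": "up"}
conflicts = find_vertex_conflicts(positions, actions)
print(conflicts)
```

In this example the conflict involves agents of both roles, which is the heterogeneous / adversarial case the abstract refers to. An environment would then apply one of its resolution rules to the agents listed for each conflicting cell.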

Paper Structure (20 sections, 8 figures, 4 tables, 3 algorithms)

Introduction
Brief survey on co-evolution, autocurricula, and arms races
Co-evolution
Self-play
Adaptation and arms races
Curriculum learning (CL), automatic CL, and autocurricula
Adversarial learning
MatrixWorld: A lightweight co-evolution environment
MatrixWorld: Safety-constrained multi-agent pursuit-evasion games
Multi-agent-environment interaction model
Safety-constrained collision resolution mechanism
Pursuit-evasion game variants
API
Conclusion
Acknowledgements
...and 5 more sections

Figures (8)

Figure 1: Relationships between co-evolution, self-play, autocurricula, arms races, and adversarial learning.
Figure 2: Illustration of the types of collisions and outcomes in general multi-agent interactions. First row: four types of conflicts for homogeneous / cooperative agents, from [gao2023review, stern2019multi]. Second row: the extension of vertex conflict to heterogeneous / adversarial settings, i.e., the collision types. Third row: the types of outcomes for each collision.
Figure 3: Safety-constrained multi-agent collision resolution mechanism for the multi-agent environment modeled by stochastic game.
Figure 4: Illustration of basic usage of MatrixWorld.
Figure 5: Training performance achieved for Pursuit-Evasion-O by the specialist-vs-specialist, generalist-vs-specialist, and generalist-vs-generalist algorithms (from top to bottom). The curves are smoothed over 30 points.
...and 3 more figures
