Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation
Murad Dawood, Sicong Pan, Nils Dengler, Siqi Zhou, Angela P. Schoellig, Maren Bennewitz
TL;DR
This work tackles safe behavior-based cooperative navigation with a team of $N$ robots: the formation centroid is steered to a single target while inter-robot distances are maintained, without per-robot targets. It pairs a centralized SAC-based MARL framework that uses attention-based critics with a distributed nonlinear MPC (NMPC) safety filter that overrides unsafe actions, ensuring zero collisions during both training and execution. The approach converges faster, achieves zero collisions in simulation and on real robots, and transfers safely to unseen configurations. The experiments also reveal the MPC layer's role in enabling both exploration and safety. Overall, the method advances the practical deployment of safe MARL for scalable, centroid-based formation control in real-world robotic teams.
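The objective described above, steering only the centroid while holding inter-robot spacing, can be made concrete as a shared reward term. The sketch below illustrates this under stated assumptions. It uses 2-D positions and a single uniform desired pairwise distance, and it scores a formation by the weighted negative sum of the centroid-to-target error and the mean spacing error. The names `formation_reward`, `centroid`, `w_goal` and `w_form` are hypothetical, and this is not the paper's reward function.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid(positions: Sequence[Point]) -> Point:
    """Mean position of the team (the formation centroid)."""
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)


def formation_reward(
    positions: Sequence[Point],
    target: Point,
    desired_dist: float,
    w_goal: float = 1.0,
    w_form: float = 0.5,
) -> float:
    """Hypothetical shared reward: only the centroid has a target; no per-robot goals."""
    # Distance from the formation centroid to the single team target.
    goal_err = math.dist(centroid(positions), target)
    # Mean absolute deviation of every pairwise distance from the desired spacing
    # (assumed uniform for all pairs; zero if the team has fewer than two robots).
    pairs = list(combinations(positions, 2))
    form_err = (
        sum(abs(math.dist(a, b) - desired_dist) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    # Higher (less negative) is better; both terms vanish at the ideal state.
    return -(w_goal * goal_err + w_form * form_err)
```

For example, three robots on an equilateral triangle with side 1 whose centroid sits on the target receive a reward of 0. Moving the target 2 units along $x$ lowers the reward to -2.0, because only the goal term changes.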
Abstract
In this paper, we address the problem of behavior-based cooperative navigation of mobile robots using safe multi-agent reinforcement learning (MARL). Our work is the first to focus on cooperative navigation without individual reference targets for the robots, using a single target for the formation's centroid. This eliminates the complexity of running several path planners to control a team of robots. To ensure safety, our MARL framework uses model predictive control (MPC) to prevent actions that could lead to collisions during training and execution. We demonstrate the effectiveness of our method in simulation and on real robots, achieving safe behavior-based cooperative navigation without individual reference targets, with zero collisions and faster target reaching than the baselines. Finally, we study the impact of MPC safety filters on the learning process, showing that they lead to faster convergence during training, and we show that our approach can be safely deployed on real robots even during early stages of training.
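The safety-filter idea (let the learned action through when it is safe, otherwise replace it with the closest safe action) can be illustrated with a much simpler stand-in than the paper's distributed NMPC. The sketch below keeps the minimal-intervention objective of an MPC safety filter: minimize $\lVert u - u_{\text{RL}} \rVert$ subject to collision constraints over a prediction horizon. It relies on several assumptions. The solver is swapped for enumeration over a discretized set of constant velocity commands. The robots follow a single-integrator model. Each neighbour is assumed to hold its current velocity. All names and parameters (`safety_filter`, `is_safe`, `d_min`, `v_max`, `horizon`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def rollout(p: Vec, v: Vec, dt: float, horizon: int) -> List[Vec]:
    """Predicted positions of a single-integrator robot holding velocity v."""
    return [(p[0] + v[0] * dt * k, p[1] + v[1] * dt * k) for k in range(1, horizon + 1)]


def is_safe(p: Vec, v: Vec, others: Sequence[Tuple[Vec, Vec]],
            dt: float, horizon: int, d_min: float) -> bool:
    """True if the ego robot keeps at least d_min from every neighbour at every step.

    others: (position, velocity) pairs; neighbours are assumed to hold their velocity.
    """
    ego = rollout(p, v, dt, horizon)
    for q, w in others:
        for a, b in zip(ego, rollout(q, w, dt, horizon)):
            if math.dist(a, b) < d_min:
                return False
    return True


def candidate_actions(v_max: float, n_speeds: int = 4, n_headings: int = 16) -> List[Vec]:
    """Discretized velocity commands: standstill plus rings of speeds and headings."""
    cands: List[Vec] = [(0.0, 0.0)]
    for i in range(1, n_speeds + 1):
        speed = v_max * i / n_speeds
        for k in range(n_headings):
            th = 2.0 * math.pi * k / n_headings
            cands.append((speed * math.cos(th), speed * math.sin(th)))
    return cands


def safety_filter(p: Vec, u_rl: Vec, others: Sequence[Tuple[Vec, Vec]],
                  dt: float = 0.1, horizon: int = 10,
                  d_min: float = 0.5, v_max: float = 0.5) -> Vec:
    """Return u_rl if it is predicted safe, else the closest safe candidate command."""
    if is_safe(p, u_rl, others, dt, horizon, d_min):
        return u_rl  # no intervention: the learned action passes through unchanged
    safe = [u for u in candidate_actions(v_max)
            if is_safe(p, u, others, dt, horizon, d_min)]
    if not safe:
        # Fallback: stop. Not guaranteed safe if a neighbour keeps approaching.
        return (0.0, 0.0)
    # Minimal intervention: the safe command nearest to the RL proposal.
    return min(safe, key=lambda u: math.dist(u, u_rl))
```

In a distributed setup, each robot would run such a filter on its own proposed action using its neighbours' latest states. An NMPC-based filter would instead optimize over a continuous action space with a nonlinear robot model rather than checking a coarse grid of constant commands. Because the filter leaves safe actions untouched, the learner still sees its own actions most of the time.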
