Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL
Songyuan Zhang, Oswin So, Mitchell Black, Zachary Serlin, Chuchu Fan
TL;DR
This work tackles multi-agent safe optimal control with zero constraint violation by casting the multi-agent safe optimal control problem (MASOCP) into epigraph form and distributing the optimization, which yields the Def-MARL algorithm. It extends the epigraph approach to the centralized training distributed execution (CTDE) paradigm: an inner, centralized MARL optimization learns a $z$-conditioned policy, and a distributed outer optimization computes the minimal safe cost upper bound $z$ online. Across diverse simulations and hardware experiments, the approach trains stably, satisfies the safety constraints, and achieves competitive performance, outperforming penalty-based and Lagrangian baselines, which are sensitive to hyperparameters. This framework offers a practical route to safe, scalable coordination in real-world multi-robot systems without sacrificing performance.
Abstract
Tasks for multi-robot systems often require the robots to collaborate and complete a team goal while maintaining safety. This problem is usually formalized as a constrained Markov decision process (CMDP), which targets minimizing a global cost and bringing the mean of constraint violation below a user-defined threshold. Inspired by real-world robotic applications, we define safety as zero constraint violation. While many safe multi-agent reinforcement learning (MARL) algorithms have been proposed to solve CMDPs, these algorithms suffer from unstable training in this setting. To tackle this, we use the epigraph form for constrained optimization to improve training stability and prove that the centralized epigraph form problem can be solved in a distributed fashion by each agent. This results in a novel centralized training distributed execution MARL algorithm named Def-MARL. Simulation experiments on 8 different tasks across 2 different simulators show that Def-MARL achieves the best overall performance, satisfies safety constraints, and maintains stable training. Real-world hardware experiments on Crazyflie quadcopters demonstrate the ability of Def-MARL to safely coordinate agents to complete complex collaborative tasks compared to other methods.
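To make the epigraph idea concrete, the following is a minimal sketch on a toy, single-step, single-agent problem. It is not the Def-MARL algorithm: the cost, the constraint, the action grid, and every function name below are hypothetical choices made for illustration. The constrained problem "minimize $J(u)$ subject to $h(u) \le 0$" is rewritten as "minimize $z$ subject to $\min_u \max\{h(u), J(u) - z\} \le 0$". The inner minimization is solved here by brute force over a grid, where Def-MARL would instead learn a $z$-conditioned policy with MARL. The outer problem is solved by bisection on $z$.

```python
import numpy as np

# Hypothetical toy problem, chosen only for illustration.
U = np.linspace(-3.0, 3.0, 6001)  # candidate actions on a grid


def cost(u):
    """Cost J(u); its unconstrained minimizer is u = 2."""
    return (u - 2.0) ** 2


def constraint(u):
    """Constraint h(u); an action is safe iff h(u) <= 0, i.e. u <= 1."""
    return u - 1.0


def inner_value(z):
    """Inner problem: V(z) = min_u max(h(u), J(u) - z), with its minimizer."""
    total = np.maximum(constraint(U), cost(U) - z)
    i = int(np.argmin(total))
    return total[i], U[i]


def outer_bisection(z_lo, z_hi, tol=1e-6):
    """Outer problem: the smallest z with V(z) <= 0.

    V is non-increasing in z, so bisection applies. The caller must pass
    a bracket with V(z_lo) > 0 and V(z_hi) <= 0.
    """
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        value, _ = inner_value(z_mid)
        if value <= 0.0:
            z_hi = z_mid  # z_mid is a safe cost upper bound; try a smaller one
        else:
            z_lo = z_mid  # no action is both safe and within budget z_mid
    return z_hi


z_star = outer_bisection(0.0, 30.0)
_, u_star = inner_value(z_star)
print(f"z* = {z_star:.3f}, u* = {u_star:.3f}")
```

In this toy problem the safe optimum sits on the constraint boundary at $u = 1$ with cost 1, so the sketch should return values close to $z^* = 1$ and $u^* = 1$, up to the grid resolution. The point of the reformulation is that the constraint enters the inner problem through a maximum rather than through a penalty weight or Lagrange multiplier, so no such hyperparameter has to be tuned.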
