Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form

Xuefeng Wang; Lei Zhang; Henglin Pu; Husheng Li; Ahmed H. Qureshi

TL;DR

This work proposes a continuous-time constrained MDP (CT-CMDP) formulation and a MARL framework that transforms discrete MDPs into CT-CMDPs via an epigraph-based reformulation, together with a physics-informed neural network (PINN)-based actor-critic method that enables stable and efficient optimization in continuous time.

Abstract

Multi-agent reinforcement learning (MARL) has made significant progress in recent years, but most algorithms still rely on a discrete-time Markov Decision Process (MDP) with fixed decision intervals. This formulation is often ill-suited for complex multi-agent dynamics, particularly in high-frequency or irregular time-interval settings, where it leads to degraded performance; this motivates the development of continuous-time MARL (CT-MARL). Existing CT-MARL methods are mainly built on Hamilton-Jacobi-Bellman (HJB) equations. However, they rarely account for safety constraints such as collision penalties, since these introduce discontinuities that make HJB-based learning difficult. To address this challenge, we propose a continuous-time constrained MDP (CT-CMDP) formulation and a novel MARL framework that transforms discrete MDPs into CT-CMDPs via an epigraph-based reformulation. We solve the resulting problem with a novel physics-informed neural network (PINN)-based actor-critic method that enables stable and efficient optimization in continuous time. We evaluate our approach on continuous-time safe multi-particle environments (MPE) and safe multi-agent MuJoCo benchmarks. Results demonstrate smoother value approximations, more stable training, and improved performance over safe MARL baselines, validating the effectiveness and robustness of our method.
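
For readers unfamiliar with the term, the generic idea behind an epigraph reformulation can be illustrated on a static constrained problem. The symbols $J$, $h$, $\pi$, and $z$ below are illustrative assumptions and are not taken from the paper, whose continuous-time formulation may differ in its details. Assuming the inner minimum is attained, a constrained minimization and its epigraph form have the same optimal value:

$$
\min_{\pi}\; J(\pi) \quad \text{s.t.}\quad h(\pi) \le 0
\qquad\Longleftrightarrow\qquad
\min_{z \in \mathbb{R}}\; z \quad \text{s.t.}\quad \min_{\pi}\, \max\{\, J(\pi) - z,\; h(\pi) \,\} \le 0 .
$$

The inner problem is unconstrained in $\pi$, and the scalar $z$ acts as an upper bound on the cost that the outer problem tightens as far as the constraint allows.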

Paper Structure (38 sections, 4 theorems, 86 equations, 15 figures, 2 tables, 1 algorithm)


Introduction
Related Work
Continuous-Time Reinforcement Learning
Multi-agent systems with Safety Concerns
Methodology
Problem Formulation
Continuous-time Constrained Markov Decision Process
Epigraph Reformulation
Epigraph Learning Framework
Revised Outer Optimization
Inner Optimization with Critic Learning
Actor Learning
Experimental Results
Benchmarks and Baselines
Results Analysis
...and 23 more sections

Key Result

Lemma 3.1

Suppose the assumptions in the section "Continuous-time Constrained Markov Decision Process" hold. For all $(t,x,z)\in[0,\infty)\times\mathcal{X}\times\mathbb{R}$, the constrained value $v$ and auxiliary value $V$ are related by

Figures (15)

Figure 1: Overview of the proposed epigraph-based CT-MARL framework. The pipeline begins with data collection, where individual agent rollouts are aggregated into a centralized rollout $\mathcal{X}_R$ for training; the outer optimization computes the optimal $z^*$ to balance discounted cumulative cost and safety constraints; the inner optimization corresponds to critic learning, where return networks $V^{\text{ret}}_\psi(x)$ and constraint value networks $V^{\text{cons}}_\phi(x)$ are optimized jointly with the optimal auxiliary state $z^*$; and actor learning leverages the advantage function to improve policies.
Figure 2: Overall results for adapted MPE environments.
Figure 3: Performance of constraints and cost over MPE settings.
Figure 4: Overall results for adapted multi-agent MuJoCo environments.
Figure 5: Ablation study of different loss terms in critic network over MPE.
...and 10 more figures
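
The outer/inner split described in the Figure 1 caption can be illustrated on a toy static problem. The sketch below is not the paper's algorithm: the names `aux_value` and `solve_outer`, the bisection search, and the one-dimensional cost and constraint are all hypothetical choices made for illustration. It only shows how an outer search over a scalar $z$ can sit on top of an unconstrained inner minimization.

```python
# Toy epigraph-style outer/inner loop (illustrative assumption, not the paper's method).

def aux_value(z, candidates, cost, constraint):
    """Inner problem: V(z) = min over candidates of max(cost(u) - z, constraint(u))."""
    return min(max(cost(u) - z, constraint(u)) for u in candidates)


def solve_outer(candidates, cost, constraint, z_lo, z_hi, tol=1e-6):
    """Outer problem: bisection for the smallest z in [z_lo, z_hi] with V(z) <= 0.

    V is nonincreasing in z, so the set {z : V(z) <= 0} is an upper interval.
    Assumes V(z_lo) > 0; otherwise the result simply converges toward z_lo.
    """
    if aux_value(z_hi, candidates, cost, constraint) > 0:
        raise ValueError("no z in the bracket satisfies V(z) <= 0")
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        if aux_value(z_mid, candidates, cost, constraint) <= 0:
            z_hi = z_mid  # z_mid is achievable, try a tighter cost bound
        else:
            z_lo = z_mid  # z_mid is too tight, relax the bound
    return z_hi


# Hypothetical toy problem: minimize (u - 2)^2 subject to u <= 1.
candidates = [i / 100 for i in range(-300, 301)]  # grid on [-3, 3]


def cost(u):
    return (u - 2.0) ** 2


def constraint(u):
    return u - 1.0  # safe when <= 0


z_star = solve_outer(candidates, cost, constraint, z_lo=0.0, z_hi=30.0)
print(f"z* is approximately {z_star:.4f}")
```

In this toy problem the constrained minimizer is $u = 1$ with cost $1$, so the search settles just above $z = 1$. In the learning setting the inner minimum would be replaced by learned value estimates rather than a grid search.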

Theorems & Definitions (10)

Definition 1: Epigraph Reformulation
Lemma 3.1: Value Equivalence
Lemma 3.2: Optimality Condition
Theorem 3.3: Epigraph-based HJB PDE
Definition 2: Epigraph-based Q-function
Lemma 3.4: Epigraph-based advantage function
proof
proof
proof
proof
