GUARD: A Safe Reinforcement Learning Benchmark

Weiye Zhao, Yifan Sun, Feihan Li, Rui Chen, Ruixuan Liu, Tianhao Wei, Changliu Liu

TL;DR

GUARD addresses the need for a standardized, generalizable benchmark for safe reinforcement learning by introducing a Generalized Unified Safe Reinforcement Learning Development Benchmark that encompasses 11 agents, 7 locomotion tasks, and 8 safety-constraint specifications. The framework provides self-contained implementations of eight state-of-the-art on-policy safe RL algorithms (including CPO, PCPO, TRPO-Lagrangian, TRPO-FAC, TRPO-IPO, and hierarchical variants with Safety Layer and USL) built on a unified TRPO backbone in PyTorch, plus a comprehensive testing suite with 72 task-robot-constraint configurations. Across these settings, GUARD analyzes how constraint difficulty, task complexity, and algorithm design choices (e.g., adaptive multipliers, feasibility projection, and cost dynamics linearization) shape reward performance and safety outcomes, delivering baselines and insights to guide future work. By enabling fair, reproducible comparisons and providing extensible code, GUARD aims to accelerate safe RL development toward reliable real-world deployment in safety-critical domains.

Abstract

Because reinforcement learning (RL) proceeds by trial and error, it is typically challenging to apply RL algorithms to safety-critical real-world applications, such as autonomous driving, human-robot interaction, and robot manipulation, where such errors are not tolerable. Recently, safe RL (i.e., constrained RL), in which agents explore the environment while satisfying constraints, has emerged rapidly in the literature. Because algorithms and tasks are so diverse, it remains difficult to compare existing safe RL algorithms. To fill that gap, we introduce GUARD, a Generalized Unified SAfe Reinforcement Learning Development Benchmark. GUARD has several advantages over existing benchmarks. First, GUARD is a generalized benchmark with a wide variety of RL agents, tasks, and safety constraint specifications. Second, GUARD comprehensively covers state-of-the-art safe RL algorithms with self-contained implementations. Third, GUARD is highly customizable in tasks and algorithms. We present a comparison of state-of-the-art safe RL algorithms in various task settings using GUARD and establish baselines that future work can build on.
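
For readers new to the setting, the constrained RL problem referred to in the abstract is commonly written as a constrained Markov decision process. The block below is the standard textbook form rather than notation taken from this page: the symbols $J$, $J_{C_i}$, $d_i$, $\gamma$, and $\lambda_i$ are assumptions introduced here for illustration, and the paper's own equations may use different conventions.

```latex
% Constrained policy optimization (generic form; notation assumed, not from the source)
\begin{aligned}
\max_{\pi}\quad & J(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m .
\end{aligned}

% Lagrangian relaxation, the usual starting point for multiplier-based methods
\mathcal{L}(\pi, \lambda) = J(\pi) - \sum_{i=1}^{m} \lambda_i \bigl( J_{C_i}(\pi) - d_i \bigr),
\qquad \lambda_i \ge 0 .
```

Under this reading, a Lagrangian-style method alternates between improving $\pi$ on $\mathcal{L}$ and raising $\lambda_i$ whenever the corresponding cost $J_{C_i}(\pi)$ exceeds its budget $d_i$.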

Paper Structure

This paper contains 54 sections, 12 equations, 40 figures, 8 tables.

Figures (40)

  • Figure 1: Robots of our environments.
  • Figure 2: Tasks of our environments.
  • Figure 3: Constraints of our environments.
  • Figure 4: Constraint difficulty ablation study with Goal_Point_8{Constraint}.
  • Figure 5: Comparison of results from four representative tasks. (a) to (d) cover four robots on the goal task. (e) shows the performance of a task with ghosts. (f) to (h) cover three different tasks with the point robot.
  • ...and 35 more figures

Theorems & Definitions (1)

  • Remark 1