Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents

Nathaniel Hamilton; Kyle Dunlap; Taylor T Johnson; Kerianne L Hobbs

TL;DR

The paper examines how Run Time Assurance (RTA) affects the training and performance of Reinforcement Learning (RL) agents, applying evaluation best practices that Safe Reinforcement Learning (SRL) comparisons often lack. It reports a large-scale ablation study (880 agents, 88 configurations) across Pendulum and Spacecraft Docking (2D and 3D) environments, comparing four RTA monitoring approaches and six training configurations for both PPO (on-policy) and SAC (off-policy). Key contributions include establishing evaluation best practices, identifying baseline punishment and RTA punishment as consistently effective configurations, finding explicit simplex to be the most reliable RTA approach, and showing that reward shaping often matters more than safe exploration. The work provides a practical framework for fair SRL comparisons and offers guidance for deploying safe RL in real-world cyber-physical systems.

Abstract

Reinforcement Learning (RL) has become an increasingly important research area as the success of machine learning algorithms and methods grows. To combat the safety concerns surrounding the freedom given to RL agents while training, there has been an increase in work concerning Safe Reinforcement Learning (SRL). However, these new and safe methods have been held to less scrutiny than their unsafe counterparts. For instance, comparisons among safe methods often lack fair evaluation across similar initial condition bounds and hyperparameter settings, use poor evaluation metrics, and cherry-pick the best training runs rather than averaging over multiple random seeds. In this work, we conduct an ablation study using evaluation best practices to investigate the impact of run time assurance (RTA), which monitors the system state and intervenes to assure safety, on effective learning. By studying multiple RTA approaches in both on-policy and off-policy RL algorithms, we seek to understand which RTA methods are most effective, whether the agents become dependent on the RTA, and the importance of reward shaping versus safe exploration in RL agent training. Our conclusions shed light on the most promising directions of SRL, and our evaluation methodology lays the groundwork for creating better comparisons in future SRL work.
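
As an illustration of what "monitors the system state and intervenes to assure safety" can look like in code, the following is a minimal sketch of a simplex-style RTA filter. It is not the authors' implementation: the class name `SimplexRTA`, the one-step prediction check, the toy two-element state `[position, velocity]`, the velocity limit `V_MAX`, and the braking backup controller are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np

DT = 0.1      # hypothetical integration step (s)
V_MAX = 1.0   # hypothetical velocity limit defining the safe set


def predict(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Hypothetical one-step model: state = [position, velocity], action = [accel]."""
    pos, vel = state
    return np.array([pos + vel * DT, vel + action[0] * DT])


def is_safe(state: np.ndarray) -> bool:
    """Hypothetical safety constraint: speed must stay within V_MAX."""
    return bool(abs(state[1]) <= V_MAX)


def backup(state: np.ndarray) -> np.ndarray:
    """Hypothetical backup controller: accelerate against the current velocity."""
    return np.array([-np.sign(state[1])])


@dataclass
class SimplexRTA:
    """Pass the agent's action through unless the monitor predicts a violation."""
    predict: Callable[[np.ndarray, np.ndarray], np.ndarray]
    is_safe: Callable[[np.ndarray], bool]
    backup: Callable[[np.ndarray], np.ndarray]

    def filter(self, state: np.ndarray, desired: np.ndarray) -> Tuple[np.ndarray, bool]:
        """Return (action to apply, whether the RTA intervened)."""
        if self.is_safe(self.predict(state, desired)):
            return desired, False
        return self.backup(state), True


rta = SimplexRTA(predict=predict, is_safe=is_safe, backup=backup)
safe_action, safe_flag = rta.filter(np.array([0.0, 0.5]), np.array([1.0]))
unsafe_action, unsafe_flag = rta.filter(np.array([0.0, 0.95]), np.array([1.0]))
```

In this sketch the intervention flag is returned alongside the action, so a training loop could use it to apply a punishment or to decide which action to store. How the paper's configurations actually use that signal is not specified in this summary.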

Paper Structure (36 sections, 26 equations, 25 figures, 22 tables)

Introduction
Deep Reinforcement Learning
Safe Reinforcement Learning
Reward Shaping
Safe Exploration
Adversarial Training/Retraining
Run Time Assurance
Experiments
Run Time Assurance Configurations
(1) Baseline (no RTA)
(2) Baseline punishment
(3) RTA no punishment
(4) RTA punishment
(5) RTA Corrected Action
(6) Neural Simplex Architecture (NSA)
...and 21 more sections

Figures (25)

Figure 1: DRL training interactions between the agent and the environment without RTA.
Figure 2: DRL training interactions between the agent and the environment with RTA.
Figure 3: The RTA configurations used in our experiments represented within the three main categories of Safe Reinforcement Learning outlined in the Safe Reinforcement Learning section.
Figure 4: Results collected from experiments run in the 2D Spacecraft Docking environment with an implicit simplex RTA. Each curve represents the average of 10 trials, and the shaded region is the $95\%$ confidence interval about the mean. The large difference in return and success that is recorded with (a & b) and without (c & d) RTA shows that all agents trained with RTA learned to depend on it.
Figure 5: Results collected from experiments run in the 2D (a & b) and 3D (c & d) Spacecraft Docking environments with an explicit simplex RTA. Each curve represents the average of 10 trials, and the shaded region is the $95\%$ confidence interval about the mean. All plots show that the baseline punishment and RTA punishment configurations learn at a similar rate and converge to similar levels of success and return.
...and 20 more figures
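
The captions above describe each curve as the average of 10 trials with a $95\%$ confidence interval about the mean. A minimal sketch of one way to compute such a band is shown below. The function name `mean_and_ci95` and the normal-approximation multiplier of 1.96 are assumptions made here; the summary does not state which interval method the authors used.

```python
import numpy as np


def mean_and_ci95(returns):
    """Mean curve and 95% confidence band across trials.

    `returns` has shape (n_trials, n_checkpoints): one row per random seed.
    Assumption: normal approximation (z = 1.96) applied to the standard error.
    """
    returns = np.asarray(returns, dtype=float)
    n_trials = returns.shape[0]
    mean = returns.mean(axis=0)
    # Sample standard deviation (ddof=1) divided by sqrt(n) gives the standard error.
    sem = returns.std(axis=0, ddof=1) / np.sqrt(n_trials)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Toy input: two trials evaluated at two checkpoints (not data from the paper).
mean, lower, upper = mean_and_ci95([[1.0, 2.0], [3.0, 4.0]])
```

With only 10 trials, a Student's t multiplier (about 2.262 for 9 degrees of freedom) would give a somewhat wider band than the normal approximation used in this sketch.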

Theorems & Definitions (2)

Definition 1
Definition 2
