Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems
Katherine Dearstyne, Pedro Alarcon Granadeno, Theodore Chambers, Jane Cleland-Huang
TL;DR
The paper addresses the challenge of safety assurance for reinforcement learning (RL) in cyber-physical systems (CPS) by introducing SAFE-RL, a safety and accountability framework built with a design science approach. SAFE-RL structures safety evaluation as a hierarchical, question-driven assessment using four node types (goals, subgoals, ASK nodes, evidence) and maps subgoals to robustness, generalizability, safety, and transparency, so that each decision can be traced to its supporting evidence. The framework is demonstrated through three deep-RL use cases in small uncrewed aerial systems (sUAS), highlighting real-world deployment considerations such as anomaly detection, real-time risk detection, and human oversight. The work offers a practical, extensible approach to RL safety in CPS, with future directions including expanding the safety dimensions and applying SAFE-RL to broader AI domains.
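The hierarchy described above can be pictured as a small tree data structure. The following is a minimal sketch only, not the authors' implementation: the class names, the `dimension` field, the `unanswered_asks` helper, and all example questions are hypothetical, and it assumes that evidence attaches directly beneath ASK nodes, which the summary does not confirm.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NodeType(Enum):
    """The four node types named in the summary."""
    GOAL = "goal"
    SUBGOAL = "subgoal"
    ASK = "ask"
    EVIDENCE = "evidence"


# The four dimensions that subgoals are mapped to in the summary.
DIMENSIONS = {"robustness", "generalizability", "safety", "transparency"}


@dataclass
class Node:
    node_type: NodeType
    text: str
    dimension: Optional[str] = None  # hypothetical field, meaningful for subgoals only
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


def unanswered_asks(node: Node) -> List[str]:
    """Return the text of every ASK node that has no evidence child (depth-first)."""
    found = []
    if node.node_type is NodeType.ASK and not any(
        c.node_type is NodeType.EVIDENCE for c in node.children
    ):
        found.append(node.text)
    for child in node.children:
        found.extend(unanswered_asks(child))
    return found


# Illustrative content only; these goals and questions are not taken from the paper.
root = Node(NodeType.GOAL, "RL component is safe to deploy on the sUAS")
sub = root.add(
    Node(NodeType.SUBGOAL, "Policy tolerates sensor noise", dimension="robustness")
)
ask1 = sub.add(Node(NodeType.ASK, "Was the policy tested under injected sensor noise?"))
ask1.add(Node(NodeType.EVIDENCE, "Simulation test report (placeholder)"))
sub.add(Node(NodeType.ASK, "Is there a fallback when anomalies are detected?"))

open_questions = unanswered_asks(root)
print(open_questions)
```

Walking the tree for ASK nodes without evidence is one plausible way to surface gaps in an assessment; the framework itself may define different completion rules.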
Abstract
Cyber-Physical Systems (CPS) often leverage Reinforcement Learning (RL) techniques to adapt dynamically to changing environments and optimize performance. However, it is challenging to construct safety cases for RL components. We therefore propose SAFE-RL (Safety and Accountability Framework for Evaluating Reinforcement Learning) to support the development, validation, and safe deployment of RL-based CPS. We adopt a design science approach to construct the framework and demonstrate its use in three RL applications in small Uncrewed Aerial Systems (sUAS).
