EigenSafe: A Spectral Framework for Learning-Based Probabilistic Safety Assessment

Inkyu Jang, Jonghae Park, Sihyun Cho, Chams E. Mballo, Claire J. Tomlin, H. Jin Kim

TL;DR

EigenSafe introduces an operator-theoretic framework that recasts long-horizon safety as the action of a linear operator on probability functions. The dominant eigenpair $(\gamma_\pi, \phi_\pi)$, along with the state-action pair counterpart $(\gamma_\pi, \psi_\pi)$, quantifies global safety decay and local safety for policy evaluation, and is learned via a power-iteration–inspired loss. The approach enables safe reinforcement learning under a spectral constraint and test-time safety filtering for imitation learning, with demonstrations on Gym environments and a UR3 manipulation task showing improved safety-performance tradeoffs. This spectral safety paradigm offers calibrated, probability-aligned safety metrics that can guide learning and deployment of learning-enabled robotic systems in uncertain environments.

Abstract

We present EigenSafe, an operator-theoretic framework for safety assessment of learning-enabled stochastic systems. In many robotic applications, the dynamics are inherently stochastic due to factors such as sensing noise and environmental disturbances, and it is challenging for conventional methods such as Hamilton-Jacobi reachability and control barrier functions to provide a well-calibrated safety critic that is tied to the actual safety probability. We derive a linear operator that governs the dynamic programming principle for safety probability, and find that its dominant eigenpair provides critical safety information for both individual state-action pairs and the overall closed-loop system. The proposed framework learns this dominant eigenpair, which can be used to either inform or constrain policy updates. We demonstrate that the learned eigenpair effectively facilitates safe reinforcement learning. Further, we validate its applicability in enhancing the safety of learned policies from imitation learning through robot manipulation experiments using a UR3 robotic arm in a food preparation task.
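
As a rough finite-state sketch of the idea (the symbols $S_t$, $P$, and $c$ are assumptions introduced here for illustration; the paper's operator is defined on general state spaces and may differ in detail), let $S_t(x)$ be the probability of remaining safe for $t$ steps when starting from a safe state $x$ and following a deterministic policy $\pi$. The dynamic programming principle and the eigenpair relation can then be written as:

$$S_{t+1}(x) = \sum_{x'\ \text{safe}} P\big(x' \mid x, \pi(x)\big)\, S_t(x') = (T_\pi S_t)(x), \qquad S_0 \equiv 1,$$

$$T_\pi \phi_\pi = \gamma_\pi \phi_\pi, \qquad S_t(x) \approx c\, \phi_\pi(x)\, \gamma_\pi^{\,t} \quad \text{for large } t.$$

The approximation assumes the dominant eigenvalue is simple and strictly larger in modulus than the rest of the spectrum. Under that assumption, $\gamma_\pi$ sets the common per-step decay rate of safety probability, while $\phi_\pi(x)$ ranks initial states by relative safety.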

Paper Structure

This paper contains 39 sections, 12 theorems, 55 equations, 12 figures, 3 tables, 1 algorithm.

Key Result

Theorem A.1

If $L:\mathcal{X}\rightarrow \mathcal{X}$ is a compact linear operator on a Banach space $\mathcal{X}$, then the spectrum $\sigma(L)$ is a discrete set, with its elements being separated by a strictly positive distance from all others, except for $0$. Every nonzero element of $\sigma(L)$ is an eigenvalue ...

Figures (12)

  • Figure 1: A toy example describing the meaning of the dominant eigenpair of $T_\pi$. (a) This toy example consists of a finite state space represented by square cells, and a finite action space represented by an arrow on each of them. At each time step, the system moves to the adjacent cell in the arrow direction with probability $0.6$, and moves to each of the other adjacent cells or remains in the current cell with probability $0.1$. If the system enters a gray cell or leaves the map, it is deemed unsafe and is therefore considered to have reached the unsafe state $K$. (b) The safety probability given initial conditions A, B, C, corresponding to the colored cells in (a). Note that the vertical axis is log-scale and the slopes of all three curves converge to the same value, which corresponds to the dominant eigenvalue $\gamma_\pi$. (c) The dominant eigenfunction values. It can be seen that cells with higher values correspond to safer points. For finite-state systems, one can directly perform eigendecomposition of the finite-dimensional matrix representation of $T_\pi$ to obtain the eigenpair.
  • Figure 2: The dominant eigenfunction $\psi_\pi$ of $A_\pi$, defined with respect to the same dynamics and $\pi$ as in Figure 1. Similar to Figure 1(c), the eigenfunction can be computed by directly eigendecomposing the matrix representation of $A_\pi$. Each square represents a state, and the numbers inside the triangles denote the values of $\psi_\pi$ for the specific state-action pair defined by the square and the triangle's direction. It can be seen that the eigenfunction assigns higher values to safer transitions.
  • Figure 3: The Gym environments used in the safe RL demonstration.
  • Figure 4: Baseline comparison results for safe RL. The horizontal axis denotes the number of steps taken until a safety failure occurs or the agent reaches the maximum episode length, while the vertical axis represents the total reward accumulated throughout the episode, regardless of the safety outcome. EigenSafe consistently appears in the upper-right region across all environments tested, indicating a better balance between reward and safety. Squares indicate mean values, and error bars denote maximum and minimum values over the rollouts. Gray vertical bars indicate the maximum episode length. Each performance is evaluated over five rollout episodes during the final three evaluation epochs, across four training seeds.
  • Figure 5: The task for the imitation learning hardware experiment is to safely pick up the bowl of almonds and pour the almonds into the pan, without spilling them or causing a collision.
  • ...and 7 more figures
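
The finite-state computation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 4x4 map, the positions of the unsafe cells, and the arrow policy are hypothetical, and $T_\pi$ is assumed to be the transition matrix restricted to safe cells, so that probability mass entering the unsafe state $K$ is dropped. Only the transition probabilities ($0.6$ along the arrow, $0.1$ for each other adjacent cell and for staying put) come from the caption.

```python
import numpy as np

# Hypothetical 4x4 map; the unsafe cells and the policy are illustrative only,
# not the layout shown in Figure 1.
N = 4
UNSAFE = {(1, 1), (2, 3)}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

safe = [(r, c) for r in range(N) for c in range(N) if (r, c) not in UNSAFE]
index = {cell: i for i, cell in enumerate(safe)}


def policy(cell):
    """Assumed arrow per cell: right in the top half, left in the bottom half."""
    return 3 if cell[0] < N // 2 else 2


# T[i, j] = P(next cell is safe cell j | current safe cell i, arrow at i).
# Mass that enters a gray cell or leaves the map is dropped (unsafe state K).
T = np.zeros((len(safe), len(safe)))
for cell, i in index.items():
    T[i, i] += 0.1  # remain in the current cell
    for a, (dr, dc) in enumerate(MOVES):
        prob = 0.6 if a == policy(cell) else 0.1
        nxt = (cell[0] + dr, cell[1] + dc)
        if nxt in index:
            T[i, index[nxt]] += prob

# Direct eigendecomposition: pick the eigenvalue with the largest real part.
eigvals, eigvecs = np.linalg.eig(T)
k = np.argmax(eigvals.real)
gamma = eigvals[k].real
phi = np.abs(eigvecs[:, k].real)
phi /= phi.max()  # eigenvectors are defined up to scale; normalize to max 1

# Power iteration on the safety probability, starting from S_0 = 1.
p = np.ones(len(safe))
for _ in range(1000):
    p_next = T @ p
    slope = p_next / p         # per-state one-step decay of safety probability
    p = p_next / p_next.max()  # rescale to avoid underflow; ratios are unaffected

print("dominant eigenvalue:", gamma)
print("largest gap between per-state decay and eigenvalue:", np.max(np.abs(slope - gamma)))
```

In this sketch the per-state decay factors in `slope` settle to the same value as `gamma`, mirroring the common slope of the log-scale curves in Figure 1(b), and the rescaled iterate `p` approaches `phi`, whose larger entries mark safer cells as in Figure 1(c).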

Theorems & Definitions (20)

  • Theorem A.1: Spectral Theorem for Compact Operators
  • Theorem A.2: Arzelà-Ascoli
  • Theorem A.3: Compactness of $T_\pi$
  • proof
  • Corollary A.1: Continuity of Eigenfunction
  • proof
  • Theorem A.4: Krein-Rutman
  • Theorem A.5
  • proof
  • Corollary A.2: Strict Positivity of the Dominant Eigenfunction
  • ...and 10 more