Physics-informed RL for Maximal Safety Probability Estimation

Hikaru Hoshino, Yorie Nakahira

TL;DR

The paper estimates the long-horizon safety probability under maximally safe actions for stochastic systems in which unsafe events are rare. It introduces Physics-Informed Reinforcement Learning (PIRL): an augmented state converts the multiplicative objective into an additive cost, and a PDE (HJB) constraint derived for the safety probability is enforced during learning in the manner of physics-informed neural networks (PINNs). The approach learns from sparse rewards and generalizes to longer horizons. It is demonstrated with a DQN extended by a PINN loss, which shows improved safety performance over conventional methods. The framework reduces data requirements, mitigates conservatism in risk estimation, and supports safe exploration in applications such as autonomous systems and safe RL.
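The multiplicative-to-additive conversion can be illustrated on a toy problem. The sketch below is not the paper's system: it assumes a hypothetical uncontrolled symmetric random walk with safe set {-3, ..., 3}, and the names `SAFE`, `step`, `multiplicative_return`, `additive_return`, and `safety_probability_dp` are made up for illustration. The augmented state is (remaining steps, x); the episode is absorbed at the first unsafe state, and a reward of 1 is paid only when the remaining time reaches zero while the state is still safe.

```python
import random

SAFE = range(-3, 4)  # hypothetical safe set C = {-3, ..., 3}

def step(x, rng):
    """Uncontrolled symmetric random walk (toy stand-in for the dynamics)."""
    return x + rng.choice((-1, 1))

def multiplicative_return(path):
    """Product of safety indicators: 1 only if every visited state is safe."""
    out = 1
    for x in path:
        out *= 1 if x in SAFE else 0
    return out

def additive_return(path):
    """Sum of rewards on the augmented state (remaining steps, x)."""
    horizon = len(path) - 1
    total = 0
    for k, x in enumerate(path):
        if x not in SAFE:
            break  # absorbed: no reward can be collected afterwards
        if horizon - k == 0:
            total += 1  # terminal reward for surviving the whole horizon
    return total

def safety_probability_dp(x0, horizon):
    """Exact value from the additive Bellman recursion on the augmented state."""
    xs = range(min(SAFE) - 1, max(SAFE) + 2)
    v = {x: (1.0 if x in SAFE else 0.0) for x in xs}  # remaining steps = 0
    for _ in range(horizon):
        v = {x: (0.5 * (v[x - 1] + v[x + 1]) if x in SAFE else 0.0) for x in xs}
    return v[x0]

rng = random.Random(0)
horizon, n_paths = 10, 20000
hits_mul = hits_add = 0
for _ in range(n_paths):
    path = [0]
    for _k in range(horizon):
        path.append(step(path[-1], rng))
    hits_mul += multiplicative_return(path)
    hits_add += additive_return(path)

# The two Monte Carlo estimates coincide path by path; the DP value is the exact target.
print(hits_mul / n_paths, hits_add / n_paths, safety_probability_dp(0, horizon))
```

Because the two returns agree on every trajectory, any standard RL method that estimates an expected sum of rewards on the augmented state is, in this toy setting, estimating the safety probability itself.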

Abstract

Accurate risk quantification and reachability analysis are crucial for safe control and learning, but sampling from rare events, risky states, or long-term trajectories can be prohibitively costly. Motivated by this, we study how to estimate the long-term safety probability of maximally safe actions without sufficient coverage of samples from risky states and long-term trajectories. The use of maximal safety probability in control and learning is expected to avoid conservative behaviors due to over-approximation of risk. Here, we first show that long-term safety probability, which is multiplicative in time, can be converted into additive costs and solved using standard reinforcement learning methods. We then characterize this probability as the solution of partial differential equations (PDEs) and propose a Physics-Informed Reinforcement Learning (PIRL) algorithm. The proposed method can learn from sparse rewards because the physics constraints help propagate risk information between neighboring states. This suggests that, for the purpose of extracting more information for efficient learning, physics constraints can serve as an alternative to reward shaping. The proposed method can also estimate long-term risk from short-term samples and infer the risk of unsampled states. This feature is in stark contrast with unconstrained deep RL, which demands sufficient data coverage. These merits of the proposed method are demonstrated in numerical simulation.
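How a physics constraint can enter a learning loss can be sketched in a few lines. The example below does not use the paper's PDE or its DQN: it assumes a hypothetical uncontrolled one-dimensional diffusion dX = SIGMA dW, for which the safety probability over horizon tau satisfies the heat-type equation d(psi)/d(tau) = 0.5 * SIGMA^2 * d2(psi)/dx2 (the controlled case would add a maximization over actions). The names `pde_residual`, `physics_loss`, `total_loss`, and the weight `weight` are illustrative, and finite differences stand in for the automatic differentiation a PINN would normally use.

```python
import math

SIGMA = 0.5  # hypothetical noise intensity for dX = SIGMA * dW

def pde_residual(psi, tau, x, h=1e-3):
    """Central-difference residual of d(psi)/d(tau) - 0.5*SIGMA^2 * d2(psi)/dx2."""
    d_tau = (psi(tau + h, x) - psi(tau - h, x)) / (2 * h)
    d_xx = (psi(tau, x + h) - 2 * psi(tau, x) + psi(tau, x - h)) / h**2
    return d_tau - 0.5 * SIGMA**2 * d_xx

def physics_loss(psi, points):
    """Mean squared PDE residual over collocation points (tau, x)."""
    return sum(pde_residual(psi, t, x) ** 2 for t, x in points) / len(points)

def total_loss(td_errors, psi, points, weight=1.0):
    """Data term (mean squared TD error) plus weighted physics term."""
    data_loss = sum(e * e for e in td_errors) / len(td_errors)
    return data_loss + weight * physics_loss(psi, points)

def psi_exact(tau, x):
    """A function that solves the PDE on [0, 1] and vanishes at the boundary."""
    return math.exp(-0.5 * SIGMA**2 * math.pi**2 * tau) * math.sin(math.pi * x)

def psi_wrong(tau, x):
    """Same shape in x but the wrong decay rate in tau, so the PDE is violated."""
    return math.exp(-tau) * math.sin(math.pi * x)

# Collocation points need no sampled trajectories, only coordinates.
points = [(0.1 * i, 0.1 * j) for i in range(1, 6) for j in range(1, 10)]
loss_exact = physics_loss(psi_exact, points)
loss_wrong = physics_loss(psi_wrong, points)
print(loss_exact, loss_wrong)
```

The physics term is evaluated at collocation points that were never visited by a trajectory, which is one way to read the abstract's claim that the constraint lets the estimator reason about unsampled states. Note that `psi_exact` is only a convenient PDE solution for checking the residual, not a safety probability itself.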

Paper Structure

This paper contains 11 sections, 2 theorems, 59 equations, 4 figures, 1 algorithm.

Key Result

Proposition 1

Consider the system eq:augmented_dynamics starting from an initial state $s = [\tau, x^\top]^\top \in \mathcal{S}$ and the reward function $r: \mathcal{S} \to \mathbb{R}$ given by with $\mathcal{G} := [0, \Delta t)$. Then, for a given control policy $u$, the value function $v^{u}$ defined by takes a value in $[0,1]$ and is equivalent to the safety probability $\Psi^u(\tau, x)$, i.e., $v^{u}(s) = \Psi^u(\tau, x)$.

Figures (4)

  • Figure 1: Safety probability for the outlook horizon of $\tau=2.0$.
  • Figure 2: Learning progress.
  • Figure 3: Comparison with reward shaping.
  • Figure 4: Results of generalization with $\tau_\mathrm{D} \le \tau$.

Theorems & Definitions (8)

  • Remark 1
  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Remark 2
  • proof
  • proof