Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments

Alexander W. Goodall; Francesco Belardinelli

TL;DR

The paper tackles safe reinforcement learning in continuous environments by extending Approximate Model-based Shielding (AMBS) to continuous state and action spaces and evaluating it on Safety Gym with DreamerV3 as the world model. It presents three penalty techniques (PENL, PLPG, and COPT) that inject safety considerations into policy optimization while avoiding the drawbacks of rejection-based shielding. The authors establish probabilistic safety guarantees for the continuous setting via sample complexity bounds under full and partial observability, and report substantially fewer safety violations across multiple Safety Gym tasks, albeit with slower convergence than some baselines. The approach offers tunable safety guarantees and more stable training in continuous domains, both of which matter for real-world deployment of model-based RL systems.

Abstract

Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions, making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper, we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular, we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.

Paper Structure (38 sections, 7 theorems, 40 equations, 9 figures, 4 tables, 2 algorithms)

Introduction
Contributions
Preliminaries
Problem Setup
Bounded Safety
Approximate Model-based Shielding
Safety Guarantees
Full Observability
Partial Observability
Penalty Techniques
Penalty Critic (PENL)
Probabilistic Logic Policy Gradient (PLPG)
Counter-example Guided Policy Optimisation (COPT)
Experimental Results
Safety Gym
...and 23 more sections

Key Result

Theorem 1

Let $\epsilon > 0$, $\delta > 0$, $s \in S$ be given. With access to the true transition system $\mathcal{T}$, with probability $1 - \delta$ we can obtain an $\epsilon$-approximate estimate of the measure $\mu_{s\models\phi}$, by sampling $m$ traces $\tau \sim \mathcal{T}$, provided that,
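The condition on $m$ is not reproduced in this excerpt. As a rough illustration only, the sketch below assumes the standard two-sided Hoeffding inequality for i.i.d. samples in $[0, 1]$, which gives the sufficient condition $m \geq \frac{1}{2\epsilon^2}\ln\frac{2}{\delta}$; this may differ from the exact bound stated in the paper. The names `hoeffding_sample_size`, `estimate_satisfaction`, and `sample_trace_satisfies` are hypothetical and do not come from the authors' code.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest m such that 2 * exp(-2 * m * epsilon**2) <= delta.

    Assumption: the standard Hoeffding bound, not necessarily the
    paper's exact condition on m.
    """
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_satisfaction(sample_trace_satisfies, m: int) -> float:
    """Monte Carlo estimate of the probability that a trace satisfies phi.

    `sample_trace_satisfies` is a hypothetical callable that samples one
    trace from the transition system and returns True if it satisfies phi.
    """
    if m <= 0:
        raise ValueError("need m > 0")
    hits = sum(1 for _ in range(m) if sample_trace_satisfies())
    return hits / m


if __name__ == "__main__":
    # Toy stand-in for the true transition system: each trace is safe
    # with probability 0.9 (an illustrative value, not from the paper).
    rng = random.Random(0)
    m = hoeffding_sample_size(epsilon=0.1, delta=0.05)
    estimate = estimate_satisfaction(lambda: rng.random() < 0.9, m)
    print(f"m = {m}, estimate = {estimate:.3f}")
```

Under this assumed bound, the required number of traces grows quadratically as $\epsilon$ shrinks but only logarithmically as $\delta$ shrinks, so tightening the confidence level is much cheaper than tightening the accuracy.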

Figures (9)

Figure 1: A simple example in Safety Gym (Ray et al., 2019). The task policy proposes actions along the optimal trajectory. However, this trajectory enters an unsafe region and so the shield overrides these actions with "Break!" actions proposed by the safe policy. As a result, the safe trajectory is not followed and the two policies continuously fight for control.
Figure 2: POMDP with Labels.
Figure 3: Safety Gym environments.
Figure 4: Episode return (left) and cumulative violations (right) for PointGoal1, PointGoal2 and CarGoal1.
Figure 5: Long run (10M frames) episode return (left) and cumulative violations (right) for PointGoal1.
...and 4 more figures

Theorems & Definitions (7)

Theorem 1
Theorem 2
Theorem 3
Theorem 1 (restated)
Lemma 1: error amplification
Theorem 2 (restated)
Theorem 3 (restated)
