Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments
Alexander W. Goodall, Francesco Belardinelli
TL;DR
The paper tackles safe reinforcement learning in continuous environments by extending Approximate Model-based Shielding (AMBS) to continuous state and action spaces and evaluating it on Safety Gym with DreamerV3 as the world model. It examines three penalty techniques (PENL, PLPG, and COPT), two of them novel modifications of the policy gradient, that inject safety considerations into policy optimization while avoiding the drawbacks of rejection-based shielding. The authors establish probabilistic safety guarantees for the continuous setting via sample complexity bounds under full and partial observability, and report substantially fewer safety violations across multiple Safety Gym tasks, albeit with slower convergence than some baselines. The result is a shielding approach with tunable safety guarantees and more stable convergence in continuous domains, which matters for real-world deployment of model-based RL systems.
Abstract
Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions, making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper, we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular, we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.
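
To make the idea of a sample-complexity-based safety check concrete, the following is a minimal sketch and not the authors' implementation. It assumes the shield estimates the probability of a near-future safety violation by sampling rollouts from a learned world model, and it sizes the sample with the standard two-sided Hoeffding bound; the paper's own bounds may take a different form. The names `hoeffding_samples`, `estimate_violation_prob`, `shield`, and the `rollout_is_unsafe` callback are hypothetical.

```python
import math


def hoeffding_samples(eps: float, delta: float) -> int:
    """Rollouts needed so the empirical violation rate is within eps of the
    true rate with probability at least 1 - delta (two-sided Hoeffding bound).
    Assumption: rollouts are i.i.d. samples from the world model."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_violation_prob(rollout_is_unsafe, m: int) -> float:
    """Monte Carlo estimate: fraction of m simulated rollouts that violate safety.
    rollout_is_unsafe is a hypothetical callback returning True or False."""
    return sum(1 for _ in range(m) if rollout_is_unsafe()) / m


def shield(proposed_action, backup_action, rollout_is_unsafe,
           eps: float, delta: float, threshold: float):
    """Keep the proposed action unless the estimated violation probability
    exceeds the tolerated threshold; otherwise fall back to the backup action."""
    m = hoeffding_samples(eps, delta)
    p_hat = estimate_violation_prob(rollout_is_unsafe, m)
    return backup_action if p_hat > threshold else proposed_action
```

For example, `hoeffding_samples(0.1, 0.05)` asks for 185 rollouts. Tightening `eps` raises the count quadratically, which is one way to see why the guarantee is tunable: accuracy can be traded against the cost of simulating more rollouts.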
