Predictive Safety Shield for Dyna-Q Reinforcement Learning
Jin Pin, Krasowski Hanna, Vanneaux Elena
TL;DR
This work tackles the challenge of achieving hard safety guarantees in reinforcement learning by introducing a predictive safety shield for discrete-space, model-based RL, integrated with Dyna-Q. The shield uses a safety-relevant environment model to plan over multiple steps and updates a local $Q$-function $Q_W$, biasing action selection toward safe, high-return trajectories. The authors prove optimality under full observability with static obstacles and show in gridworld experiments that a short prediction horizon $N$ can suffice to find near-optimal solutions, and that performance stays robust under distribution shifts, such as the sim-to-real gap, without retraining. The approach enables safer, more reliable RL in discrete domains and offers a foundation for future extensions to continuous spaces and more dynamic environments.
Abstract
Obtaining safety guarantees for reinforcement learning is a major challenge for its application to real-world tasks. Safety shields extend standard reinforcement learning and achieve hard safety guarantees. However, existing safety shields commonly rely on random sampling of safe actions or on a fixed fallback controller, thereby disregarding the future performance implications of different safe actions. In this work, we propose a predictive safety shield for model-based reinforcement learning agents in discrete space. Our safety shield updates the Q-function locally based on safe predictions, which originate from a safe simulation of the environment model. This shielding approach improves performance while maintaining hard safety guarantees. Our experiments on gridworld environments demonstrate that even short prediction horizons can be sufficient to identify the optimal path. We observe that our approach is robust to distribution shifts, e.g., between simulation and reality, without requiring additional training.
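The idea of a local Q-function update from safe predictions can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a deterministic gridworld model with static obstacles, a hypothetical reward of -1 per step and +10 at the goal, and hypothetical helper names (`step`, `safe_actions`, `value`, `shielded_action`). For each safe action, the shield rolls the model forward over $N$ steps along safe trajectories only, bootstraps with the learned Q-values at the end of the horizon, and stores the result in a local table standing in for $Q_W$.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[int, int]
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}
GOAL_REWARD = 10.0   # assumed reward for entering the goal cell
STEP_REWARD = -1.0   # assumed reward for every other move


def step(state: State, action: str, size: int) -> State:
    """Deterministic model: move one cell, clamped to the grid borders."""
    dr, dc = MOVES[action]
    row = min(max(state[0] + dr, 0), size - 1)
    col = min(max(state[1] + dc, 0), size - 1)
    return (row, col)


def safe_actions(state: State, obstacles: Set[State], size: int) -> List[str]:
    """Actions whose predicted successor is not an obstacle."""
    return [a for a in MOVES if step(state, a, size) not in obstacles]


def value(state: State, horizon: int, q: Dict[Tuple[State, str], float],
          obstacles: Set[State], goal: State, size: int, gamma: float) -> float:
    """Best discounted return over safe plans of length `horizon`."""
    if state == goal:
        return 0.0
    acts = safe_actions(state, obstacles, size)
    if not acts:
        return float("-inf")  # dead end: no safe continuation exists
    if horizon == 0:
        # Bootstrap with the learned Q-values, restricted to safe actions.
        return max(q.get((state, a), 0.0) for a in acts)
    best = float("-inf")
    for a in acts:
        nxt = step(state, a, size)
        r = GOAL_REWARD if nxt == goal else STEP_REWARD
        best = max(best, r + gamma * value(nxt, horizon - 1, q, obstacles,
                                           goal, size, gamma))
    return best


def shielded_action(state: State, horizon: int,
                    q: Dict[Tuple[State, str], float], obstacles: Set[State],
                    goal: State, size: int,
                    gamma: float = 0.95) -> Tuple[str, Dict[str, float]]:
    """Build a local Q-table over safe actions and pick its greedy action."""
    q_w: Dict[str, float] = {}
    for a in safe_actions(state, obstacles, size):
        nxt = step(state, a, size)
        r = GOAL_REWARD if nxt == goal else STEP_REWARD
        q_w[a] = r + gamma * value(nxt, horizon - 1, q, obstacles,
                                   goal, size, gamma)
    if not q_w:
        raise ValueError("no safe action available in this state")
    return max(q_w, key=q_w.get), q_w
```

In a 3x3 grid with start `(0, 0)`, goal `(0, 2)`, and an obstacle at `(0, 1)`, calling `shielded_action((0, 0), 4, {((0, 0), "right"): 5.0}, {(0, 1)}, (0, 2), 3)` excludes the unsafe action `right`, even though the learned Q-table favors it, and selects `down`, the first move of the only safe path that reaches the goal within four steps. The learned table `q` itself is left unchanged; only the local table is recomputed at decision time, which is one way to read the claim that no additional training is needed.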
