Adaptive Shielding for Safe Reinforcement Learning under Hidden-Parameter Dynamics Shifts
Minjae Kwon, Tyler Ingebrand, Ufuk Topcu, Lu Feng
TL;DR
This work introduces safety-regularized optimization that proactively trains the policy away from high-cost regions, and proves that prediction errors in the shielding connect with bounds on the average cost rate.
Abstract
Unseen shifts in environment dynamics, driven by hidden parameters such as friction or gravity, create a challenge for maintaining safety. We address this challenge by proposing Adaptive Shielding, a framework for safe reinforcement learning in constrained hidden-parameter Markov decision processes. A function encoder infers a low-dimensional representation of the underlying dynamics online from transition data, allowing the shield to adapt. To ensure safety during this process, we use a two-layer strategy. First, we introduce safety-regularized optimization that proactively trains the policy away from high-cost regions. Second, the adaptive shielding reactively uses the inferred dynamics to forecast safety risks and applies uncertainty-aware bounds using conformal prediction to filter unsafe actions. We prove that prediction errors in the shielding connect with bounds on the average cost rate. Empirically, across Safe-Gym benchmarks with varying hidden parameters, our approach outperforms baselines on the return-safety trade-off and generalizes reliably to unseen dynamics, while incurring only modest execution-time overhead. Code is available at https://github.com/safe-autonomy-lab/AdaptiveShieldingFE.
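The reactive layer described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it assumes a standard split-conformal calibration over held-out prediction residuals and a simple additive cost check. The names `conformal_quantile`, `shield`, `bound`, and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np

def conformal_quantile(residuals, alpha):
    """Split-conformal bound on prediction error (illustrative assumption).

    Given n held-out residuals |prediction - truth|, return the
    ceil((n + 1) * (1 - alpha))-th smallest one. Under exchangeability,
    a fresh residual falls at or below this value with probability >= 1 - alpha.
    """
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: no finite bound exists.
        return float("inf")
    return float(np.sort(residuals)[k - 1])

def shield(candidate_actions, predicted_costs, bound, threshold):
    """Filter actions whose worst-case predicted cost exceeds the threshold.

    predicted_costs[i] is a (hypothetical) model forecast of the safety cost
    of candidate_actions[i]; `bound` inflates it to account for model error.
    If every action is rejected, fall back to the least risky candidate.
    """
    predicted_costs = np.asarray(predicted_costs, dtype=float)
    safe = predicted_costs + bound <= threshold
    if safe.any():
        return [a for a, ok in zip(candidate_actions, safe) if ok]
    return [candidate_actions[int(np.argmin(predicted_costs))]]
```

In this sketch, a smaller `alpha` yields a larger error bound and therefore a more conservative filter. The fallback to the lowest-cost candidate is one possible design choice for states where no action passes the check.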
