Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions
Harry Zhang
TL;DR
This work tackles safety and stability in deep model-based reinforcement learning with unknown dynamics by integrating a Lyapunov-constrained value function into the Safety Augmented Value Estimation from Demonstrations (SAVED) framework, yielding SALVED. The approach learns a Lyapunov neural network to produce a stabilizing terminal cost within a Learning MPC setting, enforcing both safety during exploration and a monotone decrease of the Lyapunov function along trajectories. Empirical results in simulated 4D navigation tasks show SALVED improves stability, task completion, and constraint satisfaction while maintaining sample efficiency, with trajectories that exhibit reduced variance and fewer local minima. The framework offers a practical path toward safer, more reliable deep MBRL for control under unknown dynamics, with potential extensions to physical robots and stronger stability guarantees such as asymptotic stability.
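The two ingredients named above, a Lyapunov decrease condition and a learned terminal cost inside an MPC objective, can be written down in standard textbook form. The block below is only a sketch under assumed notation: the symbols $f$ (dynamics), $x_G$ (goal state), $C$ (stage cost), $H$ (planning horizon), $\mathcal{X}$ (constraint-satisfying states), and $V_\theta$ (Lyapunov network) are illustrative and are not taken from the paper.

```latex
% Lyapunov candidate: zero at the goal, positive elsewhere,
% and non-increasing along closed-loop trajectories.
\begin{align}
  V_\theta(x_G) &= 0, \qquad V_\theta(x) > 0 \quad \forall\, x \neq x_G, \\
  V_\theta\bigl(f(x_t, u_t)\bigr) - V_\theta(x_t) &\le 0 .
\end{align}

% Finite-horizon MPC problem with V_\theta as the terminal cost.
\begin{equation}
  \min_{u_{t:t+H-1}} \; \sum_{i=t}^{t+H-1} C(x_i, u_i) + V_\theta(x_{t+H})
  \quad \text{s.t.} \quad x_{i+1} = f(x_i, u_i), \;\; x_i \in \mathcal{X} .
\end{equation}
```

Requiring the inequality in the decrease condition to be strict for all $x_t \neq x_G$ is the usual route to asymptotic stability, the stronger guarantee mentioned as a possible extension.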
Abstract
Model-based reinforcement learning (MBRL) has shown many desirable properties for intelligent control tasks. However, satisfying safety and stability constraints during training and rollout remains an open problem. We propose a new MBRL framework, built on the learning model predictive control (LMPC) framework, that enables efficient policy learning under unknown dynamics with mathematically provable stability guarantees. We introduce and explore a novel method for adding safety constraints to MBRL during training and policy learning. The new stability-augmented framework consists of a neural-network-based learner that learns to construct a Lyapunov function, and an MBRL agent that consistently completes tasks while satisfying user-specified constraints, given only sub-optimal demonstrations and sparse-cost feedback. We demonstrate the capability of the proposed framework through simulated experiments.
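To make the idea of a neural-network Lyapunov function concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the single-layer architecture, the random untrained weights `W`, the regularizer `eps`, the linear toy dynamics, and the helper names `lyapunov_candidate`, `decrease_violations`, and `rollout` are all assumptions made for illustration, with the goal placed at the origin. The candidate is positive definite by construction, and the check counts the steps of a trajectory along which it fails to decrease.

```python
import numpy as np


def lyapunov_candidate(x, W, eps=1e-3):
    """V(x) = ||tanh(W x)||^2 + eps * ||x||^2 (hypothetical architecture).

    With no bias term, tanh(W 0) = 0, so V(0) = 0, and the eps term
    makes V(x) > 0 for every x != 0.
    """
    x = np.asarray(x, dtype=float)
    features = np.tanh(x @ W.T)
    return np.sum(features**2, axis=-1) + eps * np.sum(x**2, axis=-1)


def decrease_violations(states, W, eps=1e-3):
    """Count the steps where V does not strictly decrease."""
    values = lyapunov_candidate(states, W, eps)
    return int(np.sum(np.diff(values) >= 0.0))


def rollout(A, x0, steps):
    """Simulate the assumed toy dynamics x_{t+1} = A x_t."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        states.append(A @ states[-1])
    return np.stack(states)


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))            # random weights, not learned
x0 = np.array([1.0, -0.5, 0.3, 0.8])    # arbitrary 4D start state

stable = rollout(0.9 * np.eye(4), x0, steps=20)     # contracts to the goal
unstable = rollout(1.1 * np.eye(4), x0, steps=20)   # drifts away from it

stable_violations = decrease_violations(stable, W)
unstable_violations = decrease_violations(unstable, W)
print(stable_violations, unstable_violations)
```

In a learned setting the weights would be trained so that the decrease condition holds on states visited by the agent, and the resulting function could then serve as a terminal cost for the planner. Here the condition holds on the contracting trajectory only because of the simple structure of the toy system.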
