Learning Verifiable Control Policies Using Relaxed Verification
Puja Chaudhury, Alexander Estornell, Michael Everett
TL;DR
The paper tackles safety guarantees for learning-based control by integrating differentiable reachability bounds into the training loop, using LiRPA/CROWN to produce tight, runtime-checkable guarantees. By optimizing a composite loss that encodes reach-avoid and invariance properties over a horizon $T$, the method guides neural policies toward verifiable safety specifications. Empirical results on a 2D unicycle and a 6D quadrotor show that policies trained with these relaxed, differentiable bounds can both reach goals and avoid obstacles, while yielding tight bound estimates suitable for lightweight verification at runtime. The approach promises safer deployment of learning-based controllers by combining training-time verification with scalable bound computation. Future work includes extending scalability, more challenging specifications, and integration with imitation learning for demonstrations.
Abstract
To provide safety guarantees for learning-based control systems, recent work has developed formal verification methods to apply after training ends. However, if the trained policy does not meet the specifications, or there is conservatism in the verification algorithm, establishing these guarantees may not be possible. Instead, this work proposes to perform verification throughout training to ultimately aim for policies whose properties can be evaluated throughout runtime with lightweight, relaxed verification algorithms. The approach is to use differentiable reachability analysis and incorporate new components into the loss function. Numerical experiments on a quadrotor model and unicycle model highlight the ability of this approach to lead to learned control policies that satisfy desired reach-avoid and invariance specifications.
