Training Safe Neural Networks with Global SDP Bounds
Roman Soletskyi, David "davidad" Dalrymple
TL;DR
This work tackles the challenge of training neural networks with formal safety guarantees over large input regions. It introduces an SDP-based verification framework and a novel ADMM-based training scheme to ensure the safety property $\max_{x\in X} f_\theta(x) \le B^{\mathrm{SDP}}(\theta)$, demonstrated on the Adversarial Spheres dataset. Empirical results show provably safe training up to input dimension $d=40$ with polynomial-time scaling, outperforming linear bounds in high dimensions, and revealing important limitations for $\ell_\infty$ safety. The approach offers a path toward safe neural policies and safe RL, with potential extensions to tighter global bounds and integration with SOS-based or reach-avoid certificates for high-dimensional systems.
Abstract
This paper presents a novel approach to training neural networks with formal safety guarantees using semidefinite programming (SDP) for verification. Our method focuses on verifying safety over large, high-dimensional input regions, addressing limitations of existing techniques that focus on adversarial robustness bounds. We introduce an ADMM-based training scheme for an accurate neural network classifier on the Adversarial Spheres dataset, achieving provably perfect recall with input dimensions up to $d=40$. This work advances the development of reliable neural network verification methods for high-dimensional systems, with potential applications in safe RL policies.
