Table of Contents
Fetching ...

Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion

Haoyu Li, Xiangru Zhong, Bin Hu, Huan Zhang

TL;DR

This work tackles the challenge of certifying stability for learning-based neural controllers in continuous-time systems by jointly learning a controller and a Lyapunov certificate. It introduces a two-stage training pipeline anchored in Zubov’s ROA characterization: (i) a Zubov-guided ROA estimation stage with dynamic curriculum-driven domain expansion, followed by (ii) a CEGIS-based refinement that eliminates counterexamples inside the ROA for verifiable stabilization. The verification component extends the α,β-CROWN framework to handle Jacobian operators in continuous time and employs adaptive region bounds to avoid costly bisection, achieving substantial speedups over SMT-based solvers. Empirically, the approach yields ROAs up to five orders of magnitude larger than baselines and accelerates verification by up to four orders of magnitude on continuous-time benchmarks, with results supported by multiple seeds and ablation studies. The work advances scalable, certifiable stabilization for nonlinear systems and provides open-source code for reproducibility.

Abstract

Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimates of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their frameworks. In this work, we propose a novel two-stage training framework to jointly synthesize a controller and a Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training-data sampling strategy and a domain-updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend state-of-the-art neural network verifier $α,\!β$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield region of attractions with volume $5 - 1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be up to $40-10{,}000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.

Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion

TL;DR

This work tackles the challenge of certifying stability for learning-based neural controllers in continuous-time systems by jointly learning a controller and a Lyapunov certificate. It introduces a two-stage training pipeline anchored in Zubov’s ROA characterization: (i) a Zubov-guided ROA estimation stage with dynamic curriculum-driven domain expansion, followed by (ii) a CEGIS-based refinement that eliminates counterexamples inside the ROA for verifiable stabilization. The verification component extends the α,β-CROWN framework to handle Jacobian operators in continuous time and employs adaptive region bounds to avoid costly bisection, achieving substantial speedups over SMT-based solvers. Empirically, the approach yields ROAs up to five orders of magnitude larger than baselines and accelerates verification by up to four orders of magnitude on continuous-time benchmarks, with results supported by multiple seeds and ablation studies. The work advances scalable, certifiable stabilization for nonlinear systems and provides open-source code for reproducibility.

Abstract

Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimates of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their frameworks. In this work, we propose a novel two-stage training framework to jointly synthesize a controller and a Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training-data sampling strategy and a domain-updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend state-of-the-art neural network verifier -CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield region of attractions with volume times larger compared to the baselines, and our verification on continuous systems can be up to times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.

Paper Structure

This paper contains 27 sections, 4 theorems, 29 equations, 7 figures, 6 tables, 3 algorithms.

Key Result

Theorem 2.2

Given a forward invariant set $\mathcal{S}$ containing the equilibrium, if there exists a function $V: \mathcal{S} \to \mathbb{R}$ such that we have then $\mathcal{S}$ is a subset of the region of attraction.

Figures (7)

  • Figure 1: An illustration of the key idea used in the ROA estimation stage. Notice the balanced training sample selection from both in and outside of the ROA estimation $V^{\leq c}$. The training domain is also dynamically expanded to include all the states along a convergent trajectory.
  • Figure 2: Region‑of‑Attraction (ROA) for Double Integrator and Cartpole compared to baselines. The dotted box shows the baseline's training box being used. For the double integrator system, it is clear that the initial training box is limiting the baseline's potential but is not constraining us.
  • Figure 3: Slice of the ROA for the 6D benchmark systems compared to baselines.
  • Figure 4: ROA estimation for 3D quadrotor. As we can see from the flow map, all the trajectories in our ROA estimation converges to the origin.
  • Figure 5: $\frac{\mathrm{d}}{\mathrm{d}x} \tanh(x)$ (left) and $\frac{\mathrm{d}}{\mathrm{d}x}\mathrm{sigmoid}(x)$ (right) have similar monotonicity and convexity/concavity properties.
  • ...and 2 more figures

Theorems & Definitions (6)

  • Definition 2.1: Region of Attraction (ROA) khalil2002nonlinear
  • Theorem 2.2: Lyapunov Theorem lyapunov1992generalkhalil2002nonlinear
  • Theorem 2.3: Zubov Theorem zubov1961methodsliu2025physicskang2023data
  • Theorem 2.4: Liu et al. liu2025physics
  • Theorem 3.1
  • proof