Lyapunov Neural ODE State-Feedback Control Policies
Joshua Hang Sai Ip, Georgios Makrygiorgos, Ali Mesbah
TL;DR
This work addresses constrained continuous-time optimal control problems (OCPs) by learning a state-feedback policy within a neural ODE (NODE) framework while guaranteeing stability. It introduces Lyapunov-NODE control (L-NODEC), which embeds an exponentially-stabilizing control Lyapunov function (ES-CLF) into a Lyapunov loss that enforces $\left(\frac{\mathrm{d}V}{\mathrm{d}x}\right)^{\top} \mathcal{F}_{\theta}(x,t)+\kappa V(x)\le 0$. When this loss vanishes, the closed-loop state converges exponentially to the target state $z$, with $\|x(t)-z\|_P \le e^{-\kappa t/2}\|x_0-z\|_P$. The authors prove that zero Lyapunov loss implies the ES-CLF condition holds, derive an adversarial robustness bound for initial-state perturbations, and give a learning framework that handles state and input constraints through penalties. Two case studies, a double integrator and thermal-dose delivery in plasma medicine, show faster target attainment and improved robustness compared with NODEC, highlighting the approach's potential for safety-critical, constrained control tasks.
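The Lyapunov condition and the decay bound above can be checked numerically on a toy problem. The sketch below is not the authors' implementation: it assumes a double integrator, a quadratic candidate $V(x)=(x-z)^\top P(x-z)$ so that $\sqrt{V(x)}=\|x-z\|_P$, a hand-picked matrix `P`, a fixed linear gain `K` standing in for the neural policy, and a time-invariant closed loop. The mean-hinge form of the hypothetical `lyapunov_loss` is likewise an assumption about how violations might be penalized.

```python
import numpy as np

# Double integrator: state x = [position, velocity], input u = acceleration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
z = np.zeros(2)                          # target state (assumed: the origin)
P = np.array([[2.0, 1.0], [1.0, 1.0]])   # assumed positive-definite weight in V
K = np.array([[1.0, 2.0]])               # assumed linear gain standing in for the neural policy


def F(x):
    """Closed-loop vector field A x + B u with the stand-in policy u = -K (x - z)."""
    u = -K @ (x - z)
    return A @ x + B @ u


def V(x):
    """Quadratic Lyapunov candidate V(x) = (x - z)^T P (x - z) = ||x - z||_P^2."""
    e = x - z
    return float(e @ P @ e)


def dV_dx(x):
    """Gradient of the quadratic V (P is symmetric)."""
    return 2.0 * P @ (x - z)


def lyapunov_loss(xs, kappa):
    """Hypothetical loss: mean hinge penalty on violations of dV/dx^T F(x) + kappa V(x) <= 0."""
    return float(np.mean([max(0.0, dV_dx(x) @ F(x) + kappa * V(x)) for x in xs]))


def rk4_step(x, dt):
    """One classical Runge-Kutta step of the closed-loop ODE."""
    k1 = F(x)
    k2 = F(x + 0.5 * dt * k1)
    k3 = F(x + 0.5 * dt * k2)
    k4 = F(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def p_norm(e):
    return float(np.sqrt(e @ P @ e))


rng = np.random.default_rng(0)
samples = rng.uniform(-3.0, 3.0, size=(500, 2))
kappa = 0.5
print(lyapunov_loss(samples, kappa))       # 0.0: the condition holds at every sample
print(lyapunov_loss(samples, 5.0) > 0.0)   # True: kappa = 5 demands too fast a decay

# Check ||x(t) - z||_P <= exp(-kappa t / 2) ||x0 - z||_P along one trajectory.
x0 = np.array([2.0, -1.0])
x, dt, ok = x0.copy(), 0.01, True
for k in range(1, 1001):
    x = rk4_step(x, dt)
    bound = np.exp(-kappa * k * dt / 2.0) * p_norm(x0 - z)
    ok = ok and bool(p_norm(x - z) <= bound + 1e-9)
print(ok)                                  # True
```

With these values, $V$ actually decays at a rate of at least 1, so $\kappa=0.5$ is satisfied with margin. By contrast, $\kappa=5$ asks for faster decay than this closed loop provides, and the loss becomes positive.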
Abstract
Deep neural networks are increasingly used as an effective parameterization of control policies in various learning-based control paradigms. For continuous-time optimal control problems (OCPs), which are central to many decision-making tasks, control policy learning can be cast as a neural ordinary differential equation (NODE) problem wherein state and control constraints are naturally accommodated. This paper presents a NODE approach to solving continuous-time OCPs for the case of stabilizing a known constrained nonlinear system around a target state. The approach, termed Lyapunov-NODE control (L-NODEC), uses a novel Lyapunov loss formulation that incorporates an exponentially-stabilizing control Lyapunov function to learn a state-feedback neural control policy, bridging the gap between solving continuous-time OCPs via NODEs and providing stability guarantees. The proposed Lyapunov loss allows L-NODEC to guarantee exponential stability of the controlled system, as well as its adversarial robustness to perturbations of the initial state. The performance of L-NODEC is illustrated on two problems, including a dose delivery problem in plasma medicine. In both cases, L-NODEC effectively stabilizes the controlled system around the target state despite perturbations to the initial state and reduces the inference time necessary to reach the target.
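To make the NODE formulation of the OCP concrete, here is a minimal sketch. It assumes a double integrator, a hypothetical two-layer `tanh` policy, a quadratic running cost, and a squared-hinge penalty on a velocity bound. The input constraint is enforced by saturating the policy output. The running cost is appended to the state as an extra ODE component, so the OCP objective can be read off at the final time. Names such as `policy`, `augmented_rhs`, `rollout_cost`, `u_max`, `v_max`, and `rho` are illustrative and do not come from the paper, whose actual objective, penalty form, and solver are not specified in this abstract. Training would also require gradients of `rollout_cost` with respect to the policy parameters (for example via automatic differentiation or the adjoint method), which this NumPy sketch omits.

```python
import numpy as np

# Hypothetical problem data (not from the paper): double integrator x = [p, v].
u_max, v_max, rho = 1.0, 1.5, 100.0   # input bound, velocity bound, penalty weight
Q, R = np.eye(2), 0.1                 # running-cost weights


def policy(theta, x):
    """Tiny MLP; the outer u_max * tanh enforces |u| <= u_max by construction."""
    W1, b1, W2, b2 = theta
    h = np.tanh(W1 @ x + b1)
    return u_max * np.tanh(W2 @ h + b2)            # shape (1,)


def augmented_rhs(theta, s):
    """NODE right-hand side on s = [p, v, c]: closed-loop dynamics plus running cost."""
    x = s[:2]
    u = policy(theta, x)
    dx = np.array([x[1], u[0]])
    violation = max(0.0, abs(x[1]) - v_max)        # state constraint |v| <= v_max
    dc = x @ Q @ x + R * u[0] ** 2 + rho * violation ** 2
    return np.append(dx, dc)


def rollout_cost(theta, x0, T=5.0, dt=0.01):
    """Integrate the closed-loop NODE with RK4 and return the accumulated cost c(T)."""
    s = np.append(x0, 0.0)
    for _ in range(int(round(T / dt))):
        k1 = augmented_rhs(theta, s)
        k2 = augmented_rhs(theta, s + 0.5 * dt * k1)
        k3 = augmented_rhs(theta, s + 0.5 * dt * k2)
        k4 = augmented_rhs(theta, s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return float(s[2])


rng = np.random.default_rng(1)
hidden = 8
theta_rand = (rng.normal(size=(hidden, 2)), np.zeros(hidden),
              rng.normal(size=(1, hidden)), np.zeros(1))
theta_zero = tuple(np.zeros_like(p) for p in theta_rand)

x0 = np.array([1.0, 0.0])
print(rollout_cost(theta_zero, x0))   # approximately 5.0 (u = 0 keeps x at x0)
print(rollout_cost(theta_rand, x0))   # finite and positive; value depends on the random draw
```

With all-zero weights, the policy outputs $u=0$, the state stays at $x_0=[1,0]^\top$, and the accumulated cost equals $T\,x_0^\top Q x_0 = 5$ up to floating-point rounding. This gives a quick check that the augmented integration is wired correctly.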
