Data-driven Practical Stabilization of Nonlinear Systems via Chain Policies: Sample Complexity and Incremental Learning
Roy Siegelmann, Enrique Mallada
TL;DR
The paper addresses data-driven stabilization of nonlinear, locally Lipschitz systems by introducing Nonparametric Chain Policies (NCPs), which select finite-duration controls from a stored trajectory library via a normalized nearest-neighbor rule. Central to the guarantees are Recurrent Control Lyapunov Functions (R-CLFs), which certify practical exponential stabilization without requiring the explicit construction of a classical Lyapunov function. The authors establish an explicit sample-complexity bound of $O\left((3/\rho)^d \log(R/c)\right)$ trajectories and show that NCPs admit incremental learning: new verified data can enlarge the certified region or improve the convergence rate without re-optimizing. Numerical experiments on a unicycle and an inverted pendulum illustrate the method’s effectiveness, its incremental learning capability, and the trade-off between convergence rate and data requirements. Overall, the work presents a scalable, data-driven framework for certified stabilization with transparent data-accuracy guarantees and progressively improvable performance.
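To get a feel for how the bound above scales, the following Python sketch evaluates only the growth term $(3/\rho)^d \log(R/c)$. The function name `scaling_term` and all numeric values are hypothetical, and the constants hidden by the $O(\cdot)$ notation are ignored, so the printed numbers indicate relative growth rather than actual trajectory counts.

```python
import math


def scaling_term(rho: float, d: int, R: float, c: float) -> float:
    """Growth term (3/rho)^d * log(R/c) inside the O(.) bound.

    Hidden constants are omitted, so this is not a trajectory count.
    """
    if not (rho > 0 and 0 < c < R):
        raise ValueError("expected rho > 0 and 0 < c < R")
    return (3.0 / rho) ** d * math.log(R / c)


# Hypothetical values, chosen only for illustration.
for d in (2, 3, 4):
    for c in (1e-1, 1e-2):
        value = scaling_term(rho=0.5, d=d, R=1.0, c=c)
        print(f"d={d}, c={c:g}: {value:.1f}")
```

Under these assumed values, the term grows exponentially with the state dimension $d$ but only logarithmically as the target neighborhood radius $c$ shrinks relative to $R$.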
Abstract
We propose a method for data-driven practical stabilization of nonlinear systems with provable guarantees, based on the concept of Nonparametric Chain Policies (NCPs). The approach uses a normalized nearest-neighbor rule to assign, at each state, a finite-duration control signal derived from stored data, after which the process repeats from the resulting state. Unlike recent works that model the system as linear, polynomial, or a ratio of polynomials, we assume only that the system is locally Lipschitz. Our analysis builds on the framework of Recurrent Lyapunov Functions (RLFs), which enable data-driven certification of practical stability using standard norm functions rather than requiring the explicit construction of a classical Lyapunov function. To extend this framework, we introduce Recurrent Control Lyapunov Functions (R-CLFs), which can certify the existence of an NCP that practically stabilizes an arbitrarily small $c$-neighborhood of an equilibrium point. We also provide an explicit sample complexity guarantee of $O\left((3/\rho)^d \log(R/c)\right)$ trajectories, where $R$ is the domain radius, $d$ the state dimension, and $\rho$ a system-dependent constant. The proposed chain policies are nonparametric, so new verified data can be readily incorporated into the policy to either improve the convergence rate or enlarge the certified region. Numerical experiments illustrate and validate these properties.
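The select-apply-repeat loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that "normalized" means the distance to a stored state divided by that state's norm (the paper's exact definition may differ), and the toy single-integrator dynamics, the hand-built library, and the names `select_control`, `run_chain_policy`, and `step` are all hypothetical.

```python
import numpy as np


def select_control(x, library):
    """Pick the control sequence of the stored state nearest to x.

    Assumed normalization: distance to each stored state divided by
    that state's norm (stored states must be nonzero).
    """
    states = np.array([s for s, _ in library])
    scores = np.linalg.norm(states - x, axis=1) / np.linalg.norm(states, axis=1)
    return library[int(np.argmin(scores))][1]


def run_chain_policy(x0, library, step, c, max_rounds=200):
    """Chain finite-duration controls until x enters the c-neighborhood of 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_rounds):
        if np.linalg.norm(x) <= c:
            break
        # Apply the whole stored control sequence open loop, then re-select.
        for u in select_control(x, library):
            x = step(x, u)
    return x


DT = 0.1


def step(x, u):
    """Toy discrete-time single integrator, used only for this illustration."""
    return x + DT * u


# Hypothetical library: states on 13 geometrically spaced rings with 16
# directions each. Every stored sequence holds u = -s for 8 steps, which
# moves the stored state s to 0.2 * s under the toy dynamics.
library = []
for j in range(13):
    radius = 2.0 * 0.7 ** j
    for k in range(16):
        theta = 2.0 * np.pi * k / 16
        s = radius * np.array([np.cos(theta), np.sin(theta)])
        library.append((s, [-s] * 8))

c = 0.05
x_final = run_chain_policy([1.5, -0.8], library, step, c)
print(np.linalg.norm(x_final) <= c)
```

Because the policy is just a lookup over `library`, appending further `(state, control_sequence)` pairs changes its behavior without any retraining, which mirrors the incremental property described in the abstract. The sketch performs no certification of the added data.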
