Data-driven Practical Stabilization of Nonlinear Systems via Chain Policies: Sample Complexity and Incremental Learning

Roy Siegelmann, Enrique Mallada

TL;DR

The paper addresses data-driven stabilization of locally Lipschitz nonlinear systems by introducing Nonparametric Chain Policies (NCPs), which select finite-duration controls from a stored trajectory library via a normalized nearest-neighbor rule. Central to the guarantees are Recurrent Control Lyapunov Functions (R-CLFs), which certify practical exponential stabilization without requiring the explicit construction of a classical Lyapunov function. The authors establish an explicit sample-complexity bound of $O\left((3/\rho)^d \log(R/c)\right)$ trajectories and show that NCPs admit incremental learning: new verified data can enlarge the certified region or improve the convergence rate without re-optimizing the existing policy. Numerical experiments on a unicycle and an inverted pendulum illustrate the method’s effectiveness, its incremental learning capability, and the trade-off between convergence rate and data requirements. Overall, the work presents a data-driven framework for certified stabilization with an explicit relationship between the amount of data and the size $c$ of the stabilized neighborhood, and with performance that can be improved progressively.
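The selection rule summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `ChainPolicy`, the helper `run_chain`, the per-sample radii, and the specific normalization (distance divided by a stored radius) are all assumptions made for the example, since the summary does not spell out the exact form of the normalized nearest-neighbor rule.

```python
import numpy as np


class ChainPolicy:
    """Illustrative nonparametric chain policy (all names are hypothetical)."""

    def __init__(self):
        self.centers = []   # stored states x_i
        self.radii = []     # assumed normalization radii r_i > 0
        self.controls = []  # finite-duration control signals, stored as arrays

    def add(self, center, radius, control):
        # Incremental learning in this sketch: new data is appended,
        # and existing entries are never modified.
        if radius <= 0:
            raise ValueError("radius must be positive")
        self.centers.append(np.asarray(center, dtype=float))
        self.radii.append(float(radius))
        self.controls.append(np.asarray(control, dtype=float))

    def select(self, x):
        # Assumed rule: pick argmin_i ||x - x_i|| / r_i.
        if not self.centers:
            raise ValueError("policy contains no data")
        x = np.asarray(x, dtype=float)
        dists = np.linalg.norm(np.stack(self.centers) - x, axis=1)
        i = int(np.argmin(dists / np.asarray(self.radii)))
        return i, self.controls[i]


def run_chain(policy, x0, simulate, steps):
    """Repeat: select a stored control, apply it for its full duration.

    `simulate(x, u)` is a user-supplied function (an assumption of this
    sketch) that returns the state reached after applying signal `u` from `x`.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        _, u = policy.select(x)
        x = np.asarray(simulate(x, u), dtype=float)
    return x
```

Because `add` only appends to the library, enlarging the covered region in this sketch never alters the control chosen at states that remain closest, in the normalized sense, to an older sample.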

Abstract

We propose a method for data-driven practical stabilization of nonlinear systems with provable guarantees, based on the concept of Nonparametric Chain Policies (NCPs). The approach employs a normalized nearest-neighbor rule to assign, at each state, a finite-duration control signal derived from stored data, after which the process repeats. Unlike recent works that model the system as linear, polynomial, or polynomial fraction, we only assume the system to be locally Lipschitz. Our analysis builds on the framework of Recurrent Lyapunov Functions (RLFs), which enable data-driven certification of practical stability using standard norm functions instead of requiring the explicit construction of a classical Lyapunov function. To extend this framework, we introduce the concept of Recurrent Control Lyapunov Functions (R-CLFs), which can certify the existence of an NCP that practically stabilizes an arbitrarily small $c$-neighborhood of an equilibrium point. We also provide an explicit sample complexity guarantee of $O\left((3/\rho)^d \log(R/c)\right)$ trajectories, where $R$ is the domain radius, $d$ the state dimension, and $\rho$ a system-dependent constant. The proposed Chain Policies are nonparametric, thus allowing new verified data to be readily incorporated into the policy to either improve convergence rate or enlarge the certified region. Numerical experiments illustrate and validate these properties.
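To get a feel for how the stated bound scales, the short Python sketch below evaluates the growth term $(3/\rho)^d \log(R/c)$. It is only a scaling indicator: the constant hidden by the $O(\cdot)$ notation is ignored, the natural logarithm is assumed, and the function name `trajectory_bound_term` and the numerical values in the usage note are hypothetical.

```python
import math


def trajectory_bound_term(rho, d, R, c):
    """Evaluate (3/rho)^d * log(R/c), the term inside the O(.) bound.

    Hidden constants are ignored and the natural logarithm is assumed,
    so the result indicates scaling only, not an actual trajectory count.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    if d < 1:
        raise ValueError("d must be at least 1")
    if not (R > c > 0):
        raise ValueError("require R > c > 0")
    return (3.0 / rho) ** d * math.log(R / c)


def growth_per_dimension(rho):
    """Factor by which the term grows when d increases by one."""
    return 3.0 / rho
```

For example, comparing `trajectory_bound_term(0.5, 2, 10.0, 0.1)` with `trajectory_bound_term(0.5, 3, 10.0, 0.1)` shows the term growing by the factor `3 / rho` for each added state dimension, whereas shrinking the target neighborhood $c$ only increases it logarithmically.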

Paper Structure

This paper contains 13 sections, 6 theorems, 57 equations, 3 figures.

Key Result

Lemma 1

Let assumptions assump:forward-complete and as:Lipschitz hold. Consider an equilibrium $x^*$ of eq:control-system and a compact set $S$ satisfying $x^*\in \mathrm{int}(S)$. A function $V:\mathbb{R}^n\to \mathbb{R}_{\geq0}$ satisfying eq:RCLF-bounds is a Recurrent Control Lyapunov Function (R-CLF) ov

Figures (3)

  • Figure 1: Trajectories of Unicycle NCP. Phase plots of $(x,y)$ for eight evenly distributed points. The black icons depict the initial heading of the unicycle. Plot (a) shows trajectories from an NCP trained to minimize $V_1$, which results in sharp turns, while plot (b) shows trajectories from an NCP trained to minimize $V_2$, which results in softer turns and smoother overall behavior. Plots (c) and (d) show the evolution of $V_1$ and $V_2$ over time, respectively. Both converge exponentially to the equilibrium, with rate at least $\alpha = 0.01$. We have $\tau_{\max} = 5$, $\varepsilon = 0.01$, $L = 1$, and $c = \varepsilon(1+L\tau_{\max} e^{L \tau_{\max}}) \simeq 0.613$, represented by the dotted line.
  • Figure 2: Incremental Learning of Unicycle Policy. Extending the state space from the previously learned region in the $y$-direction. Subfigure (a) contains the phase plot before learning, while subfigure (b) contains the phase plot after. The new region is learned without forgetting, such that parts of the trajectory in the old region use previously designed controls.
  • Figure 3: Additional Data Refinement Facilitates Improved NCP Performance. Plot (a) shows the balls used to verify the region $(\theta, \dot{\theta}) \in (-\pi,\pi] \times [-5\pi, 5\pi]$ for the inverted pendulum. Plot (b) is a refinement of plot (a), in which all balls were split once more and re-verified. The minimum verified convergence rate $\alpha$ over trajectories increases from $0.003$ to $0.0145$, and the average verified $\alpha$ increases from $1.815$ to $3.149$. Plot (c) shows the average norm over time of 400 sample trajectories under each scheme. We have $\tau_{\max} = 1.5$, $\varepsilon = 0.01$, $L = 1$, and $c = \varepsilon(1+L\tau_{\max}e^{L\tau_{\max}}) \simeq 0.072$.

Theorems & Definitions (23)

  • Definition 1: Equilibrium Point
  • Definition 2: (Practical) Exponential Stabilizability
  • Definition 3: Containment Times
  • Definition 4: Recurrent Lyapunov Function
  • Definition 5: Reachable Tube
  • Definition 6: Recurrent Control Lyapunov Function (R-CLF)
  • Lemma 1: Characterization of R-CLF
  • Proof of Lemma 1
  • Lemma 2: Containment Lemma
  • Theorem 1: R-CLF Implies (Practical) Exponential Stabilizability
  • ...and 13 more