Learning to Stabilize High-dimensional Unknown Systems Using Lyapunov-guided Exploration

Songyuan Zhang; Chuchu Fan

TL;DR

The paper tackles stabilizing high-dimensional unknown dynamical systems whose full dynamics are unavailable or impractical to model. It introduces LYGE, a framework that jointly learns local dynamics, a neural Control Lyapunov Function (CLF), and a stabilizing controller, while Lyapunov-guided exploration collects data only within a trusted region that grows toward the goal, with stability ensured upon convergence. Empirically, LYGE stabilizes as well as or better than RL/IL baselines across six environments while cutting sample requirements by roughly 68%–95%, and it scales to F-16 models with a 16D state space and a 4D input space. The approach also extends to other certificates, such as Control Contraction Metrics, which points to its generality for unknown systems whose dynamics are only partially learned. Overall, LYGE offers a practical, scalable route to data-efficient stabilization without a full dynamical model, with formal verification and further certificate-based extensions as promising directions.

Abstract

Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. Lyapunov theory offers a solution for stabilizing control systems; however, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems.

Paper Structure (45 sections, 6 theorems, 45 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Control Lyapunov Functions.
Reinforcement Learning (RL) and Optimal Control.
Imitation Learning (IL).
Problem Setting and Preliminaries
LYGE Algorithm
Experiments
Baselines
Environments
Results and Discussions
Ablation Studies
Extensions
Conclusion
Control Lyapunov Functions
...and 30 more sections

Key Result

proposition 1

Given a set $\mathcal{G}\subset\mathcal{X}$ such that $x_\mathrm{goal}\in\mathcal{G}$. Suppose there exists a CLF $V$ on $\mathcal{G}\subseteq\mathcal{X}$ with a constant $\lambda\in (0, 1)$. If $\mathcal{G}$ is forward invariant (footnote: A set $\mathcal{G}$ is forward invariant for eq:dynamics if $x(0)\in\ma…
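
The proposition refers to a CLF $V$ with a constant $\lambda\in(0,1)$. The snippet below is a minimal sketch of what such a constant could express, assuming a discrete-time decrease condition of the form $V(x_{t+1}) \le \lambda V(x_t)$; this form is an assumption, since the statement above is truncated and does not show the exact definition. The quadratic `V`, the toy map `closed_loop_step`, and the helper `decrease_holds` are hypothetical and chosen only for illustration; they are not the learned networks from the paper.

```python
import numpy as np

def V(x):
    # Hypothetical quadratic Lyapunov candidate: zero only at the goal (origin).
    return float(np.dot(x, x))

def closed_loop_step(x):
    # Hypothetical closed-loop map x_{t+1} = 0.8 * x_t, a toy stand-in for
    # "dynamics + controller"; the real system is unknown and learned.
    return 0.8 * x

def decrease_holds(V, step, states, lam):
    # Check the assumed condition V(step(x)) <= lam * V(x) on sampled states.
    # A small tolerance absorbs floating-point rounding.
    return all(V(step(x)) <= lam * V(x) + 1e-12 for x in states)

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(100, 2))

# For this toy map, V(0.8 x) = 0.64 V(x), so lam = 0.7 is satisfied
# while lam = 0.5 is too strict.
ok = decrease_holds(V, closed_loop_step, states, lam=0.7)
too_strict = decrease_holds(V, closed_loop_step, states, lam=0.5)
print(ok, too_strict)
```

Under this assumed condition, $V$ shrinks at least geometrically along a trajectory, which is why a sampled check like this is only evidence on the tested states and not a proof over the whole set $\mathcal{G}$.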

Figures (7)

Figure 1: Trajectories generated by LYGE in different iterations in the inverted pendulum environment. The contours show the learned CLF. The orange trajectories are generated by the learned controller in the current iteration. The light orange dots are the demonstrations generated in previous iterations, which also indicate the trusted tunnel $\mathcal{H}$. The black dot is the goal.
Figure 2: The distance to the goal w.r.t. time step of LYGE and the baselines: (a) Inv Pendulum; (b) Cart Pole; (c) Cart II Pole; (d) Neural Lander; (e) F-16 GCA; (f) F-16 Tracking. The solid lines show the mean distance while the shaded regions show the standard deviation. Note that the curve for CLF-sparse almost overlaps with the curve for CLF-dense, so it might not be visible in some of the plots.
Figure 3: Ablation studies. (a) The converged reward w.r.t. demonstration rewards. (b) The number of samples used before LYGE converges w.r.t. demonstration rewards. (c) The converged reward w.r.t. $\epsilon$. (d) The converged reward w.r.t. $\lambda$. (e) The converged reward w.r.t. $\eta_\mathrm{ctrl}$.
Figure 4: The tracking error w.r.t. time step in the Dubins car path tracking environment.
Figure 5: The expected return of LYGE and the baselines with respect to the number of samples. The dashed red line shows the converged reward of LYGE. On the right of each subplot, we show the whole curve of the expected return w.r.t. the number of samples. Since LYGE converges much faster than the baselines, we zoom in on the region inside the dashed rectangle and show it on the left. The triangle on the x-axis shows the number of samples needed by LYGE to converge. The reward at zero samples is the reward of the demonstrations.
...and 2 more figures
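
The Figure 1 caption says the previously collected samples indicate the trusted tunnel $\mathcal{H}$. One way to picture this is sketched below, assuming the tunnel is the set of states within a fixed distance of some collected sample. That distance-based form, the helper `in_trusted_tunnel`, the `radius` value, and the sample points are all assumptions made for illustration; the caption does not give the paper's exact definition.

```python
import numpy as np

def in_trusted_tunnel(x, samples, radius):
    # Assumed membership test: x is trusted if it lies within `radius`
    # (Euclidean distance) of at least one collected sample.
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    dists = np.linalg.norm(samples - x, axis=1)
    return bool(np.min(dists) <= radius)

# Hypothetical 2D samples along a path heading toward a goal at the origin.
samples = np.array([[2.0, 0.0], [1.5, 0.0], [1.0, 0.0]])
radius = 0.3

near = in_trusted_tunnel([1.2, 0.1], samples, radius)  # close to [1.0, 0.0]
far = in_trusted_tunnel([0.0, 0.0], samples, radius)   # goal not yet covered

# Collecting samples closer to the goal grows the tunnel toward it.
samples = np.vstack([samples, [[0.5, 0.0], [0.1, 0.0]]])
far_after = in_trusted_tunnel([0.0, 0.0], samples, radius)
print(near, far, far_after)
```

In this toy picture the goal only becomes trusted after exploration adds samples near it, which mirrors the growing set of light orange dots across iterations in Figure 1.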

Theorems & Definitions (15)

definition 1
proposition 1
proof
definition 2
definition 3
lemma 1
proof
lemma 2
proof
theorem 1
...and 5 more
