Reinforcement Learning for Control with Probabilistic Stability Guarantee: A Finite-Sample Approach

Minghao Han; Lixian Zhang; Chenliang Liu; Zhipeng Zhou; Jun Wang; Wei Pan

TL;DR

A probabilistic stability theorem is proposed that ensures mean square stability using only a finite number of sampled trajectories, and a policy gradient theorem for stabilizing policy learning is derived.

Abstract

This paper presents a novel approach to reinforcement learning (RL) for control systems that provides probabilistic stability guarantees using finite data. Leveraging Lyapunov's method, we propose a probabilistic stability theorem that ensures mean square stability using only a finite number of sampled trajectories. The probability of stability increases with the number and length of trajectories, converging to certainty as data size grows. Additionally, we derive a policy gradient theorem for stabilizing policy learning and develop an RL algorithm, L-REINFORCE, that extends the classical REINFORCE algorithm to stabilization problems. The effectiveness of L-REINFORCE is demonstrated through simulations on a Cartpole task, where it outperforms the baseline in ensuring stability. This work bridges a critical gap between RL and control theory, enabling stability analysis and controller design in a model-free framework with finite data.

Paper Structure

This paper contains 10 sections, 5 theorems, 38 equations, 2 figures, 1 table, 1 algorithm.

Introduction
Preliminaries
Problem statement
Sample-based stability analysis
Finite-sample stability analysis
Reinforcement learning with probabilistic stability guarantee
Policy gradient
Lyapunov function
Simulation results
Conclusion and discussion

Key Result

Lemma 1

(han2020actor) Assume that the stationary distribution assumption and the initial state assumption hold for the system. The stochastic system is mean square stable if there exists a function $L:\mathcal{S}\rightarrow \mathbb{R}_{+}$ and positive constants $\alpha_{\ldots}$ … where … is the infinite sampling distribution.

Figures (2)

Figure 1: State trajectories of the controllers trained by L-REINFORCE and REINFORCE. The X-axis denotes the time steps, and the Y-axes denote the position $x$ in meters and the angle $\theta$ in radians, respectively. Zoom-in views are displayed inside the plots.
Figure 2: Visualization of the probabilistic stability bound. The X-axis indicates the trajectory length $T$ and the Y-axis indicates the number of episodes $M$. The Z-axis indicates the probability of stability, with values colored according to the color bar.

Theorems & Definitions (9)

Definition 1
Remark 1
Lemma 1
Remark 2
Lemma 2
Lemma 3
Theorem 1
Theorem 2
Remark 3
