Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients

Dongdong Li; Jiuxiang Dong

TL;DR

This work addresses stabilizing optimal control for discrete-time (DT) linear systems with unknown dynamics by introducing two model-free, off-policy algorithms built on a damping-coefficient-based homotopy. The first algorithm extends policy iteration to a data-driven setting; the second uses off-policy Q-learning to estimate both the stabilizing gains and the solutions of the ARE-like equations. Neither algorithm applies the current gain estimates to the plant. Both give explicit damping-coefficient update rules and persistent-excitation conditions on the data that guarantee convergence to the optimal gain and to the solution of the algebraic Riccati equation (ARE), and simulations on an unstable DT system show rapid convergence. The approach reduces reliance on accurate system models and enables efficient, data-driven stabilization and near-optimal control.

Abstract

Policy iteration is a classical reinforcement learning framework, but it requires a known initial stabilizing control, and finding such a control normally depends on a known system model. To relax this requirement and achieve model-free optimal control, this paper designs two reinforcement learning algorithms, based on policy iteration and variable damping coefficients, for unknown discrete-time linear systems. First, a stable artificial system is constructed and then gradually transformed into the original system by varying the damping coefficients, so that an initial stabilizing control is obtained in a finite number of iteration steps. Then, an off-policy iteration algorithm and an off-policy $\mathcal{Q}$-learning algorithm are designed to select appropriate damping coefficients and to make the procedure data-driven. In both algorithms, the current estimates of the optimal control gain are not applied to the system to re-collect data, and both retain the fast convergence of traditional policy iteration. Finally, the proposed algorithms are validated by simulation.
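The damping idea in the abstract can be illustrated on a scalar system. The following is only a minimal sketch under stated assumptions, not the paper's algorithm: it uses the model `(a, b)` directly for the stability check (the paper does this from data), it assumes the artificial system scales only the open-loop term as `c * a` with `b` unchanged, and it replaces the paper's damping-update rule with a hypothetical step-halving rule. The names `stabilizing_gain_by_damping`, `alpha0`, `q`, and `r` are illustrative.

```python
def stabilizing_gain_by_damping(a, b, q=1.0, r=1.0, alpha0=0.2, max_iter=200):
    """Find k with |a - b*k| < 1 for x+ = a*x + b*u, starting from k = 0.

    Assumption: the artificial system is x+ = c*a*x + b*u, where c starts
    small enough that |c*a| < 1 and is increased toward 1.
    """
    # Initial coefficient (plays the role of beta): makes the open loop stable.
    c = 0.9 / abs(a) if abs(a) >= 1 else 1.0
    k = 0.0
    for j in range(max_iter):
        if abs(a - b * k) < 1:
            return k, j  # k already stabilizes the original system
        # One policy-iteration step on the current artificial system (c*a, b).
        acl = c * a - b * k                     # stable by construction
        p = (q + r * k * k) / (1 - acl * acl)   # scalar Lyapunov equation
        k = b * p * c * a / (r + b * b * p)     # improved gain
        # Hypothetical step rule: halve alpha until the new gain still
        # stabilizes the next artificial system ((c + alpha)*a, b).
        alpha = min(alpha0, 1.0 - c)
        while abs((c + alpha) * a - b * k) >= 1:
            alpha /= 2
        c = min(1.0, c + alpha)
    raise RuntimeError("no stabilizing gain found within max_iter")


if __name__ == "__main__":
    gain, steps = stabilizing_gain_by_damping(a=2.0, b=1.0)
    print("gain:", gain, "steps:", steps, "closed loop:", 2.0 - 1.0 * gain)
```

Because each improved gain stabilizes the current artificial system, a sufficiently small increment always keeps the next artificial system stable, which is why the inner halving loop terminates in this sketch.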

Paper Structure (25 sections, 11 theorems, 95 equations, 9 figures, 2 algorithms)

Introduction
Background
Motivation
Related Work
Contribution
Organization
Notation
Formulation and Preliminaries
System and Problem Description
Discrete-time Policy Iteration
Model-based Policy Iteration for Stabilizing Control
Model-free Policy Iteration to Solve Discrete-time ARE
Collect Data and Determine $\tilde{\beta}$ Without A Priori Knowledge
Calculate Stabilizing Gain $\tilde{K}^{j+1}$ and Select Damping Coefficient $\alpha_{j+1}$
The Overall Stabilizing Off-Policy Iteration Algorithm
...and 10 more sections

Key Result

Lemma 1

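[hewer1971iterative, chen2022robust] Let $K^{0}$ be any initial stabilizing control gain, i.e., $\rho(A-BK^{0})<1$. For $i=0,1,2,\dots$, solve for $P^{i}=(P^{i})^{T}$ from the Lyapunov equation

$$(A-BK^{i})^{T}P^{i}(A-BK^{i})-P^{i}+Q+(K^{i})^{T}RK^{i}=0,$$

where $Q$ and $R$ are the state and input weighting matrices of the quadratic cost, and update the gain by

$$K^{i+1}=(R+B^{T}P^{i}B)^{-1}B^{T}P^{i}A.$$

Then, 1) $\rho(A-BK^{i+1})<1$; 2) $P^*\leq P^{i+1}\leq P^{i}$; 3) $\lim_{i\rightarrow\infty}P^{i}=P^*$, $\lim_{i\rightarrow\infty}K^{i}=K^*$.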
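The iteration in Lemma 1 can be checked numerically in the scalar case. The sketch below assumes the standard quadratic cost with weights `q` and `r` (symbols not defined in this summary), for which the Lyapunov equation reduces to `p = (q + r*k**2) / (1 - (a - b*k)**2)` and the gain update to `k = b*p*a / (r + b**2 * p)`. The function name and the example values are illustrative.

```python
def policy_iteration_scalar(a, b, q, r, k0, iters=20):
    """Model-based policy iteration for the scalar system x+ = a*x + b*u.

    Returns the list of (p_i, k_{i+1}) pairs produced by alternating
    policy evaluation and policy improvement.
    """
    if abs(a - b * k0) >= 1:
        raise ValueError("k0 must be stabilizing: |a - b*k0| < 1")
    k, history = k0, []
    for _ in range(iters):
        acl = a - b * k
        p = (q + r * k * k) / (1 - acl * acl)   # policy evaluation
        k = b * p * a / (r + b * b * p)         # policy improvement
        history.append((p, k))
    return history


if __name__ == "__main__":
    # Unstable open loop (a = 1.2) with a stabilizing initial gain k0 = 1.
    for i, (p, k) in enumerate(policy_iteration_scalar(1.2, 1.0, 1.0, 1.0, 1.0, iters=5)):
        print(i, p, k, abs(1.2 - k))
```

Printing the sequence shows the three properties of the lemma in this special case: every gain keeps the closed loop inside the unit interval, the values `p` do not increase, and they approach the positive root of the scalar Riccati equation.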

Figures (9)

Figure 1: The spectral radius of the open-loop artificial system $(\tilde{\beta}+\sum_{m=0}^{j}\alpha_{m})A$ is denoted as $\rho^{o}_{j}=\rho[(\tilde{\beta}+\sum_{m=0}^{j}\alpha_{m})A]$; the spectral radius of the closed-loop system $A-B\tilde{K}^{j}$ is denoted as $\rho_{j}=\rho(A-B\tilde{K}^{j})$.
Figure 2: (a) The closed-loop spectral radius $\rho(A-B\tilde{K}^{j})$ obtained by using Algorithm 1; (b) iterations of the matrix $\tilde{K}^{j}$.
Figure 3: (a) The system states and input obtained by using Algorithm 1; (b) optimal errors of ${P}^{i}$ and ${K}^{i}$ obtained by using Algorithm 1.
Figure 4: (a) The damping coefficient $\alpha_{j}$ obtained by using Algorithm 1; (b) the damping coefficient $\alpha_{j}$ obtained by using Algorithm 2.
Figure 5: (a) The closed-loop spectral radius $\rho(A-B\tilde{K}^{j})$ obtained by using Algorithm 2; (b) iterations of the matrix $\tilde{K}^{j}$ obtained by using Algorithm 2.
...and 4 more figures

Theorems & Definitions (22)

Lemma 1
Remark 1
Theorem 1
Remark 2
Remark 3
Lemma 2
Remark 4
Theorem 2
Remark 5
Lemma 3
...and 12 more
