Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning

Jing Guo; Xiushan Jiang; Weihai Zhang

TL;DR

An off-policy reinforcement learning (RL) approach is presented that learns the solution of a generalized algebraic Riccati equation (GARE) from real system data rather than from a system model; its convergence is established, and its robustness to errors in the learning process is investigated.

Abstract

The stochastic $H_{\infty}$ control problem is studied for a linear Itô stochastic system whose model is unknown. It is known that the linear stochastic $H_{\infty}$ control problem can be transformed into that of solving a so-called generalized algebraic Riccati equation (GARE), a nonlinear matrix equation that is generally difficult to solve analytically. Moreover, model-based techniques cannot be used to solve a GARE approximately when an accurate system model is unavailable or prohibitively expensive to obtain in practice. To address these issues, an off-policy reinforcement learning (RL) approach is presented that learns the solution of the GARE from real system data rather than from a system model; its convergence is established, and its robustness to errors in the learning process is investigated. In the off-policy RL approach, the system data can be generated by behavior policies rather than by the target policies, which makes the approach well suited to practical systems. Finally, the proposed off-policy RL approach is validated on a linear stochastic F-16 aircraft system.
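To make "solving a GARE" concrete when the model is known, the sketch below assumes a simplified setting that may differ from the paper's: dynamics $dx = (Ax + B_2u + B_1v)\,dt + A_1x\,dw$ with state-dependent noise only, penalized output $z = [Cx;\,u]$, and the GARE $A^TP + PA + A_1^TPA_1 + C^TC - PB_2B_2^TP + \gamma^{-2}PB_1B_1^TP = 0$. It runs a Newton-type iteration that updates the control gain $K_i = B_2^TP_i$ and the disturbance gain $L_i = \gamma^{-2}B_1^TP_i$ at the same time and then solves a generalized Lyapunov equation for $P_{i+1}$. This is only in the spirit of a simultaneous policy update and is not claimed to be the paper's SPU algorithm; all function names are hypothetical, and starting from $P_0 = 0$ presumes that $A$ is mean-square stable and $\gamma$ is large enough.

```python
import numpy as np


def solve_generalized_lyapunov(Acl, A1, Q):
    """Solve Acl^T P + P Acl + A1^T P A1 + Q = 0 for P (hypothetical helper).

    Uses column-major vectorization: vec(M X N) = (N^T kron M) vec(X).
    """
    n = Acl.shape[0]
    I = np.eye(n)
    # Matrix of the linear map P -> Acl^T P + P Acl + A1^T P A1
    op = np.kron(I, Acl.T) + np.kron(Acl.T, I) + np.kron(A1.T, A1.T)
    p_vec = np.linalg.solve(op, -Q.flatten(order="F"))
    P = p_vec.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def gare_residual(P, A, A1, B1, B2, C, gamma):
    """Left-hand side of the ASSUMED GARE; zero at a solution."""
    return (A.T @ P + P @ A + A1.T @ P @ A1 + C.T @ C
            - P @ B2 @ B2.T @ P + gamma**-2 * P @ B1 @ B1.T @ P)


def newton_gare(A, A1, B1, B2, C, gamma, P0=None, tol=1e-12, max_iter=50):
    """Newton-type iteration updating both players' gains each step (a sketch)."""
    n = A.shape[0]
    P = np.zeros((n, n)) if P0 is None else P0
    Q = C.T @ C
    for _ in range(max_iter):
        K = B2.T @ P                  # control gain, u = -K x
        L = gamma**-2 * B1.T @ P      # worst-case disturbance gain, v = L x
        Acl = A - B2 @ K + B1 @ L     # closed loop under both current policies
        # Policy evaluation: generalized Lyapunov equation for the next P
        P_next = solve_generalized_lyapunov(
            Acl, A1, Q + K.T @ K - gamma**2 * L.T @ L)
        if np.linalg.norm(P_next - P) < tol:
            return P_next
        P = P_next
    return P
```

Each step here needs $(A, A_1, B_1, B_2)$ explicitly; the model-free off-policy RL algorithm outlined in the paper is aimed at learning the GARE solution from measured data instead, so this sketch only illustrates the model-based baseline.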

Paper Structure (13 sections, 13 theorems, 65 equations, 2 figures, 2 algorithms)

Introduction
Problem formulation and preliminaries
Problem formulation
Preliminaries on stochastic $H_{\infty}$ control
Model-based SPU algorithm
Model-based SPU algorithm
Convergence analysis
Model-free off-policy RL algorithm
Robustness analysis
Numerical simulation
Conclusions
Proof of Theorem \ref{th3}
Proof of Theorem \ref{th6}

Key Result

Lemma 1

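[zhang2004stabilizability] The system $dx(t) = Ax(t)\,dt + A_{1}x(t)\,dw(t)$ is asymptotically mean square stable if and only if $\sigma\left( \mathscr{L}_{A,A_{1}}\right) \subset \mathcal{C}_{-}$, where the generalized Lyapunov operator $\mathscr{L}_{A,A_{1}}$ is defined by
$$\mathscr{L}_{A,A_{1}}: X \in \mathcal{S}_{n} \mapsto AX + XA^{T} + A_{1}XA_{1}^{T},$$
and the spectral set of $\mathscr{L}_{A,A_{1}}$ is given by
$$\sigma\left( \mathscr{L}_{A,A_{1}}\right) = \left\{ \lambda \in \mathcal{C} : \mathscr{L}_{A,A_{1}}(X) = \lambda X,\ X \in \mathcal{S}_{n}^{\mathcal{C}},\ X \neq 0 \right\},$$
with $\mathcal{S}_{n}$ ($\mathcal{S}_{n}^{\mathcal{C}}$) the set of $n \times n$ real (complex) symmetric matrices and $\mathcal{C}_{-}$ the open left half of the complex plane.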
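Lemma 1 can be checked numerically through the vectorized form of the operator. The sketch below assumes the standard definition $\mathscr{L}_{A,A_1}(X) = AX + XA^T + A_1XA_1^T$ for the unforced system $dx = Ax\,dt + A_1x\,dw$. With column-major vectorization, $\mathrm{vec}(\mathscr{L}_{A,A_1}(X)) = (I\otimes A + A\otimes I + A_1\otimes A_1)\,\mathrm{vec}(X)$, which is also the generator of the second-moment dynamics $\tfrac{d}{dt}\mathbb{E}[xx^T] = \mathscr{L}_{A,A_1}(\mathbb{E}[xx^T])$. This matrix acts on all $n\times n$ matrices rather than only symmetric ones, but by a standard positivity argument its largest-real-part eigenvalue is attained at a symmetric positive semidefinite matrix, so testing it gives the same answer. Function names are hypothetical.

```python
import numpy as np


def lyapunov_operator_matrix(A, A1):
    """Matrix of X -> A X + X A^T + A1 X A1^T under column-major vec."""
    n = A.shape[0]
    I = np.eye(n)
    # vec(M X N) = (N^T kron M) vec(X): A X -> I kron A, X A^T -> A kron I,
    # A1 X A1^T -> A1 kron A1
    return np.kron(I, A) + np.kron(A, I) + np.kron(A1, A1)


def spectral_abscissa(A, A1):
    """Largest real part over the spectrum of the generalized Lyapunov operator."""
    eigs = np.linalg.eigvals(lyapunov_operator_matrix(A, A1))
    return float(np.max(eigs.real))


def is_mean_square_stable(A, A1):
    """Lemma 1 test: the spectrum lies strictly in the open left half-plane."""
    return spectral_abscissa(A, A1) < 0.0
```

For example, a scalar system with $A = -0.5$ and $A_1 = 1.2$ is stable without noise but fails the test, since $2(-0.5) + 1.2^2 = 0.44 > 0$: strong multiplicative noise alone can destroy mean-square stability.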

Figures (2)

Figure 1: State trajectories of the closed-loop F-16 aircraft system.
Figure 2: The norms obtained by Algorithm \ref{alg:1} and Algorithm \ref{alg:3}.

Theorems & Definitions (30)

Lemma 1
Lemma 2
Proposition 1
Lemma 3
Definition 1
Remark 1
Lemma 4
Proof
Lemma 5
Theorem 1
...and 20 more
