Model-free $H_{\infty}$ control of Itô stochastic system via off-policy reinforcement learning

Jing Guo, Xiushan Jiang, Weihai Zhang

TL;DR

An off-policy reinforcement learning (RL) approach is presented to learn the solution of a GARE from real system data rather than a system model; its convergence is demonstrated, and the robustness of RL to errors in the learning process is investigated.

Abstract

Stochastic $H_{\infty}$ control is studied for a linear stochastic Itô system with an unknown system model. The linear stochastic $H_{\infty}$ control problem is known to be transformable into the problem of solving a so-called generalized algebraic Riccati equation (GARE), a nonlinear equation that is typically difficult to solve analytically. Worse, model-based techniques cannot be used to approximately solve a GARE when an accurate system model is unavailable or prohibitively expensive to construct in practice. To address these issues, an off-policy reinforcement learning (RL) approach is presented to learn the solution of a GARE from real system data rather than a system model; its convergence is demonstrated, and the robustness of RL to errors in the learning process is investigated. In the off-policy RL approach, the system data may be generated by behavior policies rather than the target policies, which makes the approach promising for use in real systems. Finally, the proposed off-policy RL approach is validated on a stochastic linear F-16 aircraft system.

Paper Structure (13 sections, 13 theorems, 65 equations, 2 figures, 2 algorithms)

Key Result

Lemma 1

The system is asymptotically mean square stable if and only if $\sigma\left( \mathscr{L}_{A,A_{1}}\right) \subset \mathcal{C}_{-}$, where $\mathscr{L}_{A,A_{1}}$ is the generalized Lyapunov operator and $\sigma\left( \mathscr{L}_{A,A_{1}}\right)$ is its spectral set \cite{zhang2004stabilizability}.
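
The spectral condition in Lemma 1 can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in this summary: the system is taken to be $dx = Ax\,dt + A_{1}x\,dw$, and the operator is taken in the form commonly used in the literature, $\mathscr{L}_{A,A_{1}}(Z) = AZ + ZA^{T} + A_{1}ZA_{1}^{T}$. The function names `lyapunov_operator_matrix` and `is_mean_square_stable` are hypothetical. The operator is represented as a matrix acting on $\mathrm{vec}(Z)$ via Kronecker products, and stability is decided from the real parts of its eigenvalues.

```python
import numpy as np


def lyapunov_operator_matrix(A, A1):
    """Matrix of Z -> A Z + Z A^T + A1 Z A1^T acting on vec(Z).

    Assumed operator form (not given in the source). With column-stacking
    vec, vec(A Z) = (I kron A) vec(Z), vec(Z A^T) = (A kron I) vec(Z) and
    vec(A1 Z A1^T) = (A1 kron A1) vec(Z).
    """
    n = A.shape[0]
    identity = np.eye(n)
    return np.kron(identity, A) + np.kron(A, identity) + np.kron(A1, A1)


def is_mean_square_stable(A, A1):
    """Return True if every eigenvalue of the operator matrix has negative real part."""
    eigenvalues = np.linalg.eigvals(lyapunov_operator_matrix(A, A1))
    return bool(np.max(eigenvalues.real) < 0.0)


# Scalar example: dx = a x dt + a1 x dw has the single eigenvalue 2a + a1^2.
print(is_mean_square_stable(np.array([[-1.0]]), np.array([[1.0]])))  # 2(-1) + 1 = -1 < 0
print(is_mean_square_stable(np.array([[-1.0]]), np.array([[2.0]])))  # 2(-1) + 4 =  2 > 0
```

Using the adjoint form $A^{T}Z + ZA + A_{1}^{T}ZA_{1}$ instead would transpose the Kronecker matrix and leave the eigenvalues unchanged, so the test does not depend on which of the two conventions the paper adopts.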

Figures (2)

  • Figure 1: State trajectories of the closed-loop F-16 aircraft system.
  • Figure 2: The norms obtained by Algorithm \ref{alg:1} and Algorithm \ref{alg:3}.

Theorems & Definitions (30)

  • Lemma 1
  • Lemma 2
  • Proposition 1
  • Lemma 3
  • Definition 1
  • Remark 1
  • Lemma 4
  • Proof
  • Lemma 5
  • Theorem 1
  • ...and 20 more