Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach

Ehsan Badfar, Babak Tavassoli

TL;DR

This work addresses model-free optimal control for discrete-time MJLS whose system dynamics are unknown. It develops a mode-aware Q-learning framework that defines a quadratic Q-function with a kernel matrix $H_{\theta}$ and uses least-squares estimation to learn $H_i$ from input-state data, enabling policy evaluation and improvement toward the optimal gains. Theoretical results show that the learned gains $K_i^j$ converge to the model-based optimal gains $K_i = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i$ and that the corresponding cost matrices $P_i^j$ converge to the solutions $P_i$ of the coupled algebraic Riccati equations (CARE), while the excitation noise does not bias the estimation. Simulations on a two-mode MJLS show the model-free controller converging to the model-based controller within about 25 iterations and achieving mean-square stability without prior knowledge of the system.
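
As a minimal sketch of the least-squares step described above, the following Python snippet fits a symmetric kernel $H$ from samples of the quadratic form $z^T H z$ with $z = [x; u]$, then reads a gain off the fitted kernel. The partition of $H$ into state and input blocks, the function names, and the synthetic data are illustrative assumptions, not the paper's algorithm. In the paper's setting one kernel $H_i$ would be fit per mode, and the regression targets would come from measured input-state data rather than from a known kernel.

```python
import numpy as np

def quad_features(z):
    """Features phi(z) such that phi(z) @ theta == z.T @ H @ z for symmetric H."""
    d = len(z)
    feats = []
    for a in range(d):
        for b in range(a, d):
            # Off-diagonal entries of H appear twice in z.T @ H @ z.
            feats.append(z[a] * z[b] if a == b else 2.0 * z[a] * z[b])
    return np.array(feats)

def fit_kernel(Z, y):
    """Least-squares fit of a symmetric kernel from samples (z_k, y_k)."""
    d = Z.shape[1]
    Phi = np.array([quad_features(z) for z in Z])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    # Unpack the upper-triangular parameters back into a symmetric matrix.
    H = np.zeros((d, d))
    idx = 0
    for a in range(d):
        for b in range(a, d):
            H[a, b] = H[b, a] = theta[idx]
            idx += 1
    return H

def gain_from_kernel(H, n):
    """Assumed partition: the first n coordinates of z are the state, the rest the input."""
    return np.linalg.solve(H[n:, n:], H[n:, :n])

# Synthetic check with hypothetical data: recover a known kernel.
rng = np.random.default_rng(0)
n, m = 2, 1                                   # state and input dimensions
M = rng.standard_normal((n + m, n + m))
H_true = M @ M.T + np.eye(n + m)              # symmetric positive definite
Z = rng.standard_normal((50, n + m))          # sampled z = [x; u]
y = np.einsum("ki,ij,kj->k", Z, H_true, Z)    # y_k = z_k.T @ H_true @ z_k
H_hat = fit_kernel(Z, y)
K_hat = gain_from_kernel(H_hat, n)
```

If the input block of $H_i$ equals $R_i+B_i^T\mathcal{E}_i(P)B_i$ and the cross block equals $B_i^T\mathcal{E}_i(P)A_i$, which is the usual structure for a quadratic Q-function, `gain_from_kernel` reproduces the gain expression quoted above.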

Abstract

This paper introduces a model-free optimal controller for discrete-time Markovian jump linear systems (MJLSs) based on reinforcement learning (RL). While Q-learning methods have proven effective at determining optimal controller gains for deterministic systems, their application to systems with Markovian switching remains unexplored. To address this gap, we propose a Q-function that involves the Markovian mode. A Q-learning algorithm is then proposed to learn the unknown kernel matrix using raw input-state information from the system. We prove that the gains of the proposed Q-learning controller converge to the model-based optimal controller gains, after first proving the convergence of a value iteration algorithm. The excitation noise added to the input, which is required to ensure learning performance, does not introduce any bias. Unlike the conventional optimal controller, the proposed method requires no knowledge of the system dynamics and eliminates the need to solve the coupled algebraic Riccati equations that arise in optimal control of MJLSs. Finally, the efficiency of the proposed method is demonstrated through a simulation study.

Paper Structure

This paper contains 15 sections, 7 theorems, 76 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

If the system $\mathcal{G}$ in eq:sys_dynamics satisfies the assumptions assum2, assum3, and assum4, then the optimal value of the cost function eq:cost is achieved by applying a mode-dependent state-feedback control policy with the control gains for $i\in\Theta$ defined as $K_i = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i$, where the $N$-matrix $P=(P_1,\ldots,P_N)\in\mathbb{R}^{n\times n\times N}$ has positive definite elements $P_i$ that satisfy the coupled algebraic Riccati equations (CARE) for $i\in\Theta$.
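
To make the model-based baseline concrete, here is a hedged Python sketch that computes the cost matrices and gains by a fixed-point (value) iteration. It assumes the standard form of the coupled Riccati equations from the MJLS literature, $P_i = Q_i + A_i^T\mathcal{E}_i(P)A_i - A_i^T\mathcal{E}_i(P)B_i K_i$ with $\mathcal{E}_i(P)=\sum_j p_{ij}P_j$. The weight symbols $Q_i$, the transition matrix `prob`, the function name, and the two-mode example matrices are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def coupled_riccati_vi(A, B, Q, R, prob, iters=500):
    """Fixed-point iteration on the (assumed) coupled Riccati equations.

    A, B, Q, R: lists of per-mode matrices.
    prob[i, j]: assumed probability of jumping from mode i to mode j.
    Returns per-mode cost matrices P_i and gains K_i.
    """
    n_modes = len(A)
    n = A[0].shape[0]
    P = [np.zeros((n, n)) for _ in range(n_modes)]
    K = [None] * n_modes
    for _ in range(iters):
        # E_i(P) = sum_j prob[i, j] * P_j
        E = [sum(prob[i, j] * P[j] for j in range(n_modes))
             for i in range(n_modes)]
        P_next = []
        for i in range(n_modes):
            S = R[i] + B[i].T @ E[i] @ B[i]
            # K_i = (R_i + B_i^T E_i(P) B_i)^{-1} B_i^T E_i(P) A_i
            K[i] = np.linalg.solve(S, B[i].T @ E[i] @ A[i])
            P_next.append(Q[i] + A[i].T @ E[i] @ A[i]
                          - A[i].T @ E[i] @ B[i] @ K[i])
        P = P_next
    return P, K

# Hypothetical two-mode example (not the system simulated in the paper).
A = [np.array([[0.8, 0.1], [0.0, 0.7]]),
     np.array([[0.5, -0.2], [0.3, 0.6]])]
B = [np.array([[0.0], [1.0]]),
     np.array([[0.5], [1.0]])]
Q = [np.eye(2), np.eye(2)]
R = [np.eye(1), np.eye(1)]
prob = np.array([[0.7, 0.3],
                 [0.4, 0.6]])

P, K = coupled_riccati_vi(A, B, Q, R, prob)
```

The example modes are chosen to be contractive so that the iteration started from $P_i = 0$ settles quickly; for other systems the iteration count would need to be checked against the residual of the equations.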

Figures (3)

  • Figure 1: Transitions of the Markovian mode.
  • Figure 2: Model-free controller during iterations.
  • Figure 3: Convergence of controller gains during learning.

Theorems & Definitions (20)

  • Definition 1
  • Definition 2: zhao2008practical
  • Theorem 1: costa2006discrete
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Remark 1
  • ...and 10 more