Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach

Ehsan Badfar; Babak Tavassoli

TL;DR

This work addresses model-free optimal control for discrete-time MJLS where system dynamics are unknown. It develops a mode-aware Q-learning framework that defines a quadratic Q-function with a kernel matrix $H_{\theta}$ and uses least-squares estimation to learn $H_i$ from input-state data, enabling policy evaluation and improvement toward optimal gains. Theoretical results show that the learned gains $K_i^j$ converge to the model-based optimal gains $K_i = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i$ and that the corresponding cost matrices $P_i^j$ converge to the CARE solutions $P_i$, while excitation noise does not bias the estimation. Simulations on a two-mode MJLS demonstrate rapid convergence of the model-free controller to the model-based controller within about 25 iterations, achieving mean-square stability without requiring prior system knowledge.

Abstract

This paper introduces a model-free optimal controller for discrete-time Markovian jump linear systems (MJLSs) based on reinforcement learning (RL). While Q-learning methods have proven effective in determining optimal controller gains for deterministic systems, their application to systems with Markovian switching remains unexplored. To address this gap, we propose a Q-function that involves the Markovian mode. A Q-learning algorithm is then proposed to learn the unknown kernel matrix using raw input-state data from the system. The convergence of a value iteration algorithm is proved first, and on that basis the gains of the proposed Q-learning controller are proved to converge to the model-based optimal controller gains. The excitation noise added to the input, which is required to ensure learning performance, does not introduce any bias. Unlike the conventional optimal controller, the proposed method does not require any knowledge of the system dynamics and eliminates the need to solve the coupled algebraic Riccati equations arising in optimal control of MJLSs. Finally, the efficiency of the proposed method is demonstrated through a simulation study.
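The least-squares step that learns a quadratic kernel from input-state data can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names (`quad_features`, `fit_kernel`, `gain_from_kernel`) are hypothetical, the regression targets are synthetic values of $z^T H z$ with $z = [x; u]$ rather than targets built from a Bellman equation, and the gain extraction $K = H_{uu}^{-1} H_{ux}$ is the usual relation from Q-learning for linear-quadratic problems, assumed here for a single mode.

```python
import numpy as np

def quad_features(z):
    """Features phi(z) with z^T H z = phi(z) @ theta for a symmetric H."""
    rows, cols = np.triu_indices(len(z))
    # Off-diagonal products appear twice in z^T H z, hence the factor 2.
    scale = np.where(rows == cols, 1.0, 2.0)
    return np.outer(z, z)[rows, cols] * scale

def fit_kernel(Z, y):
    """Least-squares estimate of a symmetric kernel H from samples (z_k, y_k)."""
    Phi = np.array([quad_features(z) for z in Z])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    d = Z.shape[1]
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta          # fill the upper triangle
    return H + np.triu(H, 1).T             # mirror it into the lower triangle

def gain_from_kernel(H, n):
    """Assumed partition H = [[H_xx, H_xu], [H_ux, H_uu]]; K = H_uu^{-1} H_ux."""
    return np.linalg.solve(H[n:, n:], H[n:, :n])

# Synthetic check for one mode: n = 2 states, m = 1 input (hypothetical sizes).
rng = np.random.default_rng(0)
n, m = 2, 1
M = rng.standard_normal((n + m, n + m))
H_true = M @ M.T + np.eye(n + m)           # symmetric positive definite kernel
Z = rng.standard_normal((40, n + m))       # random samples stand in for excited data
y = np.einsum("ka,ab,kb->k", Z, H_true, Z)

H_est = fit_kernel(Z, y)
K_est = gain_from_kernel(H_est, n)
print("max kernel error:", np.max(np.abs(H_est - H_true)))
```

The random samples play the role that excitation noise plays in the abstract: without enough variation in $[x; u]$ the regression matrix loses rank and the kernel is not identifiable. In a mode-aware setting one such regression would be run per mode $i$, using only the samples collected while the system is in that mode.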

Paper Structure (15 sections, 7 theorems, 76 equations, 3 figures, 1 algorithm)

Introduction
Background and problem statement
System description
Model-based Optimal Controller
Problem statement
Value iteration for MJLS
Value iteration algorithm
Convergence proof of value iteration algorithm
Q-learning
Q-function for optimal control
Formulation of Q-learning
Implementation of Algorithm
Convergence of algorithm
Simulation analysis
Conclusion

Key Result

Theorem 1

If the system $\mathcal{G}$ in (eq:sys_dynamics) satisfies the assumptions assum2, assum3, and assum4, then the optimal value of the cost function (eq:cost) is achieved by applying the mode-dependent control policy with the control gains for $i\in\Theta$ defined as $K_i = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i$, where the $N$-matrix $P=(P_1,\ldots,P_N)\in\mathbb{R}^{n\times n\times N}$ has positive definite elements that satisfy the coupled algebraic Riccati equations (CARE) for $i\in\Theta$.
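The model-based gains $K_i = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i$ can be illustrated numerically with a value iteration over the coupled Riccati equations. The sketch below rests on several assumptions that are not stated in this summary: the operator is taken as $\mathcal{E}_i(P)=\sum_j p_{ij}P_j$, the cost weights are named `Q` and `R`, the recursion is the standard coupled Riccati form, and the two-mode system matrices and transition probabilities are made-up values, not the paper's simulation example. The function names are hypothetical.

```python
import numpy as np

def coupled_expectation(P, prob):
    """Assumed operator E_i(P) = sum_j prob[i, j] * P[j]."""
    return np.einsum("ij,jab->iab", prob, P)

def riccati_step(P, A, B, Q, R, prob):
    """One sweep of the coupled Riccati recursion; also returns the gains for P."""
    E = coupled_expectation(P, prob)
    P_next = np.empty_like(P)
    K = []
    for i in range(len(A)):
        G = R[i] + B[i].T @ E[i] @ B[i]
        K_i = np.linalg.solve(G, B[i].T @ E[i] @ A[i])   # gain formula from the text
        P_next[i] = Q[i] + A[i].T @ E[i] @ (A[i] - B[i] @ K_i)
        K.append(K_i)
    return P_next, np.array(K)

def value_iteration(A, B, Q, R, prob, tol=1e-12, max_iter=10_000):
    """Iterate from P = 0 until the cost matrices stop changing."""
    P = np.zeros_like(Q)
    for _ in range(max_iter):
        P_next, _ = riccati_step(P, A, B, Q, R, prob)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    _, K = riccati_step(P, A, B, Q, R, prob)
    return P, K

def ms_spectral_radius(A_cl, prob):
    """Spectral radius of the second-moment operator; < 1 means mean-square stable."""
    N, n, _ = A_cl.shape
    s = n * n
    M = np.zeros((N * s, N * s))
    for i in range(N):
        for j in range(N):
            M[j * s:(j + 1) * s, i * s:(i + 1) * s] = prob[i, j] * np.kron(A_cl[i], A_cl[i])
    return np.max(np.abs(np.linalg.eigvals(M)))

# Hypothetical two-mode system (illustrative values only).
A = np.array([[[1.2, 0.3], [0.0, 0.9]], [[0.7, 0.0], [0.4, 1.1]]])
B = np.array([[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.5], [0.0, 1.0]]])
Q = np.array([np.eye(2), np.eye(2)])
R = np.array([np.eye(2), np.eye(2)])
prob = np.array([[0.9, 0.1], [0.3, 0.7]])    # rows sum to one

P, K = value_iteration(A, B, Q, R, prob)
print("K_1 =\n", K[0])
print("K_2 =\n", K[1])
print("closed-loop spectral radius:", ms_spectral_radius(A - B @ K, prob))
```

The input matrices are chosen invertible so that the example is trivially stabilizable; a model-free method would aim to reach the same gains without access to `A`, `B`, or `prob`.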

Figures (3)

Figure 1: Transitions of the Markovian mode.
Figure 2: Model-free controller during iterations.
Figure 3: Convergence of controller gains during learning.

Theorems & Definitions (20)

Definition 1
Definition 2: zhao2008practical
Theorem 1: costa2006discrete
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Remark 1
...and 10 more
