Model-free optimal controller for discrete-time Markovian jump linear systems: A Q-learning approach
Ehsan Badfar, Babak Tavassoli
TL;DR
This work addresses model-free optimal control for discrete-time MJLSs whose dynamics are unknown. It develops a mode-aware Q-learning framework that defines a quadratic Q-function with a kernel matrix $H_{\theta}$ and uses least-squares estimation to learn $H_i$ for each mode $i$ from input-state data, enabling policy evaluation and improvement toward the optimal gains. Theoretical results show that the learned gains $K_i^j$ converge to the model-based optimal gains $K_i = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i$ and that the corresponding cost matrices $P_i^j$ converge to the solutions $P_i$ of the coupled algebraic Riccati equations (CARE), while the excitation noise does not bias the estimation. Simulations on a two-mode MJLS show the model-free controller converging to the model-based controller within about 25 iterations and achieving mean-square stability without prior knowledge of the system.
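The link between the kernel matrix and the gain formula above can be made explicit with a short worked step. This is a sketch under standard LQ assumptions that the summary does not spell out: a stage cost $x^T W_i x + u^T R_i u$ (the state weight $W_i$ is a name introduced here), a control law $u_k = -K_{\theta_k} x_k$, the coupling operator $\mathcal{E}_i(P) = \sum_j p_{ij} P_j$ with transition probabilities $p_{ij}$, and a block partition of $H_i$ whose superscript labels are chosen here for illustration.

$$
\mathcal{Q}_i(x,u) = \begin{bmatrix} x \\ u \end{bmatrix}^T H_i \begin{bmatrix} x \\ u \end{bmatrix},
\qquad
H_i = \begin{bmatrix} H_i^{xx} & H_i^{xu} \\ H_i^{ux} & H_i^{uu} \end{bmatrix}
= \begin{bmatrix} W_i + A_i^T\mathcal{E}_i(P)A_i & A_i^T\mathcal{E}_i(P)B_i \\ B_i^T\mathcal{E}_i(P)A_i & R_i + B_i^T\mathcal{E}_i(P)B_i \end{bmatrix}
$$

$$
\frac{\partial \mathcal{Q}_i}{\partial u} = 0
\;\Rightarrow\;
u = -(H_i^{uu})^{-1} H_i^{ux}\, x,
\qquad
K_i = (H_i^{uu})^{-1} H_i^{ux} = (R_i+B_i^T\mathcal{E}_i(P)B_i)^{-1} B_i^T\mathcal{E}_i(P)A_i
$$

Under these assumptions, an estimate of $H_i$ is enough to recover $K_i$ without knowing $A_i$, $B_i$, or the transition probabilities separately.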
Abstract
This paper introduces a model-free optimal controller for discrete-time Markovian jump linear systems (MJLSs) based on reinforcement learning (RL). While Q-learning methods have proved effective in determining optimal controller gains for deterministic systems, their application to systems with Markovian switching remains unexplored. To address this gap, we propose a Q-function that depends on the Markovian mode, together with a Q-learning algorithm that learns the unknown kernel matrix from raw input-state data. We first prove the convergence of a value iteration algorithm and then prove that the gains of the proposed Q-learning controller converge to the model-based optimal controller gains. The excitation noise added to the input, which is required to ensure learning performance, does not introduce any bias. Unlike the conventional optimal controller, the proposed method requires no knowledge of the system dynamics and eliminates the need to solve the coupled algebraic Riccati equations that arise in optimal control of MJLSs. Finally, the efficiency of the proposed method is demonstrated through a simulation study.
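For orientation, the model-based target that the learned gains are said to converge to can be sketched with a plain value iteration on the coupled Riccati recursion. This is not the paper's Q-learning algorithm: it assumes the model is known, uses scalar states and inputs so that matrix inverses reduce to divisions, and relies on hypothetical mode parameters and transition probabilities chosen only for illustration. The function names and the state weight `W` are likewise assumptions.

```python
def coupled_expectation(P, trans_row):
    """E_i(P) = sum_j p_ij * P_j for a single mode i (scalar P_j)."""
    return sum(p * Pj for p, Pj in zip(trans_row, P))


def value_iteration(A, B, W, R, trans, max_iters=500, tol=1e-12):
    """Iterate the coupled Riccati recursion for a scalar MJLS.

    A, B: per-mode dynamics; W, R: per-mode state and input weights;
    trans[i][j]: probability of jumping from mode i to mode j.
    Returns per-mode cost values P_i and gains K_i (for u = -K_i * x).
    """
    n_modes = len(A)
    P = [0.0] * n_modes  # start from zero cost, as in value iteration
    for _ in range(max_iters):
        P_next = []
        for i in range(n_modes):
            E = coupled_expectation(P, trans[i])
            gain = (B[i] * E * A[i]) / (R[i] + B[i] * E * B[i])
            P_next.append(W[i] + A[i] * E * A[i] - A[i] * E * B[i] * gain)
        converged = max(abs(a - b) for a, b in zip(P_next, P)) < tol
        P = P_next
        if converged:
            break
    # Gains from the converged costs: K_i = (R_i + B_i E_i(P) B_i)^-1 B_i E_i(P) A_i
    K = []
    for i in range(n_modes):
        E = coupled_expectation(P, trans[i])
        K.append((B[i] * E * A[i]) / (R[i] + B[i] * E * B[i]))
    return P, K


# Hypothetical two-mode example (values are illustrative, not from the paper).
A = [1.2, 0.8]
B = [1.0, 0.5]
W = [1.0, 1.0]
R = [1.0, 1.0]
trans = [[0.9, 0.1], [0.3, 0.7]]

P, K = value_iteration(A, B, W, R, trans)
print("P:", P)
print("K:", K)
```

A model-free method of the kind described above would aim to reach the same `K` from input-state data alone, without access to `A`, `B`, or `trans`.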
