MR-ARL: Model Reference Adaptive Reinforcement Learning for Robustly Stable On-Policy Data-Driven LQR

Marco Borghesi; Alessandro Bosso; Giuseppe Notarstefano

TL;DR

The paper addresses robust, on-policy data-driven LQR for partially unknown linear systems by combining model reference adaptive control with reinforcement-learning-style value updates (MR-ARL). A Critic identifies an estimate $\hat{A}$ of the plant matrix $A$ and computes the value-function matrix $\hat{P}$ through a differential Riccati equation (DRE) tied to the algebraic Riccati equation (ARE) for $\hat{A}$. An adaptive Actor $u = -R^{-1}B^T\hat{P}x + \hat{K}_a x + d$, where $d$ is a dither signal, tracks a time-varying Reference Model $\dot{x}_m=(\hat{A}-BR^{-1}B^T\hat{P})x_m + Bd$, so that the applied policy converges to the optimal one, $K^\star=-R^{-1}B^T P^\star$. The authors prove semiglobal uniform asymptotic stability of the overall attractor, with exponential convergence to the optimum under persistency of excitation and appropriate tuning ($\gamma$ small, $g$ large). They illustrate robustness to measurement noise, nonlinearities, and slowly varying parameters through numerical DFIM examples. The framework yields formal robustness certificates for real-world deployments and does not require an initial stabilizing policy, which makes it a candidate for safety-critical applications.
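To make the Critic's Riccati step concrete, here is a minimal scalar sketch. It assumes a first-order plant with estimated parameter `a_hat`, known input gain `b`, and hypothetical weights `q` and `r`. It integrates a gain-scaled DRE with forward Euler and compares the result with the closed-form root of the scalar ARE. The function names, the gain `g`, the step size, and the exact DRE form are illustrative assumptions, not taken from the paper.

```python
import math


def are_solution_scalar(a_hat, b, q, r):
    """Stabilizing root of the scalar ARE 2*a_hat*p - (b*p)**2/r + q = 0."""
    return r * (a_hat + math.sqrt(a_hat**2 + b**2 * q / r)) / b**2


def dre_flow_scalar(a_hat, b, q, r, g=10.0, p0=0.0, t_final=5.0, dt=1e-3):
    """Forward-Euler integration of p' = g * (2*a_hat*p - (b*p)**2/r + q).

    The gain g (assumed here) only rescales time: a larger g makes p
    settle faster on the ARE root associated with the current a_hat.
    """
    p = p0
    for _ in range(int(round(t_final / dt))):
        p += dt * g * (2.0 * a_hat * p - (b * p) ** 2 / r + q)
    return p


if __name__ == "__main__":
    a_hat, b, q, r = 1.0, 1.0, 1.0, 1.0   # hypothetical values
    p_hat = dre_flow_scalar(a_hat, b, q, r)
    print("p_hat from DRE flow:", p_hat)
    print("closed-form ARE root:", are_solution_scalar(a_hat, b, q, r))
    print("certainty-equivalent gain -b*p_hat/r:", -b * p_hat / r)
```

In this scalar setting the equilibrium of the flow coincides with the ARE root, which is one way to read the "$g$ large" tuning: the Riccati variable is meant to track the identified model quickly.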

Abstract

This article introduces a novel framework for data-driven linear quadratic regulator (LQR) design. First, we introduce a reinforcement learning paradigm for on-policy data-driven LQR, where exploration and exploitation are performed simultaneously while guaranteeing robust stability of the whole closed-loop system encompassing the plant and the control/learning dynamics. Then, we propose Model Reference Adaptive Reinforcement Learning (MR-ARL), a control architecture integrating tools from reinforcement learning and model reference adaptive control. The approach relies on a time-varying reference model built from the currently identified value function. An adaptive stabilizer then ensures convergence of the applied policy to the optimal one, convergence of the plant to the optimal reference model, and overall robust closed-loop stability. The proposed framework provides theoretical robustness certificates against real-world perturbations such as measurement noise, plant nonlinearities, or slowly varying parameters. The effectiveness of the proposed architecture is validated via realistic numerical simulations.
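The interplay of critic, reference model, and adaptive stabilizer can be sketched on a scalar plant $\dot{x} = a x + b u$ with unknown $a$ and known $b$. The control law and the reference model below follow the structure given in the TL;DR. Everything else is an assumption made for illustration and is not the paper's algorithm: the observer-based gradient identifier, the Lyapunov-style update for `k_a`, the sinusoidal dither, all gains, and the forward-Euler discretization.

```python
import math


def simulate_scalar_mrarl(a=1.0, b=1.0, q=1.0, r=1.0, t_final=60.0, dt=1e-3,
                          gamma=5.0, gamma_a=5.0, g=20.0, lam=5.0):
    """Forward-Euler simulation of a scalar actor/critic/reference-model loop.

    The true parameter `a` is used only to propagate the plant; the
    controller uses the measured state x and the known input gain b.
    """
    x = x_m = x_hat = 1.0              # plant, reference model, identifier state
    a_hat, p_hat, k_a = 0.0, 1.0, 0.0  # no stabilizing initial policy is assumed
    for i in range(int(round(t_final / dt))):
        d = math.sin(i * dt)                      # dither (assumed sinusoidal)
        u = -(b * p_hat / r) * x + k_a * x + d    # actor: LQ term + adaptive term + dither
        eps = x_hat - x                           # identifier prediction error
        e = x - x_m                               # plant vs. reference-model error
        dx = a * x + b * u                                   # plant
        dx_m = (a_hat - b * b * p_hat / r) * x_m + b * d     # reference model
        dx_hat = -lam * eps + a_hat * x + b * u              # assumed identifier observer
        da_hat = -gamma * eps * x                            # assumed gradient update for a_hat
        dp_hat = g * (2.0 * a_hat * p_hat - (b * p_hat) ** 2 / r + q)  # scalar DRE
        dk_a = -gamma_a * b * e * x                          # assumed adaptive-gain update
        x += dt * dx
        x_m += dt * dx_m
        x_hat += dt * dx_hat
        a_hat += dt * da_hat
        p_hat += dt * dp_hat
        k_a += dt * dk_a
    return {"a_hat": a_hat, "p_hat": p_hat, "k_a": k_a,
            "gain": -(b * p_hat / r) + k_a, "e": x - x_m}


if __name__ == "__main__":
    result = simulate_scalar_mrarl()
    k_star = -(1.0 + math.sqrt(2.0))  # optimal gain for a = b = q = r = 1
    print(result, "optimal gain:", k_star)
```

The dither keeps the state persistently excited, which is what lets `a_hat` approach `a`. As that happens, the adaptive term `k_a` has less mismatch to compensate, and the applied gain approaches the LQ-optimal one.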

Paper Structure (27 sections, 10 theorems, 91 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries and Problem Setup
Linear Quadratic Regulation
Robustly Stable On-Policy Data-Driven LQR
Model Reference Adaptive Reinforcement Learning
Critic: Value Function Identifier
Reference Model
Actor: Model Reference Adaptive Controller
Main Result
Stability Result for the Reduced-Order System
Stability Result for MR-ARL
Algorithm Analysis
Error Dynamics
Identifier Dynamics
Reference Model Dynamics
...and 12 more sections

Key Result

Theorem 1

Consider the closed-loop system given by the interconnection of the plant (eq:plant_dynamics) and the controller of Algorithm (alg:MRARL), with $\hat{P}(t)=\mathcal{P}(\hat{A}(t))$ for all $t$ and $\mathcal{P}(\hat{A})$ satisfying (eq:ARE_static). Let the stationary dither $d$ be generated by an exosystem that is uniformly globally asymptotically stable.
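For orientation, the static map $\mathcal{P}(\cdot)$ can be read as the stabilizing solution of the standard continuous-time ARE evaluated at the current estimate. The form below is the textbook one and is only assumed to match eq:ARE_static. The state weight $Q$ and the shorthand $K(\hat{A})$ are symbols introduced here, not shown in this summary.

```latex
\hat{A}^T \mathcal{P}(\hat{A}) + \mathcal{P}(\hat{A})\,\hat{A}
  - \mathcal{P}(\hat{A})\, B R^{-1} B^T\, \mathcal{P}(\hat{A}) + Q = 0,
\qquad
K(\hat{A}) = -R^{-1} B^T \mathcal{P}(\hat{A}).
```

Under this reading, $\hat{A} = A$ gives $\mathcal{P}(A) = P^\star$ and $K(A) = K^\star$, the optimal gain quoted in the TL;DR.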

Figures (5)

Figure 1: Block scheme of the Model Reference Adaptive Reinforcement Learning.
Figure 2: Convergence to true $A$ and to optimal gain $K^\star$.
Figure 3: Tracking error between plant and reference model. Different colors stand for different components of $e$.
Figure 4: Convergence to true $A(t)$ and to optimal gain $K^\star(t)$.
Figure 5: Tracking error between plant and reference model. Different colors stand for different components of $e$.

Theorems & Definitions (32)

Remark 1
Definition 1
Remark 2
Remark 3
Remark 4
Remark 5
Remark 6
Remark 7
Remark 8
Remark 9
...and 22 more
