
MR-ARL: Model Reference Adaptive Reinforcement Learning for Robustly Stable On-Policy Data-Driven LQR

Marco Borghesi, Alessandro Bosso, Giuseppe Notarstefano

TL;DR

The paper addresses robust, on-policy data-driven LQR for partially unknown linear systems by combining model reference adaptive control with reinforcement-learning-style value updates (MR-ARL). A Critic produces an estimate $\hat{A}$ of the unknown state matrix and computes $\hat{P}$ from the associated algebraic Riccati equation (ARE) through a differential Riccati equation (DRE)-based scheme. An adaptive Actor applies $u= -R^{-1}B^T\hat{P}x + \hat{K}_a x + d$ so that the plant tracks the time-varying reference model $\dot{x}_m=(\hat{A}-BR^{-1}B^T\hat{P})x_m + Bd$, which ensures convergence to the optimal policy $K^\star=-R^{-1}B^T P^\star$. The authors prove semiglobal uniform asymptotic stability of the overall attractor, with exponential convergence to the optimum under persistency of excitation and appropriate tuning ($\gamma$ small, $g$ large), and they demonstrate robustness to measurement noise, nonlinearities, and slowly varying parameters through numerical DFIM examples. The framework yields formal robustness certificates for real-world deployments and does not require an initial stabilizing policy, making it suitable for safety-critical applications.
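To make the Critic/Actor formulas above concrete, here is a minimal sketch under stated assumptions: a hypothetical two-state, single-input estimate $\hat{A}$, the standard LQR state weight $Q$ (not named in this summary), and forward-Euler integration of the DRE $\dot{P} = \hat{A}^\top P + P\hat{A} - PBR^{-1}B^\top P + Q$ from $P(0)=0$. This approaches the stabilizing ARE solution when $(\hat{A},B)$ is stabilizable and $(\hat{A},Q^{1/2})$ is detectable. The sketch then forms the Actor's feedback term $-R^{-1}B^\top\hat{P}$ and the reference-model matrix $\hat{A}-BR^{-1}B^\top\hat{P}$. Function names and numerical values are illustrative and do not reproduce the authors' update laws.

```python
import numpy as np


def dre_steady_state(A_hat, B, Q, R, dt=1e-3, t_final=20.0):
    """Forward-Euler integration of the DRE
        dP/dt = A_hat^T P + P A_hat - P B R^{-1} B^T P + Q,  P(0) = 0,
    returning P at t_final (close to the stabilizing ARE solution)."""
    R_inv = np.linalg.inv(R)
    P = np.zeros(A_hat.shape)
    for _ in range(int(round(t_final / dt))):
        dP = A_hat.T @ P + P @ A_hat - P @ B @ R_inv @ B.T @ P + Q
        P = P + dt * dP
        P = 0.5 * (P + P.T)  # keep P symmetric despite round-off
    return P


def actor_feedback(P_hat, B, R):
    """Certainty-equivalence feedback term -R^{-1} B^T P_hat of the Actor."""
    return -np.linalg.solve(R, B.T @ P_hat)


# Hypothetical current estimate A_hat (open-loop eigenvalues 1 and -2),
# input matrix B and LQR weights Q, R; values are for illustration only.
A_hat = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_hat = dre_steady_state(A_hat, B, Q, R)
K_hat = actor_feedback(P_hat, B, R)
A_m = A_hat + B @ K_hat  # reference-model matrix A_hat - B R^{-1} B^T P_hat

residual = (A_hat.T @ P_hat + P_hat @ A_hat
            - P_hat @ B @ np.linalg.solve(R, B.T @ P_hat) + Q)
print("ARE residual norm:", np.linalg.norm(residual))
print("reference-model eigenvalues:", np.linalg.eigvals(A_m))
```

Forward Euler has the same fixed points as the DRE, so the discretization does not bias the steady state; the step size only affects how the transient is resolved.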

Abstract

This article introduces a novel framework for data-driven linear quadratic regulator (LQR) design. First, we introduce a reinforcement learning paradigm for on-policy data-driven LQR, in which exploration and exploitation are performed simultaneously while guaranteeing robust stability of the whole closed-loop system, encompassing the plant and the control/learning dynamics. We then propose Model Reference Adaptive Reinforcement Learning (MR-ARL), a control architecture integrating tools from reinforcement learning and model reference adaptive control. The approach is built on a variable reference model containing the currently identified value function. An adaptive stabilizer then ensures convergence of the applied policy to the optimal one, convergence of the plant to the optimal reference model, and overall robust closed-loop stability. The proposed framework provides theoretical robustness certificates against real-world perturbations such as measurement noise, plant nonlinearities, or slowly varying parameters. The effectiveness of the proposed architecture is validated via realistic numerical simulations.
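In this on-policy setting, exploration comes from the dither $d$, and exponential convergence is tied to persistency of excitation (PE). The sketch below rests on three assumptions: a hypothetical Hurwitz reference-model matrix $A_m$, a single sinusoidal dither $d(t)=\sin t$, and forward-Euler simulation. It evaluates a windowed PE test on the reference-model state. The smallest eigenvalue of $\int_{t}^{t+T} x_m x_m^\top\,d\tau$ stays bounded away from zero with the dither and collapses toward zero without it. This only illustrates the excitation mechanism; the regressor actually used by the paper's identifier is not given here.

```python
import numpy as np


def simulate_reference_model(A_m, B, dither, x0, dt=1e-3, t_final=60.0):
    """Forward-Euler simulation of x_m' = A_m x_m + B d(t); returns the trajectory."""
    n_steps = int(round(t_final / dt))
    x = np.array(x0, dtype=float)
    xs = np.empty((n_steps, x.size))
    for k in range(n_steps):
        d = np.atleast_1d(dither(k * dt))
        x = x + dt * (A_m @ x + B @ d)
        xs[k] = x
    return xs


def windowed_gram_min_eig(xs, dt, t_start, t_len):
    """Smallest eigenvalue of a Riemann-sum approximation of
    int_{t_start}^{t_start + t_len} x_m x_m^T dt (a windowed PE test)."""
    i0 = int(round(t_start / dt))
    i1 = int(round((t_start + t_len) / dt))
    X = xs[i0:i1]
    gram = X.T @ X * dt
    return np.linalg.eigvalsh(gram)[0]  # eigvalsh sorts eigenvalues ascending


# Hypothetical Hurwitz reference-model matrix (eigenvalues -1, -2) and input matrix.
A_m = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
x0 = [1.0, 0.0]
dt = 1e-3
window = 10 * np.pi  # five periods of the dither below

xs_dither = simulate_reference_model(A_m, B, lambda t: np.sin(t), x0, dt)
xs_no_dither = simulate_reference_model(A_m, B, lambda t: 0.0, x0, dt)

lam_dither = windowed_gram_min_eig(xs_dither, dt, 20.0, window)
lam_no_dither = windowed_gram_min_eig(xs_no_dither, dt, 20.0, window)
print("min eigenvalue with dither:   ", lam_dither)
print("min eigenvalue without dither:", lam_no_dither)
```

For a two-dimensional state with $(A_m,B)$ controllable, one nonzero frequency is enough to excite both directions at steady state; higher-dimensional plants need correspondingly richer dithers.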

Paper Structure

This paper contains 27 sections, 10 theorems, 91 equations, 5 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 1

Consider the closed-loop system given by the interconnection of the plant dynamics and the MR-ARL controller of Algorithm 1, with $\hat{P}(t)=\mathcal{P}(\hat{A}(t))$ for all $t$, where $\mathcal{P}(\hat{A})$ satisfies the static algebraic Riccati equation. Let the stationary dither $d$ be generated by an exosystem that is uniformly globally asymptotically stable.
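Theorem 1 assumes that $\hat{P}$ is obtained by evaluating a static map $\mathcal{P}$ at the current estimate $\hat{A}(t)$. A minimal sketch of such a map assumes the standard ARE $\hat{A}^\top P + P\hat{A} - PBR^{-1}B^\top P + Q = 0$, with a state weight $Q$ not named in this summary. It extracts the stabilizing solution from the stable invariant subspace of the Hamiltonian matrix. Evaluating the map along a hypothetical sequence $\hat{A}_k \to A$ shows the gains $-R^{-1}B^\top\mathcal{P}(\hat{A}_k)$ approaching $K^\star$. The Hamiltonian eigenvector solver is an illustrative choice, not necessarily the authors' method; a library routine such as SciPy's `solve_continuous_are` would serve the same purpose.

```python
import numpy as np


def care_hamiltonian(A, B, Q, R):
    """Stabilizing solution P of A^T P + P A - P B R^{-1} B^T P + Q = 0, built
    from the stable invariant subspace of the Hamiltonian matrix (eigenvector
    method; assumes no imaginary-axis eigenvalues and a diagonalizable H)."""
    n = A.shape[0]
    S = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -S], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]
    if stable.shape[1] != n:
        raise ValueError("Hamiltonian has no n-dimensional stable subspace")
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))  # P = X2 X1^{-1}
    return 0.5 * (P + P.T)


def optimal_gain(A, B, Q, R):
    """Return (P, K) with K = -R^{-1} B^T P, the policy form used by MR-ARL."""
    P = care_hamiltonian(A, B, Q, R)
    return P, -np.linalg.solve(R, B.T @ P)


# Hypothetical true plant matrix, input matrix and LQR weights.
A_true = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P_star, K_star = optimal_gain(A_true, B, Q, R)

# Hypothetical estimates A_hat_k whose error halves at every step.
delta = np.array([[0.0, 0.0], [1.0, 0.5]])
gain_errors = []
for k in range(8):
    A_hat_k = A_true + 0.5 ** k * delta
    _, K_hat_k = optimal_gain(A_hat_k, B, Q, R)
    gain_errors.append(np.linalg.norm(K_hat_k - K_star))

print("K_star:", K_star)
print("||K_hat_k - K_star||:", [round(float(e), 4) for e in gain_errors])
```

Since $\mathcal{P}$ is smooth wherever the stabilizing solution exists, the gain error shrinks roughly in proportion to the estimation error in this example.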

Figures (5)

  • Figure 1: Block scheme of the Model Reference Adaptive Reinforcement Learning.
  • Figure 2: Convergence to the true $A$ and to the optimal gain $K^\star$.
  • Figure 3: Tracking error between plant and reference model. Different colors denote different components of $e$.
  • Figure 4: Convergence to the true $A(t)$ and to the optimal gain $K^\star(t)$.
  • Figure 5: Tracking error between plant and reference model. Different colors denote different components of $e$.

Theorems & Definitions (32)

  • Remark 1
  • Definition 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Remark 7
  • Remark 8
  • Remark 9
  • ...and 22 more