MR-ARL: Model Reference Adaptive Reinforcement Learning for Robustly Stable On-Policy Data-Driven LQR
Marco Borghesi, Alessandro Bosso, Giuseppe Notarstefano
TL;DR
The paper addresses robust, on-policy data-driven LQR for partially unknown linear systems by combining model reference adaptive control with reinforcement-learning-style value updates (MR-ARL). A Critic identifies $\hat{A}$ and computes $\hat{P}$ by solving the algebraic Riccati equation (ARE) through a differential Riccati equation (DRE). An adaptive Actor $u= -R^{-1}B^T\hat{P}x + \hat{K}_a x + d$ tracks a time-varying Reference Model $\dot{x}_m=(\hat{A}-BR^{-1}B^T\hat{P})x_m + Bd$, so that the applied policy converges to the optimal one, $K^*=-R^{-1}B^T P^*$. The authors prove semiglobal uniform asymptotic stability of the overall attractor, with exponential convergence to the optimum under persistency of excitation and appropriate tuning ($\gamma$ small, $g$ large). They illustrate robustness to measurement noise, nonlinearities, and slowly varying parameters through numerical DFIM examples. The framework provides formal robustness certificates against such real-world perturbations and does not require an initial stabilizing policy, which makes it suitable for safety-critical applications.
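To make the DRE-to-ARE step concrete, the following minimal sketch integrates a forward differential Riccati equation and checks that it settles at the ARE solution $P^*$, from which the gain $K^*=-R^{-1}B^TP^*$ follows. It is an illustration under stated assumptions, not the paper's Critic: the plant matrices, weights, step size, and the helper `riccati_rhs` are hypothetical, and the true $A$ is passed where the Critic would use its running estimate $\hat{A}$.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical 2-state, 1-input plant (illustrative values, not from the paper).
A = np.array([[0.0, 1.0], [-1.0, 0.5]])  # open-loop unstable
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])


def riccati_rhs(P, A_hat):
    """Right-hand side of dP/dt = A_hat^T P + P A_hat - P B R^{-1} B^T P + Q."""
    return A_hat.T @ P + P @ A_hat - P @ B @ np.linalg.solve(R, B.T @ P) + Q


# Forward-Euler integration from P(0) = 0. No stabilizing gain is needed
# to start; here the true A stands in for the Critic's estimate A_hat.
P_hat = np.zeros((2, 2))
dt = 1e-3
for _ in range(50_000):
    P_hat = P_hat + dt * riccati_rhs(P_hat, A)

# Reference solution of the ARE and the corresponding gains.
P_star = solve_continuous_are(A, B, Q, R)
K_star = -np.linalg.solve(R, B.T @ P_star)
K_hat = -np.linalg.solve(R, B.T @ P_hat)

print("max |P_hat - P_star| =", np.abs(P_hat - P_star).max())
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K_hat))
```

In this sketch the equilibrium of the Euler iteration coincides with the ARE solution, so the integration error affects only the transient, not the limit.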
Abstract
This article introduces a novel framework for data-driven linear quadratic regulator (LQR) design. First, we introduce a reinforcement learning paradigm for on-policy data-driven LQR, where exploration and exploitation are performed simultaneously while guaranteeing robust stability of the whole closed-loop system encompassing the plant and the control/learning dynamics. Then, we propose Model Reference Adaptive Reinforcement Learning (MR-ARL), a control architecture integrating tools from reinforcement learning and model reference adaptive control. The approach relies on a variable reference model containing the currently identified value function. An adaptive stabilizer is then used to ensure convergence of the applied policy to the optimal one, convergence of the plant to the optimal reference model, and overall robust closed-loop stability. The proposed framework provides theoretical robustness certificates against real-world perturbations such as measurement noise, plant nonlinearities, or slowly varying parameters. The effectiveness of the proposed architecture is validated via realistic numerical simulations.
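The role of the adaptive stabilizer can be seen from the tracking error between the plant and the reference model. The short derivation below is a sketch that assumes a plant of the form $\dot{x}=Ax+Bu$ with $B$ known (an assumption; the plant equation is not written out in this summary) and uses the Actor and Reference Model expressions from the TL;DR, with $e = x - x_m$ introduced here for illustration.

$$
\begin{aligned}
\dot{x} &= Ax + B\left(-R^{-1}B^T\hat{P}x + \hat{K}_a x + d\right) = \left(A - BR^{-1}B^T\hat{P} + B\hat{K}_a\right)x + Bd, \\
\dot{x}_m &= \left(\hat{A} - BR^{-1}B^T\hat{P}\right)x_m + Bd, \\
\dot{e} &= \dot{x} - \dot{x}_m = \left(\hat{A} - BR^{-1}B^T\hat{P}\right)e + \left(A - \hat{A} + B\hat{K}_a\right)x.
\end{aligned}
$$

Under these assumptions the excitation $d$ cancels from the error dynamics, and the remaining mismatch term vanishes whenever $B\hat{K}_a = \hat{A} - A$, so the adaptive gain $\hat{K}_a$ only has to compensate the part of the identification error that enters through $B$.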
