Stability-Certified On-Policy Data-Driven LQR via Recursive Learning and Policy Gradient
Lorenzo Sforni, Guido Carnevale, Ivano Notarnicola, Giuseppe Notarstefano
TL;DR
This work addresses infinite-horizon LQR for unknown dynamics by integrating online system identification with policy-gradient optimization in an on-policy setting. The Relearn LQR algorithm simultaneously updates an estimate of the unknown pair $(A_\bullet,B_\bullet)$ via a Recursive Least Squares–like mechanism and refines the feedback gain $K$ through a gradient flow, while injecting a dithering signal to guarantee persistent excitation. A Lyapunov-based, averaging-theory analysis for two-time-scale systems yields stability guarantees for the entire closed-loop learning and control scheme, establishing convergence to the optimal gain $K^\bullet$ and to the true system matrices for sufficiently small step sizes. Numerical experiments on an aircraft model with static and drifting parameters validate both the convergence and robustness properties, highlighting practical viability for data-driven control with stability certificates.
Abstract
In this paper, we investigate a data-driven framework to solve Linear Quadratic Regulator (LQR) problems when the dynamics are unknown, with the additional challenge of providing stability certificates for the overall learning and control scheme. Specifically, in the proposed on-policy learning framework, the control input is applied to the actual (unknown) linear system while being iteratively optimized. We propose a learning and control procedure, termed Relearn LQR, that combines a recursive least squares method with a direct policy search based on the gradient method. The resulting scheme is analyzed by modeling it as a feedback-interconnected nonlinear dynamical system. A Lyapunov-based approach, exploiting averaging and timescale separation theories for nonlinear systems, allows us to provide formal stability guarantees for the whole interconnected scheme. The effectiveness of the proposed strategy is corroborated by numerical simulations, where Relearn LQR is deployed on an aircraft control problem, with both static and drifting parameters.
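
As a rough illustration of the on-policy loop described above, the sketch below couples a recursive least squares estimator with a gradient step on the LQR cost for a scalar discrete-time plant $x_{t+1} = a x_t + b u_t$, with the dithered input $u_t = k x_t + d_t$ applied to the true plant. It is a toy sketch under stated assumptions, not the Relearn LQR algorithm itself: the function names (`lqr_gain`, `cost_gradient`, `run`), the plant values, the two-sinusoid dither, the step size `gamma`, and the use of a discrete gradient step in place of a gradient flow are all chosen here for illustration. The open-loop plant is assumed stable, so the initial gain $k = 0$ is stabilizing.

```python
import math


def lqr_gain(a, b, q, r, iters=500):
    """Optimal scalar gain k* (convention u = k x) via Riccati value iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return -a * b * p / (r + b * b * p)


def cost_gradient(a, b, k, q, r):
    """Gradient of J(k) = p(k) for the model (a, b); None if the model loop is unstable."""
    acl = a + b * k
    if abs(acl) >= 1.0:
        return None
    sigma = 1.0 / (1.0 - acl * acl)   # solves sigma = acl^2 * sigma + 1
    p = (q + r * k * k) * sigma       # solves p = q + r k^2 + acl^2 * p
    return 2.0 * (r * k + b * p * acl) * sigma


def run(a=0.9, b=0.5, q=1.0, r=1.0, steps=5000, gamma=1e-3):
    """Simulate identification and gain tuning on the true plant (a, b)."""
    a_hat, b_hat = 0.0, 0.0              # model estimate (assumed zero initial guess)
    P = [[1e3, 0.0], [0.0, 1e3]]         # RLS covariance (assumed large initial value)
    k, x = 0.0, 1.0                      # initial gain (stabilizing here) and state
    for t in range(steps):
        # Dither: two sinusoids, an assumed choice that keeps the data exciting.
        d = 0.5 * (math.sin(0.7 * t) + math.sin(1.9 * t))
        u = k * x + d
        x_next = a * x + b * u           # true plant, unknown to the learner

        # Recursive least squares update with regressor phi = (x, u).
        Pphi = [P[0][0] * x + P[0][1] * u, P[1][0] * x + P[1][1] * u]
        denom = 1.0 + x * Pphi[0] + u * Pphi[1]
        err = x_next - (a_hat * x + b_hat * u)
        a_hat += Pphi[0] * err / denom
        b_hat += Pphi[1] * err / denom
        P = [[P[i][j] - Pphi[i] * Pphi[j] / denom for j in range(2)]
             for i in range(2)]

        # Slow gradient step on the cost evaluated with the current estimate.
        g = cost_gradient(a_hat, b_hat, k, q, r)
        if g is not None:
            k -= gamma * g
        x = x_next
    return a_hat, b_hat, k


if __name__ == "__main__":
    a_hat, b_hat, k = run()
    print("estimate:", a_hat, b_hat)
    print("learned gain:", k, "optimal gain:", lqr_gain(0.9, 0.5, 1.0, 1.0))
```

The small step size `gamma` makes the gain evolve slowly relative to the estimator, which loosely mirrors the two-time-scale structure mentioned in the abstract. The guard in `cost_gradient` simply skips the update when the estimated closed loop is not stable, a simplification that stands in for the formal stability analysis.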
