Adapt and Stabilize, Then Learn and Optimize: A New Approach to Adaptive LQR
Peter A. Fisher, Anuradha M. Annaswamy
TL;DR
The paper tackles practical adaptive LQR by proposing MRAC-LQR, a framework that nests a fast, direct MRAC-based stabilization loop inside a slower learning loop that applies epoch-based parameter updates. It removes the need for an initial stabilizing controller and does not rely on persistent excitation to guarantee stability, while still providing a high-probability regret bound comparable to existing methods. Theoretical results show stability with probability one and a regret bound of Õ(T^{2/3}) under exploration. Simulations demonstrate performance competitive with state-of-the-art methods, and clear advantages when an initial stabilizing controller or strong excitation is unavailable. The work also outlines future directions, including extensions to time-varying dynamics and broader adaptive control settings.
Abstract
This paper focuses on adaptive control of the discrete-time linear quadratic regulator (adaptive LQR). Recent literature has made significant contributions in proving non-asymptotic convergence rates, but existing approaches have a few drawbacks that pose barriers for practical implementation. These drawbacks include (i) a requirement of an initial stabilizing controller, (ii) a reliance on exploration for closed-loop stability, and/or (iii) computationally intensive algorithms. This paper proposes a new algorithm that overcomes these drawbacks for a particular class of discrete-time systems. This algorithm leverages direct Model-Reference Adaptive Control (direct MRAC) and combines it with an epoch-based approach in order to address the drawbacks (i)-(iii) with a provable high-probability regret bound comparable to existing literature. Simulations demonstrate that the proposed approach yields regrets that are comparable to those from existing methods when the conditions (i) and (ii) are met, and yields regrets that are significantly smaller when either of these two conditions is not met.
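As background for the problem the abstract describes, the following is a minimal sketch of the standard (non-adaptive) discrete-time LQR for a scalar system x[t+1] = a*x[t] + b*u[t] with stage cost q*x^2 + r*u^2. It is not the MRAC-LQR algorithm from the paper. The function names `scalar_dlqr` and `closed_loop_cost`, the scalar setting, and the numerical values are all assumptions chosen for illustration. The sketch computes the optimal gain by iterating the Riccati recursion, then compares the cost of that gain with the cost of a gain designed from a wrong estimate of `a`. The gap between the two costs is the kind of quantity that a regret bound controls.

```python
def scalar_dlqr(a, b, q, r, iters=1000, tol=1e-12):
    """Return (p, k) for x[t+1] = a*x[t] + b*u[t] with u[t] = -k*x[t].

    p is the fixed point of the scalar discrete-time Riccati recursion,
    and k is the corresponding LQR feedback gain.
    """
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return p, k


def closed_loop_cost(a, b, q, r, k, x0=1.0, horizon=200):
    """Accumulate the noise-free quadratic cost of u = -k*x on the true system."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total


if __name__ == "__main__":
    a_true, b, q, r = 1.2, 1.0, 1.0, 1.0   # hypothetical open-loop unstable plant
    a_hat = 0.9                            # hypothetical wrong estimate of a_true

    p_opt, k_opt = scalar_dlqr(a_true, b, q, r)
    _, k_hat = scalar_dlqr(a_hat, b, q, r)

    print("optimal gain:", k_opt, "cost:", closed_loop_cost(a_true, b, q, r, k_opt))
    print("mismatched gain:", k_hat, "cost:", closed_loop_cost(a_true, b, q, r, k_hat))
```

In this example the mismatched gain still stabilizes the true system, so its cost is finite but larger than the optimum. A worse estimate could leave the loop unstable, which is the situation the abstract's drawbacks (i) and (ii) are concerned with.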
