Adapt and Stabilize, Then Learn and Optimize: A New Approach to Adaptive LQR

Peter A. Fisher, Anuradha M. Annaswamy

TL;DR

The paper tackles practical adaptive LQR by proposing MRAC-LQR, a framework that uses a fast direct MRAC-based stabilization loop inside a slower learning loop that applies epoch-based parameter updates. It removes the need for an initial stabilizing controller and for persistent excitation to guarantee stability, while still providing a high-probability regret bound akin to existing methods. Theoretical results show stability with probability one and a regret bound of $\tilde{O}(T^{2/3})$ under exploration, with simulations demonstrating competitive performance relative to state-of-the-art methods and clear advantages when initial stabilization or strong excitation is unavailable. The work also outlines future directions, including extensions to time-varying dynamics and broader adaptive control settings, underscoring the practical impact for real-world adaptive control systems.

Abstract

This paper focuses on adaptive control of the discrete-time linear quadratic regulator (adaptive LQR). Recent literature has made significant contributions in proving non-asymptotic convergence rates, but existing approaches have a few drawbacks that pose barriers for practical implementation. These drawbacks include (i) a requirement of an initial stabilizing controller, (ii) a reliance on exploration for closed-loop stability, and/or (iii) computationally intensive algorithms. This paper proposes a new algorithm that overcomes these drawbacks for a particular class of discrete-time systems. This algorithm leverages direct Model-Reference Adaptive Control (direct MRAC) and combines it with an epoch-based approach in order to address the drawbacks (i)-(iii) with a provable high-probability regret bound comparable to existing literature. Simulations demonstrate that the proposed approach yields regrets that are comparable to those from existing methods when the conditions (i) and (ii) are met, and yields regrets that are significantly smaller when either of these two conditions is not met.
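
For readers unfamiliar with the term, the regret mentioned above is conventionally defined in the adaptive LQR literature as the gap between the cost actually incurred and the cost of the optimal controller over the same horizon. The form below is that conventional definition, not a quotation from the paper; the symbols $Q$, $R$, and $J^*$ are assumptions introduced here for illustration, and the paper's exact indexing or normalization may differ.

$$
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t \right) - T J^*
$$

Here $Q \succeq 0$ and $R \succ 0$ are the state and input cost weights, and $J^*$ is the optimal infinite-horizon average cost attained by the LQR gain computed from the true system matrices. Under this definition, a sublinear bound means the average per-step cost converges to $J^*$ as $T$ grows.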

Paper Structure

This paper contains 34 sections, 16 theorems, 66 equations, 7 figures, 1 algorithm.

Key Result

Proposition 1

Consider a stable discrete-time LTI system given by $x_{t+1} = Ax_t + Bu_t + w_{t+1}$, $x_t, w_t \in \mathbb{R}^n$, $u_t \in \mathbb{R}^m$, with arbitrary initial conditions and $w_t \sim \mathrm{subG}(\sigma_w^2I_n)$ i.i.d. Suppose that the input is chosen as $u_t = Kx_t + r_t$ such that $A_K := A
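
The setting of this proposition can be simulated in a few lines. The sketch below is illustrative only: it assumes Gaussian noise (one particular sub-Gaussian distribution), and the matrices `A`, `B`, `K`, the function names, and the parameters `sigma_w` and `sigma_r` are hypothetical choices, not values from the paper. Substituting $u_t = Kx_t + r_t$ into the dynamics gives $x_{t+1} = (A + BK)x_t + Br_t + w_{t+1}$, so the code also checks the spectral radius of $A + BK$.

```python
import numpy as np


def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def simulate_closed_loop(A, B, K, T, x0=None, sigma_w=0.1, sigma_r=0.0, seed=0):
    """Simulate x_{t+1} = A x_t + B u_t + w_{t+1} with u_t = K x_t + r_t.

    Assumption: w_t and r_t are i.i.d. zero-mean Gaussian with standard
    deviations sigma_w and sigma_r (Gaussian noise is sub-Gaussian).
    Returns the state trajectory (T+1, n) and the input trajectory (T, m).
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    xs = np.zeros((T + 1, n))
    us = np.zeros((T, m))
    xs[0] = x
    for t in range(T):
        r = sigma_r * rng.standard_normal(m)   # exogenous input signal r_t
        u = K @ x + r                          # state feedback plus r_t
        w = sigma_w * rng.standard_normal(n)   # process noise w_{t+1}
        x = A @ x + B @ u + w
        xs[t + 1] = x
        us[t] = u
    return xs, us


# Hypothetical stable example system (not taken from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.2]])

if __name__ == "__main__":
    print("spectral radius of A + BK:", spectral_radius(A + B @ K))
    xs, us = simulate_closed_loop(A, B, K, T=200, x0=[1.0, 1.0], sigma_r=0.1)
    print("final state norm:", np.linalg.norm(xs[-1]))
```

Setting `sigma_w=0` and `sigma_r=0` isolates the deterministic part of the closed loop, which decays to zero whenever the spectral radius of $A + BK$ is below one.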

Figures (7)

  • Figure 1: Laplacian system with unstable initial controller: $\sigma_{\rm explore} = 0.1$. Solid lines are the median values over 1000 trials, and shaded regions are the 20%-80% confidence windows.
  • Figure 2: Laplacian system with stable initial controller: $\sigma_{\rm explore} = 0.1$. Solid lines are the median values over 1000 trials, and shaded regions are the 20%-80% confidence windows.
  • Figure 3: Laplacian system with stable initial controller: $\sigma_{\rm explore} = 0.01$. Solid lines are the median values over 1000 trials, and shaded regions are the 20%-80% confidence windows.
  • Figure 4: 6DOF quadrotor: $\sigma_{\rm explore} = 0.01$. Solid lines are the median values over 1000 trials, and shaded regions are the 20%-80% confidence windows.
  • Figure 5: Laplacian system with stable initial controller: $\sigma_{\rm explore} = 0.1$. Solid lines are the median values over 1000 trials, and shaded regions are the 20%-80% confidence windows.
  • ...and 2 more figures

Theorems & Definitions (24)

  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 1: Vershynin2019
  • Definition 2: Pisier2016
  • Definition 3: sarker2023accurate
  • Definition 4: sarker2023accurate
  • Proposition 1: Adapted from sarker2023accurate
  • Proposition 2: Adapted from sarker2023accurate
  • Proposition 3: Adapted from guo1996WLSAdaptiveControl
  • ...and 14 more