Any-Time Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

Jafar Abbaszadeh Chekan, Cedric Langbort

TL;DR

This work develops anytime regret guarantees for learning-based LQR control with unknown dynamics by embedding SDP-based policy design inside an optimism-in-the-face-of-uncertainty (OFU) framework and injecting carefully scaled input perturbations. It introduces two algorithmic variants: ARSLO, which enforces strong sequential stability, and $\mathrm{ARSLO}^+(\bar{\rho})$, which relaxes this notion using a dwell-time-inspired update rule to improve regret while preserving high-probability state bounds. A warm-up phase eliminates the need for a priori bounds on the DARE solution J_∗, and the analysis provides explicit state-norm bounds and system-theoretic regret guarantees that depend on the DARE solution P_∗ and system dimensions. Collectively, the paper advances convex, computationally efficient OFU-based LQR control with anytime guarantees, without requiring prior knowledge of J_∗, and clarifies the trade-offs between stability constraints and regret performance.
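The role of a dwell-time rule can be illustrated with a small simulation. The sketch below is not the paper's update rule: the two closed-loop matrices `A1` and `A2`, the horizon, and the dwell lengths are hypothetical values chosen only to show that switching between individually stabilizing policies too quickly can make the state grow, while holding each policy for longer lets the state decay.

```python
# Hypothetical closed-loop matrices (not taken from the paper).
# Each has spectral radius 0.5, so each policy is stabilizing on its own.
A1 = [[0.5, 2.0], [0.0, 0.5]]
A2 = [[0.5, 0.0], [2.0, 0.5]]


def matvec(M, x):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]


def final_state_norm(dwell, steps=40, x0=(1.0, 1.0)):
    """Simulate x_{t+1} = A_i x_t, switching policy every `dwell` steps."""
    x = list(x0)
    for t in range(steps):
        M = A1 if (t // dwell) % 2 == 0 else A2
        x = matvec(M, x)
    return (x[0] ** 2 + x[1] ** 2) ** 0.5


fast = final_state_norm(dwell=1)   # switch at every step
slow = final_state_norm(dwell=10)  # hold each policy for 10 steps
print(f"dwell=1:  |x_40| = {fast:.3e}")
print(f"dwell=10: |x_40| = {slow:.3e}")
```

With these illustrative matrices, the run with `dwell=1` ends with a very large state norm, while the run with `dwell=10` ends with a norm close to zero.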

Abstract

We propose a computationally efficient algorithm that achieves anytime regret of order $\mathcal{O}(\sqrt{t})$, with explicit dependence on the system dimensions and on the solution of the Discrete Algebraic Riccati Equation (DARE). Our approach uses an appropriately tuned regularization and a sufficiently accurate initial estimate to construct confidence ellipsoids for control design. A carefully designed input-perturbation mechanism is incorporated to ensure anytime performance. We develop two variants of the algorithm. The first enforces strong sequential stability, requiring each policy to be stabilizing and successive policies to remain close. This sequential condition helps prevent state explosion at policy update times; however, it results in a suboptimal regret scaling with respect to the DARE solution. Motivated by this limitation, we introduce a second class of algorithms that removes this requirement and instead requires only that each generated policy be stabilizing. Closed-loop stability is then preserved through a dwell-time-inspired policy-update rule. This class of algorithms also addresses a key shortcoming of most existing approaches, which lack explicit high-probability bounds on the state trajectory expressed in system-theoretic terms. Our analysis shows that partially relaxing the sequential-stability requirement yields optimal regret. Finally, our method eliminates the need for any a priori bound on the norm of the DARE solution, an assumption required by all existing computationally efficient OFU-based algorithms.
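Since the regret bounds are stated in terms of the DARE solution, a small worked example may help fix ideas. The sketch below assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with stage cost $q x_t^2 + r u_t^2$ and known parameters; it is a textbook Riccati fixed-point iteration, not the SDP-based design used in the paper, and the function names and numerical values are hypothetical.

```python
def solve_scalar_dare(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Iterate p <- q + a^2 p - (a b p)^2 / (r + b^2 p) until it settles."""
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


def lqr_gain(a, b, r, p):
    """Optimal feedback gain k for the control law u = k x."""
    return -(a * b * p) / (r + b * b * p)


# Hypothetical open-loop-unstable plant (|a| > 1), chosen for illustration.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p_star = solve_scalar_dare(a, b, q, r)
k_star = lqr_gain(a, b, r, p_star)
print("DARE solution p*:", p_star)
print("optimal gain k*:", k_star)
print("closed-loop pole a + b k*:", a + b * k_star)
```

In this scalar setting the closed-loop pole `a + b * k_star` lies strictly inside the unit interval, which is the scalar counterpart of the requirement that each generated policy be stabilizing.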

Paper Structure

This paper contains 42 sections, 45 theorems, 417 equations, 1 table, 3 algorithms.

Key Result

Theorem 1

Consider Algorithm Alg:ACOLC, where the regularization parameter $\lambda$ is set as where and Let the additive perturbation noise applied to the feedback control designed via the relaxed primal SDP be where Provided with an initial estimate $\Theta_0$ satisfying the closed-loop system under the policies generated by Algorithm Alg:ACOLC is $(\kappa_*, \gamma_*)$-sequentially strongly stable,

Theorems & Definitions (50)

  • Definition 1
  • Theorem 1: Sequential Strong Stability of Algorithm Alg:ACOLC
  • Proposition 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Remark 1
  • Corollary 2
  • Theorem 5
  • ...and 40 more