Online Linear Quadratic Tracking with Regret Guarantees

Aren Karapetyan; Diego Bolliger; Anastasios Tsiamis; Efe C. Balta; John Lygeros

TL;DR

This work poses the classical linear quadratic tracking problem in the framework of online optimization, where the time-varying reference state is unknown a priori and is revealed only after the control input has been applied. It proposes a novel online gradient descent-based algorithm that achieves efficient tracking in finite time.

Abstract

Online learning algorithms for dynamical systems provide finite-time guarantees for control in the presence of sequentially revealed cost functions. We pose the classical linear quadratic tracking problem in the framework of online optimization, where the time-varying reference state is unknown a priori and is revealed only after the control input has been applied. We show that this problem is equivalent to the control of linear systems subject to adversarial disturbances, and we propose a novel online gradient descent-based algorithm to achieve efficient tracking in finite time. We provide a dynamic regret upper bound that scales linearly with the path length of the reference trajectory, together with a numerical example that corroborates the theoretical guarantees.
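
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `path_length` and `tracking_cost`, the `policy` callback, the cost timing (the cost is charged on the state reached after the input is applied), and the definition of path length as the sum of distances between consecutive references are all assumptions made for illustration. The page does not spell out the exact forms used in the paper.

```python
import numpy as np

def path_length(refs):
    """Sum of Euclidean distances between consecutive references (assumed definition)."""
    refs = np.asarray(refs, dtype=float)
    return float(np.linalg.norm(np.diff(refs, axis=0), axis=1).sum())

def tracking_cost(A, B, Q, R, x0, policy, refs):
    """Roll out x_{t+1} = A x_t + B u_t and accumulate a quadratic tracking cost.

    The policy only sees references revealed in earlier steps, which mimics the
    online protocol: the input is applied first, the reference is revealed after.
    """
    x = np.asarray(x0, dtype=float)
    revealed = []          # references the controller is allowed to use
    total = 0.0
    for r in refs:
        u = policy(x, revealed)      # chosen before r is known
        x = A @ x + B @ u            # apply the input
        e = x - r                    # r is revealed only now
        total += float(e @ Q @ e + u @ R @ u)
        revealed.append(r)
    return total

# Hypothetical toy data: a 2-D single integrator and a do-nothing policy.
A = np.eye(2)
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
refs = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
zero_policy = lambda x, revealed: np.zeros(2)

length = path_length([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
cost = tracking_cost(A, B, Q, R, np.zeros(2), zero_policy, refs)
```

Under these assumptions, dynamic regret would be the difference between `tracking_cost` for the online policy and the same quantity for a benchmark that knows the whole reference sequence in advance.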

Paper Structure (10 sections, 6 theorems, 43 equations, 2 figures)

Introduction
Problem Statement
The SS-OGD Algorithm
Regret Analysis
Proof of Theorem \ref{thm:optimal_offline_regret}
Steady State Benchmark
Numerical Example
Conclusion
System-Optimizer Dynamics
Proof of Theorem \ref{thm:ss_ogd_convergence}

Key Result

Lemma III.1

Under Assumption \ref{ass:standard}, the steady-state program \eqref{eq:steady_state_program} is strictly convex in $\bar{v}$ for any $K\in \mathbb{R}^{m\times n}$ such that $\rho(A-BK)<1$.
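
The convexity claim can be checked numerically on a toy instance. The sketch below assumes a particular form of the steady-state program, which this page does not state: the input is $u = -Kx + \bar{v}$, the steady state is $\bar{x} = (I - A + BK)^{-1}B\bar{v}$, and the cost is $(\bar{x}-r)^\top Q(\bar{x}-r) + \bar{u}^\top R\bar{u}$. The function name `steady_state_hessian`, the double-integrator matrices, and the gain `K` are hypothetical choices for illustration.

```python
import numpy as np

def steady_state_hessian(A, B, K, Q, R):
    """Hessian in v_bar of the assumed steady-state cost under u = -K x + v_bar."""
    n, m = B.shape
    A_cl = A - B @ K
    # The lemma requires a stabilizing gain: rho(A - BK) < 1.
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K must satisfy rho(A - BK) < 1")
    M = np.linalg.solve(np.eye(n) - A_cl, B)   # steady state:  x_bar = M v_bar
    N = np.eye(m) - K @ M                      # steady input:  u_bar = N v_bar
    # The cost is quadratic in v_bar, so its Hessian is constant.
    return 2.0 * (M.T @ Q @ M + N.T @ R @ N)

# Hypothetical double integrator with a stabilizing gain.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.5, 1.0]])

H = steady_state_hessian(A, B, K, np.eye(2), np.eye(1))
min_eig = float(np.linalg.eigvalsh(H).min())   # positive means strictly convex
```

A strictly positive `min_eig` means the assumed quadratic cost is strictly convex in $\bar{v}$ for this instance, so a gradient step on $\bar{v}$ is well behaved. This is only a sanity check on one example, not a proof.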

Figures (2)

Figure 1: Tracking a 2-D shape with a quadrotor model. In the horizontal position plot (left panel), the CE controller appears to track better. However, the time plot (top right panel) reveals its visible time lag, whereas SS-OGD quickly converges to the reference. This leads to a lower rate of regret for SS-OGD (bottom right panel).
Figure 2: Empirical regret of SS-OGD with a finite reference path length converges to a finite value, as expected from the theoretical bound.

Theorems & Definitions (10)

Definition II.1: Path Length
Lemma III.1
proof
Theorem III.2
Theorem IV.1
Lemma IV.2
Lemma IV.3
proof
Lemma IV.4
proof
