Online Linear Quadratic Tracking with Regret Guarantees

Aren Karapetyan, Diego Bolliger, Anastasios Tsiamis, Efe C. Balta, John Lygeros

TL;DR

This work poses the classical linear quadratic tracking problem in the framework of online optimization, where the time-varying reference state is unknown a priori and is revealed only after the control input has been applied. It proposes a novel online gradient descent-based algorithm to achieve efficient tracking in finite time.

Abstract

Online learning algorithms for dynamical systems provide finite-time guarantees for control in the presence of sequentially revealed cost functions. We pose the classical linear quadratic tracking problem in the framework of online optimization, where the time-varying reference state is unknown a priori and is revealed only after the control input has been applied. We show the equivalence of this problem to the control of linear systems subject to adversarial disturbances and propose a novel online gradient descent-based algorithm to achieve efficient tracking in finite time. We provide a dynamic regret upper bound that scales linearly with the path length of the reference trajectory, and a numerical example to corroborate the theoretical guarantees.
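
To make the path length in the regret bound concrete, the sketch below computes one common form of it: the sum of Euclidean distances between consecutive reference states. The paper's Definition II.1 is not reproduced on this page, so the norm, the indexing, the function name `path_length`, and the sample trajectories are all assumptions made for illustration.

```python
import math
from typing import Sequence


def path_length(reference: Sequence[Sequence[float]]) -> float:
    """Sum of Euclidean distances between consecutive reference states.

    Assumed form: sum_t ||r_{t+1} - r_t||_2. The paper's Definition II.1
    may use a different norm or indexing.
    """
    total = 0.0
    for prev, nxt in zip(reference, reference[1:]):
        total += math.dist(prev, nxt)
    return total


# Hypothetical 2-D references: one constant, one moving in unit steps.
constant_ref = [(1.0, 2.0)] * 5
moving_ref = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(path_length(constant_ref))  # 0.0
print(path_length(moving_ref))    # 3.0
```

Under this assumed definition, a reference that stops moving adds nothing further to the path length, which is consistent with the caption of Figure 2: a finite reference path length goes together with regret that converges to a finite value.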

Paper Structure (10 sections, 6 theorems, 43 equations, 2 figures)


Key Result

Lemma III.1

Under Assumption [ass:standard], the program [eq:steady_state_program] is strictly convex in $\bar{v}$ for any $K\in \mathbb{R}^{m\times n}$ for which $\rho(A-BK)<1$.
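
The condition $\rho(A-BK)<1$ in the lemma says that the closed-loop matrix has spectral radius below one. As a minimal sketch of how that hypothesis could be checked, the code below computes the spectral radius of $A-BK$ for a 2x2 system in closed form. The matrices `A`, `B`, `K` and the helper names are hypothetical examples, not taken from the paper, and the code checks only the stability condition, not the convexity claim. For general dimensions one would typically use a numerical eigenvalue routine such as `numpy.linalg.eigvals` instead.

```python
import cmath
from typing import List

Matrix = List[List[float]]


def closed_loop(A: Matrix, B: Matrix, K: Matrix) -> Matrix:
    """Return A - B K, with A (n x n), B (n x m), K (m x n) as nested lists."""
    n, m = len(A), len(K)
    return [
        [A[i][j] - sum(B[i][k] * K[k][j] for k in range(m)) for j in range(n)]
        for i in range(n)
    ]


def spectral_radius_2x2(M: Matrix) -> float:
    """Largest eigenvalue modulus of a 2x2 matrix, from its characteristic polynomial."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


# Hypothetical discrete-time double integrator with a hand-picked gain.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
K = [[0.5, 1.0]]   # m = 1, n = 2
K_zero = [[0.0, 0.0]]

rho_closed = spectral_radius_2x2(closed_loop(A, B, K))
rho_open = spectral_radius_2x2(closed_loop(A, B, K_zero))

print(rho_closed < 1.0)  # True: this K satisfies the lemma's hypothesis
print(rho_open < 1.0)    # False: with K = 0 the spectral radius equals 1
```

In this example the gain moves the eigenvalues of $A-BK$ to $0.5 \pm 0.5i$, inside the unit circle, while the uncontrolled system has a repeated eigenvalue at 1 and so fails the condition.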

Figures (2)

  • Figure 1: Tracking a 2-D shape with a quadrotor model. In the horizontal position plot (left panel), the CE controller appears to track better. However, the time plot (top right panel) reveals its visible time lag; by contrast, SS-OGD quickly converges to the reference. This leads to a lower rate of regret for SS-OGD (bottom right panel).
  • Figure 2: Empirical regret of SS-OGD with a finite reference path length converges to a finite value, as expected from the theoretical bound.

Theorems & Definitions (10)

  • Definition II.1: Path Length
  • Lemma III.1
  • proof
  • Theorem III.2
  • Theorem IV.1
  • Lemma IV.2
  • Lemma IV.3
  • proof
  • Lemma IV.4
  • proof