Regret Analysis: a control perspective

Travis E. Gibson; Sawal Acharya

TL;DR

The paper examines the interface between regret-based online learning and model reference adaptive control by formalizing how each field assesses performance. Online convex optimization relies on sublinear regret, obtained with vanishing or adaptive learning rates. Adaptive control instead emphasizes boundedness of all parameters and convergence of the tracking error, usually via Lyapunov-based proofs and control-specific learning-rate scalings. Using online regression as the example, the authors show how control-theoretic stability tools yield strong convergence results under the feature-scaled learning rate $\eta_t=\alpha_t/(m+x_t^T x_t)$, and they discuss robustness via $\sigma$-modification. The discussion extends to Online Adaptive Control (OAC), where exploration signals must decay for regret optimality and where certainty-equivalence (CE) estimation maps parameter updates to feedback gains through the discrete algebraic Riccati equation (DARE). Overall, the note clarifies the conceptual and technical gaps between the two paradigms and outlines a path toward a unified online adaptive control framework with provable guarantees.
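The feature-scaled learning rate can be illustrated with a minimal sketch. It assumes a squared-error loss on the prediction error $e_t = \theta_t^T x_t - y_t$, a constant $\alpha_t = \alpha = 1$, $m = 1$, and noise-free data generated from a fixed parameter `theta_star`. None of these specifics come from the paper, and the function name `normalized_regression` is hypothetical.

```python
import numpy as np

def normalized_regression(xs, ys, alpha=1.0, m=1.0):
    """Streaming regression with eta_t = alpha / (m + x_t^T x_t) (illustrative sketch)."""
    theta = np.zeros(xs.shape[1])
    errors = []
    for x, y in zip(xs, ys):
        e = theta @ x - y              # prediction error before the update
        eta = alpha / (m + x @ x)      # feature-scaled learning rate
        theta = theta - eta * e * x    # gradient step on 0.5 * e^2
        errors.append(e)
    return theta, np.array(errors)

# Assumed synthetic data: Gaussian features, noise-free linear targets.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 0.5])
xs = rng.normal(size=(500, 3))
ys = xs @ theta_star

theta_hat, errors = normalized_regression(xs, ys)
param_error = np.linalg.norm(theta_hat - theta_star)
```

Dividing by $m + x_t^T x_t$ keeps the effective step bounded no matter how large the features are, which is why this scaling does not require assuming bounded signals in advance.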

Abstract

Online learning and model reference adaptive control have many interesting intersections. One area where they differ, however, is in how the algorithms are analyzed and in what objective or metric is used to distinguish "good" algorithms from "bad" ones. In adaptive control there are usually two objectives: 1) prove that all time-varying parameters and states of the system are bounded, and 2) prove that the instantaneous error between the adaptively controlled system and a reference system converges to zero over time (or at least to a compact set). In online learning, the performance of an algorithm is often characterized by the regret it incurs. Regret is defined as the cumulative loss (cost) incurred over time by the online algorithm minus the cumulative loss (cost) of the single optimal fixed parameter choice in hindsight. Another significant difference between the two areas of research lies in the assumptions made to obtain these results. Adaptive control makes assumptions about the input-output properties of the control problem and derives solutions for a fixed error model or optimization task. In the online learning literature, results are derived for classes of loss functions (e.g., convex) while assuming a priori that certain signals are bounded. In this work we discuss these differences in detail through the regret-based analysis of gradient descent for convex functions and the control-based analysis of a streaming regression problem. We close with a discussion of the newly defined paradigm of online adaptive control.
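The regret definition above can be made concrete with a small worked example. This sketch assumes scalar squared losses $\ell_t(\theta) = (\theta - y_t)^2$, for which the best fixed parameter in hindsight is the mean of the targets. The data and the online strategy (play the running mean of past targets, starting from 0) are illustrative assumptions, not taken from the paper.

```python
def regret(online_params, targets):
    """Regret for the assumed losses l_t(theta) = (theta - y_t)^2."""
    # Cumulative loss of the parameters actually played online.
    online_loss = sum((th - y) ** 2 for th, y in zip(online_params, targets))
    # The mean minimizes a sum of squared deviations, so it is the
    # single best fixed parameter in hindsight for this loss.
    best_fixed = sum(targets) / len(targets)
    hindsight_loss = sum((best_fixed - y) ** 2 for y in targets)
    return online_loss - hindsight_loss

targets = [1.0, 3.0, 2.0, 2.0]
# Running mean of past targets: 0 before any data, then 1, 2, 2.
online_params = [0.0, 1.0, 2.0, 2.0]

print(regret(online_params, targets))  # online loss 5.0, hindsight loss 2.0 -> 3.0
```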

Paper Structure (5 sections, 4 theorems, 23 equations)

Introduction
The basics of regret analysis for online convex optimization
Online regression using analysis from the control community
Contrasting the two approaches and the new paradigm of Online Adaptive Control
Closing

Key Result

Theorem 1

Let $\Theta$ be a bounded convex set with diameter D, and $\{ \ell_t\}$ be a sequence of convex functions such that $\left\lVert\nabla \ell_t\right\rVert \leq G$. Then with learning rate $\eta_t = \tfrac{D}{G\sqrt{t}}$ the regret of the algorithm in Equation g:2 can be bounded as follows $\mathtt{Re
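The learning-rate schedule in the theorem can be sketched as follows. The algorithm the theorem refers to is not shown in this excerpt, so the sketch assumes it is projected online gradient descent. The feasible set $\Theta = [-1, 1]$ (so $D = 2$), the absolute-value losses $\ell_t(\theta) = |\theta - y_t|$ (so $G = 1$), and the random targets are all illustrative assumptions.

```python
import math
import random

def projected_ogd(targets, D=2.0, G=1.0):
    """Projected OGD on Theta = [-D/2, D/2] for assumed losses |theta - y_t|."""
    theta, plays = 0.0, []
    for t, y in enumerate(targets, start=1):
        plays.append(theta)                      # parameter played at round t
        grad = (theta > y) - (theta < y)         # subgradient of |theta - y|, magnitude <= 1
        eta = D / (G * math.sqrt(t))             # learning rate from the theorem
        theta = theta - eta * grad
        theta = max(-D / 2, min(D / 2, theta))   # Euclidean projection onto Theta
    return plays

random.seed(0)
targets = [random.uniform(-1.0, 1.0) for _ in range(1000)]
plays = projected_ogd(targets)

def total_loss(theta_seq):
    return sum(abs(th - y) for th, y in zip(theta_seq, targets))

# A median minimizes the sum of absolute deviations, so it serves as the
# best fixed parameter in hindsight for this particular loss.
best_fixed = sorted(targets)[len(targets) // 2]
regret = total_loss(plays) - total_loss([best_fixed] * len(targets))
```

Note that the schedule needs $D$ and $G$ up front, which reflects the online-learning habit of assuming bounded sets and gradients before the analysis begins.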

Theorems & Definitions (9)

Definition 1
proof
Theorem 1
proof
Corollary 1
Theorem 2
proof
Theorem 3
proof
