Adaptive Online Non-stochastic Control

Naram Mhaisen; George Iosifidis

TL;DR

This work addresses online non-stochastic control with adversarial disturbances and convex costs by developing AdaFTRL-C, an adaptive FTRL-based controller that leverages cost-proportional regularizers to achieve data-dependent policy regret. The core idea is to bound regret in terms of the observed cost gradients, yielding an adaptive bound of $O\left(\sqrt{\sum_{t=1}^T g_t}\right)$ where $g_t=\|G_t\|^2$, while handling the memory effects inherent in dynamical systems. A key technical contribution is bounding the state deviation between non-stationary and stationary policies to connect online losses to a stationary surrogate, enabling adaptive analysis for NSC. Empirical results on a two-dimensional LTI system demonstrate gains in easy environments and robust sublinear performance in worst-case scenarios, with practical guidance and code available for reproduction.
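To illustrate why a bound of this form is adaptive yet never worse than the usual worst-case rate, suppose (an assumption not stated in this summary) that the gradients are uniformly bounded, $\|G_t\| \le G$ for all $t$. Then:

```latex
\sqrt{\sum_{t=1}^{T} g_t}
  \;=\; \sqrt{\sum_{t=1}^{T} \|G_t\|^2}
  \;\le\; \sqrt{T\,G^2}
  \;=\; G\sqrt{T}.
```

Under the same assumption, if $G_t$ is non-zero in only $k \le T$ rounds, the quantity is at most $G\sqrt{k}$, so the bound tightens as the environment becomes easier.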

Abstract

We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment. Namely, we tailor the Follow The Regularized Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs. The main challenge arises from using the proposed adaptive regularizers in the presence of a state, or equivalently, a memory, which couples the effects of the online decisions and requires new tools for bounding the regret. Via new analysis techniques for NSC and FTRL integration, we obtain novel disturbance action controllers (DAC) with sub-linear, data-adaptive policy regret bounds that shrink when the trajectory of costs has small gradients, while staying sub-linear even in the worst case.
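The idea of a regularizer that grows with the witnessed gradients can be sketched in the simpler memoryless online learning setting. The code below is not AdaFTRL-C: it ignores the system state entirely and only shows a generic FTRL update on linearized losses over a Euclidean ball. The function names, the ball-shaped decision set, and the scaling $\sigma_t = \sqrt{\sum_{s \le t} \|g_s\|^2} / \text{radius}$ are all illustrative assumptions.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto the origin-centered ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def adaptive_ftrl(grad_fn, dim, horizon, radius=1.0):
    """FTRL on linearized losses with a gradient-proportional regularizer.

    Hypothetical helper for illustration. After t rounds the regularizer is
    (sigma_t / 2) * ||x||^2 with sigma_t = sqrt(sum_s ||g_s||^2) / radius,
    so the FTRL minimizer over the ball is Proj(-sum_s g_s / sigma_t).
    """
    x = np.zeros(dim)
    grad_sum = np.zeros(dim)
    sq_norm_sum = 0.0
    iterates = []
    for t in range(horizon):
        iterates.append(x.copy())
        g = np.asarray(grad_fn(t, x), dtype=float)
        grad_sum += g
        sq_norm_sum += float(g @ g)
        if sq_norm_sum > 0.0:  # keep the previous iterate until a gradient arrives
            sigma = np.sqrt(sq_norm_sum) / radius
            x = project_ball(-grad_sum / sigma, radius)
    return np.array(iterates), sq_norm_sum


# Usage: a fixed linear cost <theta, x> with theta = (1, 1) (an assumed toy cost).
theta = np.array([1.0, 1.0])
iterates, sq_norm_sum = adaptive_ftrl(lambda t, x: theta, dim=2, horizon=50)
```

Because the regularization strength tracks the accumulated squared gradient norms, small gradients leave the iterates free to move quickly, while large gradients slow them down automatically. Extending this to control requires handling the coupling through the state, which is the difficulty the abstract describes.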

Paper Structure (15 sections, 8 theorems, 41 equations, 1 figure, 1 algorithm)

Introduction
Background & Motivation
Methodology and Contributions
Notation
Related Work
Preliminaries
AdaFTRL-C
Numerical examples & Conclusion
Auxiliary Lemmas
Appendix
Proof of Lemma \ref{lemma-dac-state}
Proof of Lemma \ref{lemma:class-approx}
The non-adaptive case (OGD with fixed learning rate)
On the choice of the decision set $\mathcal{M}$
On the strongly stable controller $K$

Key Result

lemma 1

Assuming that $\bm{x}_{1} =0$, and parameters $M_t, \bm{w}_t$ are $0$ for $t \leq 0$, the state of the system reached at $t+1$ upon the execution of actions $\{\bm u_i\}_{i=1}^t$, derived from a DAC policy $\pi_t$, is:

Figures (1)

Figure 1: Average regret when $(a)$: $\bm{\theta}_t\!=\!(1,1)$, $\bm{w}_t\!=\!-(0.1,0.1)\ \forall t$; $(b)$: $\bm{\theta}_t\!=\!(10,10)$, $\bm{w}_t\!=\!-(1,1)\ \forall t$; $(c)$: $\bm{\theta}_t\!=\!(10,10)$ or $-(10,10)$ (alternating every $1000$ steps), $\bm{w}_t\!=\!-(1,1)\ \forall t$.

Theorems & Definitions (16)

lemma 1
lemma 2
theorem 1
lemma 3
proof
proof
lemma 4
proof
lemma 5
proof
...and 6 more
