Adaptive Online Non-stochastic Control

Naram Mhaisen, George Iosifidis

TL;DR

This work addresses online non-stochastic control with adversarial disturbances and convex costs by developing AdaFTRL-C, an adaptive FTRL-based controller that leverages cost-proportional regularizers to achieve data-dependent policy regret. The core idea is to bound regret in terms of the observed cost gradients, yielding an adaptive bound of $O\left(\sqrt{\sum_{t=1}^T g_t}\right)$ where $g_t=\|G_t\|^2$, while handling the memory effects inherent in dynamical systems. A key technical contribution is bounding the state deviation between non-stationary and stationary policies to connect online losses to a stationary surrogate, enabling adaptive analysis for NSC. Empirical results on a two-dimensional LTI system demonstrate gains in easy environments and robust sublinear performance in worst-case scenarios, with practical guidance and code available for reproduction.

Abstract

We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment. Namely, we tailor the Follow The Regularized Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs. The main challenge arises from using the proposed adaptive regularizers in the presence of a state, or equivalently, a memory, which couples the effect of the online decisions and requires new tools for bounding the regret. Via new analysis techniques for NSC and FTRL integration, we obtain novel disturbance action controllers (DAC) with sub-linear data adaptive policy regret bounds that shrink when the trajectory of costs has small gradients, while staying sub-linear even in the worst case.
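
To make the idea of a regularizer that scales with the witnessed costs more concrete, here is a minimal sketch of a generic adaptive FTRL update on linearized losses over a Euclidean ball. It is not the paper's AdaFTRL-C controller: it is memoryless and ignores the system state entirely. The function name `adaptive_ftrl`, the `radius` parameter, and the regularizer weight `sqrt(sum of squared gradient norms) / radius` are illustrative assumptions. Plain Python lists are used to keep the sketch dependency-free.

```python
import math

def adaptive_ftrl(gradient_fn, dim, radius, horizon):
    """Generic adaptive FTRL on linearized losses over the ball ||x|| <= radius.

    Hypothetical helper for illustration only; memoryless, unlike the NSC setting.
    """
    grad_sum = [0.0] * dim      # running sum of observed gradients
    sq_norm_sum = 0.0           # running sum of squared gradient norms
    x = [0.0] * dim             # first decision: the origin
    iterates = []
    for t in range(horizon):
        iterates.append(list(x))
        g = gradient_fn(t, x)   # gradient revealed after committing to x
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        sq_norm_sum += sum(gi * gi for gi in g)
        if sq_norm_sum == 0.0:
            continue            # no gradient signal yet: stay at the origin
        # Regularizer (sigma / 2) * ||x||^2 whose weight tracks the gradients seen.
        sigma = math.sqrt(sq_norm_sum) / radius
        # argmin over the ball of <grad_sum, x> + (sigma / 2) * ||x||^2
        # is the projection of -grad_sum / sigma onto the ball.
        x = [-s / sigma for s in grad_sum]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > radius:
            x = [xi * radius / norm for xi in x]
    return iterates

# Usage: a constant gradient pushes the iterate to the boundary of the ball.
trajectory = adaptive_ftrl(lambda t, x: [1.0, 1.0], dim=2, radius=1.0, horizon=5)
```

Because the regularizer weight grows only as fast as the observed gradients, a trajectory with small gradients keeps the regularization light, which is the mechanism behind regret bounds that scale with the square root of the accumulated squared gradient norms.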

Paper Structure

This paper contains 15 sections, 8 theorems, 41 equations, 1 figure, 1 algorithm.

Key Result

lemma 1

Assuming that $\bm{x}_{1} =0$, and parameters $M_t, \bm{w}_t$ are $0$ for $t \leq 0$, the state of the system reached at $t+1$ upon the execution of actions $\{\bm u_i\}_{i=1}^t$, derived from a DAC policy $\pi_t$, is:
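
As a rough illustration of how a state driven by a DAC policy unrolls into a function of past disturbances and policy parameters, the sketch below checks numerically that rolling the recursion forward matches a closed-form sum. It rests on assumptions not stated in the text above: dynamics of the form $x_{t+1} = A x_t + B u_t + w_t$ with no stabilizing gain, actions $u_t = \sum_{j=1}^{p} M_t^{[j]} w_{t-j}$, and hypothetical matrices `A`, `B` and helper names. It is not the lemma's exact expression.

```python
import random

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

def dac_action(M_t, w, t):
    """u_t = sum_{j=1}^{p} M_t[j-1] w_{t-j}, with w_s treated as 0 for s <= 0."""
    u = [0.0] * len(M_t[0])
    for j, M in enumerate(M_t, start=1):
        if t - j >= 1:
            u = vadd(u, matvec(M, w[t - j]))
    return u

def simulate(A, B, M, w, T):
    """Roll x_{t+1} = A x_t + B u_t + w_t forward from x_1 = 0; returns x_{T+1}."""
    x = [0.0] * len(A)
    for t in range(1, T + 1):
        u = dac_action(M[t], w, t)
        x = vadd(vadd(matvec(A, x), matvec(B, u)), w[t])
    return x

def unrolled(A, B, M, w, T):
    """Closed form x_{T+1} = sum_{i=0}^{T-1} A^i (B u_{T-i} + w_{T-i})."""
    x = [0.0] * len(A)
    for i in range(T):
        s = T - i
        term = vadd(matvec(B, dac_action(M[s], w, s)), w[s])
        for _ in range(i):          # apply A a total of i times
            term = matvec(A, term)
        x = vadd(x, term)
    return x

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

def rand_mat(n):
    return [rand_vec(n) for _ in range(n)]

random.seed(0)
n, p, T = 2, 2, 6                    # state dimension, DAC memory, horizon (all assumed)
A = [[0.9, 0.1], [0.0, 0.8]]         # hypothetical stable system matrix
B = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical input matrix
w = [None] + [rand_vec(n) for _ in range(T)]                        # w[1..T]
M = [None] + [[rand_mat(n) for _ in range(p)] for _ in range(T)]    # M[t][j-1]

x_recursive = simulate(A, B, M, w, T)
x_closed = unrolled(A, B, M, w, T)
max_gap = max(abs(a - b) for a, b in zip(x_recursive, x_closed))
```

The closed form makes the memory effect visible: the state at $t+1$ depends on every past parameter $M_s$, each discounted by a power of `A`, which is why the online decisions are coupled across time.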

Figures (1)

  • Figure 1: Average regret when $(a)$: $\bm{\theta}_t\!=\!(1,1), \bm{w}_t\!=\!-(0.1,0.1)\forall t$; $(b)$: $\bm{\theta}_t\!=\!(10,10), \bm{w}_t\!=\!-(1,1) \forall t$; $(c)$: $\bm{\theta}_t\!=\!(10,10)$ or $-(10,10)$ (Alternating every $1000$ steps), $\bm{w}_t\!=\!-(1,1) \forall t$.

Theorems & Definitions (16)

  • lemma 1
  • lemma 2
  • theorem 1
  • lemma 3
  • proof
  • proof
  • lemma 4
  • proof
  • lemma 5
  • proof
  • ...and 6 more