Introduction to Online Control

Elad Hazan, Karan Singh

TL;DR

This work introduces online nonstochastic control, which recasts the control of dynamical systems as online convex optimization with adversarially chosen disturbances and cost functions. Performance is measured by regret against the best policy in hindsight from a benchmark class. The text presents algorithms such as the Gradient Perturbation Controller (GPC), which learns over the class of Disturbance Action Controllers (DAC), with sublinear regret guarantees. It builds from classical control and MDPs to linear dynamical systems, online learning primitives, and system identification, culminating in online Kalman-style filtering and prediction under uncertainty. The resulting methods rely on convex optimization, are computationally tractable, and carry finite-time guarantees, supporting robust, adaptive control in adversarial and unknown-environment settings.

Abstract

This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions and the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision-making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms and are accompanied by finite-time regret and computational complexity guarantees.
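
To make the regret objective concrete, the sketch below computes regret on a toy problem. Everything specific in it is an assumption for illustration and does not come from the text: the scalar system x_{t+1} = a*x_t + b*u_t + w_t, the quadratic cost x_t^2 + u_t^2, the benchmark class of fixed linear gains u_t = -k*x_t searched over a grid, and the helper names `total_cost` and `regret`. The disturbance is a deterministic square wave, chosen only to stress that no noise distribution is assumed.

```python
import numpy as np

def total_cost(gains, a, b, w):
    """Cost of playing u_t = -gains[t] * x_t on x_{t+1} = a*x_t + b*u_t + w_t.

    Starts from x_0 = 0 and uses the assumed quadratic cost x_t^2 + u_t^2.
    """
    x, total = 0.0, 0.0
    for k_t, w_t in zip(gains, w):
        u = -k_t * x
        total += x**2 + u**2
        x = a * x + b * u + w_t
    return total

def regret(played_gains, a, b, w, k_grid):
    """Cost of the played gains minus the cost of the best fixed gain in hindsight."""
    horizon = len(w)
    best = min(total_cost([k] * horizon, a, b, w) for k in k_grid)
    return total_cost(played_gains, a, b, w) - best

# Hypothetical system and disturbance sequence (illustrative values only).
a, b, T = 0.9, 1.0, 200
w = np.sign(np.sin(0.3 * np.arange(T)))   # bounded, not drawn from a distribution
k_grid = np.linspace(0.0, 1.5, 151)       # benchmark class: fixed gains on a grid

# A controller that always plays one gain from the grid stands in for an online algorithm.
played = [k_grid[20]] * T
print("regret of the fixed gain:", regret(played, a, b, w, k_grid))
```

Because the benchmark is evaluated only after the whole disturbance sequence is known, the best gain depends on that sequence. A fixed gain from the class can never have negative regret against it, whereas an online algorithm with time-varying gains is judged by how slowly this difference grows with the horizon.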

Paper Structure

This paper contains 140 sections, 47 theorems, 269 equations, 22 figures, 5 algorithms.

Key Result

Theorem 2.5

For a family of dynamical systems described as polynomials with integer coefficients, determining the stabilizability of any member is NP-hard.

Figures (22)

  • Figure 1: A centrifugal governor.
  • Figure 2: Performance of the PID controller on a mechanical ventilator, from Suo et al. (2021).
  • Figure 3: A schematic of the respiratory circuit, from Suo et al. (2021).
  • Figure 4: Double integrator illustration showing state coordinate $x_1(t)$ (position), state coordinate $x_2(t)$ (velocity), mass $m$, and control input $u(t)$.
  • Figure 5: Pendulum swing-up illustration with marked angle $\theta_t$, gravitational force $mg$, control input torque $u_t$, mass $m$, and rod length $l$.
  • ...and 17 more figures

Theorems & Definitions (102)

  • Definition 1.1: Dynamical system
  • Definition 1.2: A generic control problem
  • Definition 1.3
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Theorem 2.5
  • proof
  • Definition 3.1
  • ...and 92 more