Revisiting Regret Benchmarks in Online Non-Stochastic Control
Vijeth Hebbar, Cédric Langbort
TL;DR
This work addresses online non-stochastic control with adversarial convex costs and disturbances. It introduces a new, more meaningful regret benchmark, the performance of the best fixed input, together with an online control algorithm that achieves sublinear regret with respect to this benchmark. The method reduces the disturbed problem to a nominal disturbance-free formulation, which yields sublinear regret bounds in both the disturbance-free and the disturbed settings. The paper also clarifies the connection to the original framework of Agarwal et al. by comparing the benchmarks and policy classes involved. Simulations show that the fixed-input benchmark can be more informative than the disturbance-action controller (DAC) benchmark in certain scenarios, and that the proposed algorithm outperforms prior approaches when measured against the fixed-input benchmark. Overall, the paper provides a principled regret framework for online control with general convex costs, with guarantees and practical insights for benchmark selection and algorithm design.
Abstract
In the online non-stochastic control problem, an agent sequentially selects control inputs for a linear dynamical system while facing unknown, adversarially selected convex costs and disturbances. A common metric for evaluating control policies in this setting is policy regret, defined relative to the best-in-hindsight linear feedback controller. For general convex costs, however, this benchmark may be less meaningful, since linear controllers can be highly suboptimal. To address this, we introduce an alternative, more suitable benchmark: the performance of the best fixed input. We show that this benchmark can be viewed as a natural extension of the standard benchmark used in online convex optimization, and we propose a novel online control algorithm that achieves sublinear regret with respect to it. We also discuss the connections between our method and the original one proposed by Agarwal et al. in their seminal work introducing the online non-stochastic control problem, and we compare the performance of both approaches through numerical simulations.
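To make the fixed-input benchmark concrete, the following is a minimal sketch of how regret against the best constant input could be computed in hindsight. It is an illustration only, not the paper's algorithm or its exact definitions: the scalar system x_{t+1} = a*x_t + b*u_t + w_t, the cost |x - r_t| + u^2, the grid search, and all function names are assumptions chosen for brevity.

```python
import random

def rollout(inputs, disturbances, a=0.5, b=1.0, x0=0.0):
    """Simulate x_{t+1} = a*x_t + b*u_t + w_t; return the states x_0..x_{T-1}."""
    states, x = [], x0
    for u, w in zip(inputs, disturbances):
        states.append(x)
        x = a * x + b * u + w
    return states

def total_cost(inputs, disturbances, targets):
    """Sum of per-step costs c_t(x, u) = |x - r_t| + u^2 (a hypothetical convex cost)."""
    states = rollout(inputs, disturbances)
    return sum(abs(x - r) + u * u for x, u, r in zip(states, inputs, targets))

def best_fixed_input_cost(disturbances, targets, grid):
    """Hindsight cost of the best constant input, approximated by grid search."""
    horizon = len(disturbances)
    return min(total_cost([u] * horizon, disturbances, targets) for u in grid)

def regret_vs_fixed_input(inputs, disturbances, targets, grid):
    """Cost of the played inputs minus the cost of the best constant input on the grid."""
    played = total_cost(inputs, disturbances, targets)
    return played - best_fixed_input_cost(disturbances, targets, grid)

if __name__ == "__main__":
    rng = random.Random(0)
    horizon = 200
    disturbances = [rng.uniform(-0.1, 0.1) for _ in range(horizon)]
    targets = [1.0] * horizon                    # assumed reference for the cost
    grid = [i / 100 for i in range(-200, 201)]   # candidate constant inputs
    zero_policy = [0.0] * horizon
    print(regret_vs_fixed_input(zero_policy, disturbances, targets, grid))
```

Because the comparator is rolled out under the same disturbance sequence, any constant input on the grid has non-negative regret here. A time-varying policy can in principle achieve negative regret against this benchmark.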
