Regret Optimal Control for Uncertain Stochastic Systems
Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate
TL;DR
The paper tackles robust control of uncertain discrete-time linear time-varying systems by minimizing regret against a clairvoyant benchmark that knows both the disturbances and the system dynamics in advance. It introduces a scenario optimization framework that samples the uncertain parameters and formulates a convex semidefinite program (SDP) to synthesize a disturbance-feedback policy with probabilistic guarantees on out-of-sample regret and safety. A key contribution is a rigorous generalization bound: with probability at least 1 - beta over the draw of N scenarios, the designed controller meets its regret bound for all but an epsilon-fraction of unseen dynamics. Compared to worst-case H-infinity methods, the regret-based design often yields tighter performance certificates and reduced conservatism, as supported by numerical experiments on a mass-spring-damper system. The authors identify scalability and extensions to broader classes of dynamics and longer horizons as directions for future work.
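To make the roles of N, epsilon, and beta in the guarantee above concrete, the following is a minimal sketch that computes how many scenarios a classical scenario-optimization bound would require. It assumes the standard result for convex scenario programs, beta >= sum_{i=0}^{d-1} C(N, i) epsilon^i (1 - epsilon)^(N - i), where d is the number of decision variables. The exact bound and the effective value of d used in the paper are not given in this summary, so the function names, the parameter `n_decision_vars`, and the choice of bound are illustrative assumptions.

```python
from math import exp, lgamma, log


def scenario_confidence(n_scenarios: int, n_decision_vars: int, epsilon: float) -> float:
    """Classical scenario bound (assumed here, not taken from the paper):
    probability that a Binomial(n_scenarios, epsilon) variable is at most
    n_decision_vars - 1. This upper-bounds the chance that the scenario
    solution is violated by more than an epsilon-fraction of unseen samples.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    total = 0.0
    # Terms with i > n_scenarios have a zero binomial coefficient, so skip them.
    for i in range(min(n_decision_vars, n_scenarios + 1)):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (
            lgamma(n_scenarios + 1)
            - lgamma(i + 1)
            - lgamma(n_scenarios - i + 1)
            + i * log(epsilon)
            + (n_scenarios - i) * log(1.0 - epsilon)
        )
        total += exp(log_term)
    return total


def min_scenarios(n_decision_vars: int, epsilon: float, beta: float,
                  n_max: int = 1_000_000) -> int:
    """Smallest N for which the assumed bound is at most beta.

    The bound decreases as N grows, so a linear scan returns the minimum.
    """
    for n in range(n_decision_vars, n_max + 1):
        if scenario_confidence(n, n_decision_vars, epsilon) <= beta:
            return n
    raise ValueError("no sample size up to n_max satisfies the bound")
```

For a single decision variable the bound reduces to (1 - epsilon)^N <= beta, so `min_scenarios(1, 0.1, 0.01)` returns 44, which is ln(0.01) / ln(0.9) ≈ 43.7 rounded up. Larger d or smaller epsilon and beta increase the required number of scenarios.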
Abstract
We consider control of uncertain linear time-varying stochastic systems from the perspective of regret minimization. Specifically, we focus on the problem of designing a feedback controller that minimizes the loss relative to a clairvoyant optimal policy that has foreknowledge of both the system dynamics and the exogenous disturbances. In this competitive framework, establishing robustness guarantees proves challenging because, unlike the case where the model is known, the clairvoyant optimal policy is not only inapplicable, but also impossible to compute without knowledge of the system parameters. To address this challenge, we embrace a scenario optimization approach, and we propose minimizing regret robustly over a finite set of randomly sampled system parameters. We prove that this policy optimization problem can be solved through semidefinite programming, and that the corresponding solution retains strong probabilistic out-of-sample regret guarantees in the face of the uncertain dynamics. Our method naturally extends to include satisfaction of safety constraints with high probability. We validate our theoretical results and showcase the potential of our approach by means of numerical simulations.
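As a reading aid, the scenario formulation described in the abstract can be sketched in symbols. The notation below is an assumption introduced for illustration and is not taken from the paper: $\theta$ denotes the uncertain system parameters, $\pi$ the causal policy being designed, $\psi_\theta$ the clairvoyant policy for parameters $\theta$, and $J$ the control cost after the disturbances have been aggregated (how that aggregation is done is left abstract here).

```latex
% Regret of a causal policy \pi for parameters \theta (illustrative notation)
\mathcal{R}(\pi, \theta) = J(\pi; \theta) - J(\psi_\theta; \theta)

% Scenario program over N sampled parameters \theta_1, \dots, \theta_N
(\pi_N, \gamma_N) \in \arg\min_{\pi,\, \gamma} \; \gamma
\quad \text{subject to} \quad
\mathcal{R}(\pi, \theta_i) \le \gamma, \qquad i = 1, \dots, N

% Shape of the out-of-sample guarantee, with violation level \epsilon and confidence 1 - \beta
\mathbb{P}^N \Big\{ \mathbb{P}\big\{ \theta : \mathcal{R}(\pi_N, \theta) > \gamma_N \big\} \le \epsilon \Big\} \ge 1 - \beta
```

Each sampled $\theta_i$ yields a clairvoyant benchmark that can be computed, which is what makes the regret constraints tractable even though the true parameters are unknown.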
