Regret Optimal Control for Uncertain Stochastic Systems

Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate

TL;DR

The paper tackles robust control of uncertain discrete-time linear time-varying systems by minimizing regret against a clairvoyant benchmark that knows both the disturbances and the dynamics in advance. It introduces a scenario optimization framework that samples the uncertain parameters and formulates a convex semidefinite program (SDP) to synthesize a disturbance-feedback policy with probabilistic guarantees on out-of-sample regret and safety. A key contribution is a rigorous generalization bound: with $N$ sampled scenarios, the designed controller achieves bounded regret for all but an $\epsilon$-fraction of unseen dynamics with probability at least $1 - \beta$. Compared with worst-case $\mathcal{H}_\infty$ methods, the regret-based design often yields tighter performance certificates and reduced conservatism, as supported by numerical experiments on a mass-spring-damper system. The approach targets settings with uncertain models and exogenous disturbances; scalability and extensions to broader classes of dynamics and longer horizons are identified as directions for further work.
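
The generalization statement above has the same shape as the classical scenario-approach bound for convex programs. As a hedged illustration, assuming the standard binomial-tail bound of Campi and Garatti applies (the paper's Theorem 1 may use a different or sharper expression), the sketch below evaluates the confidence level for a given $N$ and searches for the smallest $N$ that meets a target $\beta$. The names `violation_confidence` and `min_scenarios`, and the parameter `n_decision_vars` (the number of decision variables of the convex program), are illustrative and do not come from the paper.

```python
from math import comb


def violation_confidence(n_scenarios: int, n_decision_vars: int, epsilon: float) -> float:
    """Binomial-tail bound on the probability that the scenario solution
    violates the constraint on more than an epsilon-fraction of unseen samples.

    Assumption: classical bound for convex scenario programs with
    n_decision_vars decision variables; not taken from the paper.
    """
    # Terms with i > n_scenarios vanish, so the sum is capped there.
    return sum(
        comb(n_scenarios, i) * epsilon**i * (1.0 - epsilon) ** (n_scenarios - i)
        for i in range(min(n_decision_vars, n_scenarios + 1))
    )


def min_scenarios(n_decision_vars: int, epsilon: float, beta: float,
                  n_max: int = 100_000) -> int:
    """Smallest N, found by linear scan, whose bound is at most beta."""
    if not (0.0 < epsilon < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("epsilon and beta must lie strictly between 0 and 1")
    for n in range(n_decision_vars, n_max + 1):
        if violation_confidence(n, n_decision_vars, epsilon) <= beta:
            return n
    raise ValueError("no N up to n_max satisfies the bound")
```

For instance, with a single decision variable the bound reduces to $(1-\epsilon)^N$, so `min_scenarios(1, 0.1, 0.01)` returns 44. The linear scan is chosen for readability; since the bound is non-increasing in $N$, a bisection search would also work for large problems.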

Abstract

We consider control of uncertain linear time-varying stochastic systems from the perspective of regret minimization. Specifically, we focus on the problem of designing a feedback controller that minimizes the loss relative to a clairvoyant optimal policy that has foreknowledge of both the system dynamics and the exogenous disturbances. In this competitive framework, establishing robustness guarantees proves challenging as, differently from the case where the model is known, the clairvoyant optimal policy is not only inapplicable, but also impossible to compute without knowledge of the system parameters. To address this challenge, we embrace a scenario optimization approach, and we propose minimizing regret robustly over a finite set of randomly sampled system parameters. We prove that this policy optimization problem can be solved through semidefinite programming, and that the corresponding solution retains strong probabilistic out-of-sample regret guarantees in face of the uncertain dynamics. Our method naturally extends to include satisfaction of safety constraints with high probability. We validate our theoretical results and showcase the potential of our approach by means of numerical simulations.
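
As a rough sketch of the setup described in the abstract, regret and the sampled design problem might be written as below. The notation is illustrative rather than the paper's own: $J$ denotes the control cost, $\bm{\pi}$ a causal feedback policy, $\bm{\psi}^\star_{\bm{\theta}}$ the clairvoyant policy for parameters $\bm{\theta}$, and $\gamma$ an epigraph variable. How the disturbances $\bm{w}$ are aggregated into $\mathtt{R}(\bm{\pi}, \bm{\theta})$ (for instance by expectation) is left abstract, and the safety constraints are omitted.

```latex
\begin{align}
\mathtt{R}(\bm{\pi}, \bm{\theta}, \bm{w})
  &= J(\bm{\pi}; \bm{\theta}, \bm{w})
   - J(\bm{\psi}^\star_{\bm{\theta}}; \bm{\theta}, \bm{w}), \\
\min_{\bm{\pi},\, \gamma} \ \gamma
  \quad &\text{subject to} \quad
  \mathtt{R}(\bm{\pi}, \bm{\theta}^k) \le \gamma,
  \quad k = 1, \dots, N.
\end{align}
```

Here $\bm{\theta}^1, \dots, \bm{\theta}^N$ are the randomly sampled system parameters, so the optimal $\gamma$ bounds the regret on every sampled scenario.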

Paper Structure (11 sections, 2 theorems, 23 equations, 3 figures)

Key Result

Proposition 1

The scenario optimization problem \ref{eq:safe_robust_regret_minimization_epigraphic_scenario} is equivalent to a semidefinite program (the program itself is not reproduced here), where $\bm{H}_w^k = \bm{H}_w(\bm{\theta}^k)$, $S$ is the number of constraints in \ref{eq:safe_set_definition}, and $\star$ denotes entries that can be inferred from symmetry.

Figures (3)

  • Figure 1: Comparison between empirical regret violation probability and theoretical upper bound as a function of the number of sampled scenarios.
  • Figure 2: Evolution of the probabilistic worst-case regret bounds (denoted by $\bar{\mathtt{R}}^\star_{N}$ and $\hat{\mathtt{R}}^\star_{N}$ on the left $y$-axis) and of the computation times (denoted by $\bar{\tau}_N$ and $\hat{\tau}_N$ on the right $y$-axis) for the exact and approximate solutions of \ref{eq:safe_robust_regret_minimization_epigraphic_scenario_sdp}, respectively, as a function of the number of considered scenarios.
  • Figure 3: Closed-loop comparison between $\bm{\pi}_{\mathtt{H}}$ and our $\bm{\pi}_{\mathtt{R}}$: a priori performance guarantees and realized control cost for different disturbance profiles and different realizations of the uncertain system dynamics. Points in the green shaded area denote instances where the proposed regret minimization approach yields an advantage in terms of a lower upper bound (Figure \ref{fig:closed_loop_comparison_upper_bounds}) and lower realized cost (Figure \ref{fig:closed_loop_comparison_realized_cost}). We refer to our source code for a precise definition of the considered disturbance profiles.

Theorems & Definitions (7)

  • Remark 1
  • Remark 2
  • Remark 3
  • Proposition 1
  • Remark 4
  • Theorem 1
  • Remark 5