Sequential learning and control: Targeted exploration for robust performance

Janani Venkatasubramanian; Johannes Köhler; Julian Berberich; Frank Allgöwer

TL;DR

This work tackles robust dual control for uncertain discrete‑time linear systems by separating learning and control into a targeted exploration phase and a subsequent gain‑scheduled exploitation phase. It leverages spectral‑line theory to establish a priori finite excitation bounds and formulates an SDP to compute harmonic exploration with minimal energy that guarantees informative data for robust controller design. After exploration, a robust gain‑scheduling controller based on LPV system theory enforces an $H_2$ performance bound with probabilistic guarantees, using updated parameter estimates as scheduling variables. Numerical results on a hard‑to‑learn system demonstrate that targeted harmonic exploration yields stronger excitation than random probing and enables guaranteed performance after exploration, highlighting a practical, tractable approach to robust dual control.

Abstract

We present a novel dual control strategy for uncertain linear systems based on targeted harmonic exploration and gain-scheduling with performance and excitation guarantees. In the proposed sequential approach, robust control is implemented after exploration with the main feature that the exploration is optimized with respect to the robust control performance. Specifically, we leverage recent results on finite excitation using spectral lines to determine a high probability lower bound on the resultant finite excitation of the exploration data. This provides an a priori upper bound on the remaining model uncertainty after exploration, which can further be leveraged in a gain-scheduling controller design that guarantees robust performance. This leads to a semidefinite program-based design which computes an exploration strategy with finite excitation bounds and minimal energy, and a gain-scheduled controller with probabilistic performance bounds that can be implemented after exploration. The effectiveness of our approach and its benefits over common random exploration strategies are demonstrated with an example of a system which is 'hard to learn'.
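To make the terms "harmonic exploration" and "finite excitation" concrete, the following is a minimal sketch under stated assumptions, not the paper's design procedure. It applies a sum-of-cosines input to a hypothetical two-state system and evaluates the excitation matrix built from the regressors $\phi_k = [x_k; u_k]$. The matrices `A` and `B`, the amplitudes, the frequencies, and all function names are illustrative choices; in the paper the exploration input is obtained from a semidefinite program, which is not reproduced here.

```python
import numpy as np

def multisine(amplitudes, freqs, T):
    """Scalar harmonic input u_k = sum_i a_i * cos(2*pi*w_i*k) for k = 0..T-1."""
    k = np.arange(T)
    u = np.zeros(T)
    for a, w in zip(amplitudes, freqs):
        u += a * np.cos(2 * np.pi * w * k)
    return u

def simulate(A, B, u, noise_std=0.0, seed=0):
    """Roll out x_{k+1} = A x_k + B u_k + w_k from x_0 = 0 (single input)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros((len(u) + 1, n))
    for k in range(len(u)):
        w = noise_std * rng.standard_normal(n)
        x[k + 1] = A @ x[k] + B[:, 0] * u[k] + w
    return x

def excitation(x, u):
    """Return sum_k phi_k phi_k^T with regressor phi_k = [x_k; u_k]."""
    Phi = np.column_stack([x[:-1], u])
    return Phi.T @ Phi

# Hypothetical stable, controllable example system (an assumption, not from the paper).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])

T = 200
# Two distinct frequencies give four spectral lines, enough to excite
# the three-dimensional regressor [x_k; u_k] in this example.
u = multisine(amplitudes=[1.0, 0.5], freqs=[10 / T, 30 / T], T=T)
x = simulate(A, B, u)
D = excitation(x, u)
lam_min = np.linalg.eigvalsh(D).min()
print("input energy:", float(u @ u))
print("smallest eigenvalue of the excitation matrix:", float(lam_min))
```

The smallest eigenvalue of `D` is one common scalar measure of how informative the data are for estimating the system parameters. Replacing `u` with a random sequence of equal energy and comparing `lam_min` gives a rough, system-dependent impression of the comparison the abstract describes.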

Paper Structure (28 sections, 95 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Problem Statement
Setting
Preliminaries
Uncertainty bound
Frequency domain information using spectral lines
Targeted Exploration
Exploration strategy
Bound on finite excitation
Convex relaxation
Bounds on transfer matrices
Final bound on the informativity of exploration
Robust Gain-scheduling Design
Dual Control
Relationship between uncertain parameters
...and 13 more sections

Figures (5)

Figure 1: Sketch of the sequential robust dual control strategy.
Figure 2: Generalized plant view of the robust gain-scheduling problem \cite{venkatasubramanian2020robust}.
Figure 3: Illustration of the sets from the proof of Lemma \ref{lem:projection-bounds}: $\mathbf{\Theta}_0=\mathbf{\Theta}_s$ and $\mathbf{\Theta}_u$, the true parameters $\theta_\mathrm{tr}$, the initial parameter estimate $\hat{\theta}_\mathrm{prior}$, the estimate resulting from exploration $\hat{\theta}_T$, and the projected estimate $\tilde{\theta}_T$.
Figure 4: Illustration of (i) the mean and standard deviation of the excitation achieved by targeted exploration, $D_{T_{T,11}}$, and by random exploration, $D_{T_{R,11}}$, for different values of the factor $\alpha_i$ scaling the initial uncertainty bound $D_0$, and (ii) the desired excitation $\overline{D}_{T_{11}}=10^6$.
Figure 5: Illustration of the exploration energy $\gamma_\mathrm{e}$ required to achieve a certain $H_2$ performance $\gamma_\mathrm{p}$. For comparison, the $H_2$ performance of a robust controller based on the initial uncertainty bound $D_0^{-1}$, and the optimal $H_2$ performance based on true system knowledge are provided.

Theorems & Definitions (8)

proof (8 entries)
