Sequential learning and control: Targeted exploration for robust performance
Janani Venkatasubramanian, Johannes Köhler, Julian Berberich, Frank Allgöwer
TL;DR
This work tackles robust dual control for uncertain discrete-time linear systems by separating learning and control into a targeted exploration phase and a subsequent gain-scheduled exploitation phase. It leverages spectral-line theory to establish a priori finite excitation bounds and formulates an SDP to compute harmonic exploration with minimal energy that guarantees informative data for robust controller design. After exploration, a robust gain-scheduling controller based on LPV system theory enforces an $H_2$ performance bound with probabilistic guarantees, using updated parameter estimates as scheduling variables. Numerical results on a hard-to-learn system demonstrate that targeted harmonic exploration yields stronger excitation than random probing and enables guaranteed performance after exploration, highlighting a practical, tractable approach to robust dual control.
Abstract
We present a novel dual control strategy for uncertain linear systems based on targeted harmonic exploration and gain-scheduling with performance and excitation guarantees. In the proposed sequential approach, robust control is implemented after exploration, with the main feature that the exploration is optimized with respect to the robust control performance. Specifically, we leverage recent results on finite excitation using spectral lines to determine a high-probability lower bound on the resulting finite excitation of the exploration data. This provides an a priori upper bound on the remaining model uncertainty after exploration, which can further be leveraged in a gain-scheduling controller design that guarantees robust performance. This leads to a semidefinite-programming-based design that computes an exploration strategy with finite excitation bounds and minimal energy, and a gain-scheduled controller with probabilistic performance bounds that can be implemented after exploration. The effectiveness of our approach and its benefits over common random exploration strategies are demonstrated with an example of a system that is 'hard to learn'.
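
To make the two quantities traded off above concrete (input energy and finite excitation of the exploration data), the following is a minimal illustrative sketch, not the authors' SDP-based design. It assumes a hypothetical noise-free two-state system `A`, `B`, hand-picked amplitudes and frequencies, and a regressor $\phi_k = [x_k; u_k]$; the helper names `harmonic_input` and `excitation_level` are introduced here for illustration only. The excitation level is taken as the smallest eigenvalue of the averaged matrix $\frac{1}{T}\sum_k \phi_k \phi_k^\top$.

```python
import numpy as np

def harmonic_input(amplitudes, frequencies, T):
    """Multisine u_k = sum_i a_i * cos(2*pi*f_i*k), k = 0..T-1 (f_i in cycles/sample)."""
    k = np.arange(T)
    return sum(a * np.cos(2 * np.pi * f * k) for a, f in zip(amplitudes, frequencies))

def excitation_level(A, B, u):
    """Smallest eigenvalue of (1/T) * sum_k phi_k phi_k^T with phi_k = [x_k; u_k].

    Assumes a noise-free single-input system x_{k+1} = A x_k + B u_k, x_0 = 0.
    """
    n = A.shape[0]
    x = np.zeros(n)
    gram = np.zeros((n + 1, n + 1))
    for u_k in u:
        phi = np.append(x, u_k)      # regressor: state stacked with input
        gram += np.outer(phi, phi)   # accumulate phi_k phi_k^T
        x = A @ x + B * u_k          # propagate the (assumed) dynamics
    return np.linalg.eigvalsh(gram / len(u))[0]  # eigvalsh sorts ascending

# Hypothetical example system and exploration input (all values are assumptions).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([0.0, 1.0])
T = 200
u = harmonic_input(amplitudes=[1.0, 0.5], frequencies=[0.05, 0.2], T=T)

energy = np.mean(u ** 2)          # average input energy of the multisine
lam = excitation_level(A, B, u)   # finite excitation achieved over T steps
print(f"average input energy: {energy:.3f}, excitation level: {lam:.4f}")
```

In this sketch the two sinusoids contribute four spectral lines, which is enough to excite the three-dimensional regressor of the assumed controllable system, so the excitation level is strictly positive. The design described in the abstract goes further by choosing the amplitudes through a semidefinite program so that a required excitation bound is met with minimal energy.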
