Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control
Sebastian Hirt, Lukas Theiner, Rolf Findeisen
TL;DR
This work tackles sample inefficiency in tuning model predictive control (MPC) parameters for nonlinear closed-loop systems by exploiting the temporal structure of closed-loop trajectories in Bayesian optimization (BO). It introduces time-series-informed Bayesian optimization (TSI-BO), which aligns the fidelity dimension of multi-fidelity BO with closed-loop time and incorporates intermediate partial-episode data as lower-fidelity observations, together with probabilistic early stopping and a convergence criterion. The approach uses a trace-aware surrogate and an acquisition function based on the trace-aware knowledge gradient (taKG), enabling early termination of unpromising experiments while preserving final performance. In nonlinear cart-pole simulations, TSI-BO achieves comparable closed-loop performance with roughly half the experimental resources and attains better final performance under the same budget, illustrating practical resource savings and improved convergence.
Abstract
Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the choice of controller parameters. Bayesian optimization allows learning of parameters from closed-loop experiments, but standard Bayesian optimization treats this as a black-box problem and ignores the temporal structure of closed-loop trajectories, leading to slow convergence and inefficient use of experimental resources. We propose a time-series-informed multi-fidelity Bayesian optimization framework that aligns the fidelity dimension with closed-loop time, enabling intermediate performance evaluations within a closed-loop experiment to be incorporated as lower-fidelity observations. Additionally, we derive probabilistic early stopping criteria to terminate unpromising closed-loop experiments based on the surrogate model's posterior belief, avoiding full episodes for poor parameterizations and thereby reducing resource usage. Simulation results on a nonlinear control benchmark demonstrate that, compared to standard black-box Bayesian optimization approaches, the proposed method achieves comparable closed-loop performance with roughly half the experimental resources, and yields better final performance when using the same resource budget, highlighting the value of exploiting temporal structure for sample-efficient closed-loop controller tuning.
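The two ideas in the abstract, partial-episode costs as lower-fidelity observations and posterior-based early stopping, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that the closed-loop cost is minimized, that fidelity is the fraction of the episode completed, and that the surrogate's belief about the full-episode cost is Gaussian. The stopping rule shown is a plain probability-of-improvement threshold used as a stand-in for the derived criteria, and all names (`partial_costs`, `prob_improvement`, `should_stop`, `delta`) are hypothetical.

```python
import math


def partial_costs(stage_costs, checkpoints):
    """Return (fidelity, cumulative cost) pairs at the given checkpoint steps.

    Assumption: fidelity s = t / T, where T is the episode length, so the
    full episode corresponds to s = 1 and earlier checkpoints to s < 1.
    """
    horizon = len(stage_costs)
    wanted = set(checkpoints)
    observations = []
    running = 0.0
    for t, cost in enumerate(stage_costs, start=1):
        running += cost
        if t in wanted:
            observations.append((t / horizon, running))
    return observations


def prob_improvement(mean, std, incumbent):
    """P(full-episode cost < incumbent) for a Gaussian belief N(mean, std^2)."""
    if std <= 0.0:
        # Degenerate belief: the outcome is treated as known exactly.
        return 1.0 if mean < incumbent else 0.0
    z = (incumbent - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def should_stop(mean, std, incumbent, delta=0.05):
    """Stop the running experiment if improvement is less likely than delta.

    Assumption: delta is a user-chosen risk threshold, not a value from the paper.
    """
    return prob_improvement(mean, std, incumbent) < delta


if __name__ == "__main__":
    # Four stage costs, observed at the midpoint and at the end of the episode.
    print(partial_costs([1.0, 2.0, 3.0, 4.0], checkpoints=[2, 4]))
    # Predicted final cost far above the best cost found so far: stop early.
    print(should_stop(mean=20.0, std=1.0, incumbent=10.0))
```

In a full pipeline, the `(fidelity, cost)` pairs would be added to the surrogate's training data alongside the controller parameters, and `mean` and `std` would come from the surrogate's posterior at full fidelity for the parameters currently under test.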
