Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control

Sebastian Hirt, Lukas Theiner, Rolf Findeisen

TL;DR

This work tackles sample inefficiency in tuning MPC parameters for nonlinear closed-loop systems by exploiting the temporal structure of trajectories in Bayesian optimization. It introduces time-series-informed Bayesian optimization (TSI-BO), which aligns the BO fidelity dimension with closed-loop time and incorporates intermediate partial-episode data as lower-fidelity observations, together with probabilistic early stopping and a convergence criterion. The approach combines a trace-aware surrogate with an acquisition function based on the trace-aware knowledge gradient (taKG), so that unpromising experiments can be terminated early while final performance is preserved. In nonlinear cart-pole simulations, TSI-BO achieves comparable closed-loop performance with roughly half the experimental resources and attains better final performance under the same budget.

Abstract

Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the choice of controller parameters. Bayesian optimization allows learning of parameters from closed-loop experiments, but standard Bayesian optimization treats this as a black-box problem and ignores the temporal structure of closed-loop trajectories, leading to slow convergence and inefficient use of experimental resources. We propose a time-series-informed multi-fidelity Bayesian optimization framework that aligns the fidelity dimension with closed-loop time, enabling intermediate performance evaluations within a closed-loop experiment to be incorporated as lower-fidelity observations. Additionally, we derive probabilistic early stopping criteria to terminate unpromising closed-loop experiments based on the surrogate model's posterior belief, avoiding full episodes for poor parameterizations and thereby reducing resource usage. Simulation results on a nonlinear control benchmark demonstrate that, compared to standard black-box Bayesian optimization approaches, the proposed method achieves comparable closed-loop performance with roughly half the experimental resources, and yields better final performance when using the same resource budget, highlighting the value of exploiting temporal structure for sample-efficient closed-loop controller tuning.
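To make the idea of aligning fidelity with closed-loop time concrete, the following is a minimal sketch of how one episode could be turned into several lower-fidelity observations. It assumes the intermediate performance measure is the cumulative stage cost so far and that the fidelity is the completed fraction of the episode; the function name `partial_episode_observations`, the `num_checkpoints` parameter, and the example costs are hypothetical and not taken from the paper.

```python
import numpy as np


def partial_episode_observations(stage_costs, num_checkpoints=4):
    """Return (fidelity, cumulative_cost) pairs for a single closed-loop episode.

    Assumption: the fidelity s is the fraction of the episode completed,
    so s = 1.0 corresponds to the full closed-loop experiment.
    """
    stage_costs = np.asarray(stage_costs, dtype=float)
    horizon = stage_costs.size
    cumulative = np.cumsum(stage_costs)

    # Evenly spaced checkpoints (1-based step counts); the last one is the final step.
    steps = np.linspace(horizon / num_checkpoints, horizon, num_checkpoints)
    steps = np.unique(np.clip(steps.round().astype(int), 1, horizon))

    return [(float(k / horizon), float(cumulative[k - 1])) for k in steps]


# Hypothetical stage costs of one episode with 8 time steps.
observations = partial_episode_observations(
    [4.0, 3.0, 2.0, 1.0, 0.5, 0.5, 0.25, 0.25], num_checkpoints=4
)
for fidelity, cost in observations:
    print(f"s = {fidelity:.2f}, cumulative cost = {cost:.2f}")
```

Each pair could then be added to the surrogate's training data at its own fidelity, so a single experiment contributes several data points instead of one.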

Paper Structure

This paper contains 13 sections, 16 equations, 3 figures, and 1 algorithm.

Figures (3)

  • Figure 1: Proposed approach: The fidelity dimension $s$ of the Bayesian optimization surrogate $\bar{G}$ is aligned with closed-loop time, enabling intermediate closed-loop evaluations $\bar{G}(\theta_n,s)$ to be incorporated into the optimization process. Early termination is decided at the current iteration by predicting the upper confidence bound at the target fidelity ($\mathrm{UCB}(\theta_n,1)$), which corresponds to the total cost of the full closed-loop experiment, and comparing it to the best observed cost $G_n^*$. This allows efficient use of experimental resources.
  • Figure 2: All sampled closed-loop trajectories for a single run of the proposed TSI-BO procedure using EI-based and convergence-based early stopping. Early termination of unpromising or already-converged episodes illustrates the resource savings achieved by the proposed approach.
  • Figure 3: Comparison of best-so-far cost across 10 independent runs. The top panel shows ablations of the proposed TSI-BO method, including the BO baseline, TSI-BO with $\mathcal{E}_\mathrm{EI}$ and $\mathcal{E}_\mathrm{C}$, TSI-BO with $\mathcal{E}_\mathrm{EI}$ only, and TSI-BO without early stopping. The bottom panel compares the EI- and UCB-based early stopping criteria against the same BO baseline. Shaded areas indicate the minimum–maximum range across runs, and the horizontal axis denotes the number of closed-loop iterations (i.e., experimental resources).
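
The confidence-bound stopping test described in the caption of Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the paper's rule: the captions do not give the formula for $\mathrm{UCB}(\theta_n,1)$, so the sketch assumes cost minimization and uses the posterior mean minus `beta` standard deviations at the target fidelity as the optimistic prediction of the full-episode cost. The function name `should_stop_early`, the parameter `beta`, and the numbers are hypothetical.

```python
def should_stop_early(mean_full, std_full, best_cost, beta=2.0):
    """Decide whether to terminate the running closed-loop experiment.

    mean_full, std_full: surrogate posterior mean and standard deviation of the
        full-episode cost (target fidelity s = 1) for the current parameters.
    best_cost: best full-episode cost observed so far.

    Assumption: costs are minimized, so the optimistic prediction is
    mean - beta * std. Stop if even this optimistic value is worse than best_cost.
    """
    optimistic_cost = mean_full - beta * std_full
    return optimistic_cost > best_cost


# Hypothetical posterior values partway through two different episodes.
print(should_stop_early(mean_full=10.0, std_full=1.0, best_cost=5.0))  # clearly worse
print(should_stop_early(mean_full=6.0, std_full=1.0, best_cost=5.0))   # still promising
```

A larger `beta` makes the test more conservative, so fewer experiments are cut short while the surrogate is still uncertain about their final cost.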