Early Stopping Bayesian Optimization for Controller Tuning

David Stenger; Dominik Scheurenberg; Heike Vallery; Sebastian Trimpe

TL;DR

The paper addresses the inefficiency of fixed-length episode evaluations in Bayesian Optimization (BO) for controller tuning. It introduces Early Stopping BO (ESBO), which terminates an episode as soon as suboptimality is detectable and uses one of three heuristics to incorporate the resulting partial observations. The three variants, ESBO-C, ESBO-TR, and ESBO-GP, differ in how they bring this partial information into the GP surrogate, enabling data-efficient optimization of time-integrated costs $J(\boldsymbol{\theta}) = \sum_{t=1}^{T_{\max}} j(u_t,y_t)$. Empirical results across five simulations and one hardware experiment show that ESBO variants can substantially reduce optimization time (up to ~48% in simulation and ~35% on hardware) while maintaining final performance comparable to vanilla BO, with ESBO-GP generally delivering the best trade-off. The work provides a practical, data-efficient approach to controller tuning where evaluating full episodes is costly, and introduces a new problem class in BO for control involving early termination and partial data.

Abstract

Manual tuning of performance-critical controller parameters can be tedious and sub-optimal. Bayesian Optimization (BO) is an increasingly popular practical alternative that automatically optimizes controller parameters from few experiments. Standard BO practice is to evaluate the closed-loop performance of each proposed parameter set on an episode of fixed length. However, fixed-length episodes can be wasteful. For example, continuing an episode whose start already shows undesirable behavior, such as strong oscillations, seems pointless. Therefore, we propose a BO method that stops an episode early if suboptimality becomes apparent before the episode is completed. Such early stopping results in partial observations of the controller's performance, which cannot be included directly in standard BO. We propose three heuristics to incorporate partially observed episodes into BO. Through five numerical experiments and one hardware experiment, we demonstrate that early stopping BO can substantially reduce the time needed for optimization.
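
The early stopping idea can be illustrated with a minimal sketch. It is not the paper's stopping rule, which is not reproduced in this summary. The sketch assumes that every stage cost $j(u_t, y_t)$ is non-negative, so the running sum is a lower bound on the full-episode cost $J(\boldsymbol{\theta})$, and it stops as soon as that bound reaches the best cost observed so far. The function name `run_episode_with_early_stopping` and its arguments are hypothetical.

```python
from typing import Iterable, Tuple


def run_episode_with_early_stopping(
    stage_costs: Iterable[float],
    best_cost_so_far: float,
) -> Tuple[float, int, bool]:
    """Accumulate stage costs and stop once an improvement is ruled out.

    Assumption (not from the paper): all stage costs are non-negative, so the
    partial sum can only grow and is a lower bound on the total cost.

    Returns (accumulated cost, number of steps evaluated, stopped flag).
    """
    partial_cost = 0.0
    steps = 0
    for cost in stage_costs:
        if cost < 0:
            # The lower-bound argument breaks down for negative stage costs.
            raise ValueError("stage costs must be non-negative")
        partial_cost += cost
        steps += 1
        # Once the lower bound reaches the incumbent, this parameter set
        # cannot be strictly better, so the rest of the episode is skipped.
        if partial_cost >= best_cost_so_far:
            return partial_cost, steps, True
    # The episode ran to completion, so the sum is the full cost.
    return partial_cost, steps, False
```

With stage costs `[1.0, 2.0, 3.0, 4.0]` and an incumbent cost of `5.0`, the sketch stops after three steps with a partial cost of `6.0`. The stopped flag is also set when the threshold is reached on the final step. How such a partial cost is then fed back into the GP surrogate is what distinguishes ESBO-C, ESBO-TR, and ESBO-GP, and is not covered by this sketch.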

Paper Structure (20 sections, 17 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Problem Statement: Early Stopping BO
Related Work
Background
Gaussian Process Regression
Bayesian Optimization
Early Stopping BO
When to stop an episode?
How to facilitate partial observations?
ESBO-C
ESBO-TR
ESBO-GP
Expected Strengths and Weaknesses
Empirical Evaluation
Implementation and Hyperparameters
...and 5 more sections

Figures (4)

Figure 1: Illustrative example of ESBO-GP: The first and the second samples $\theta_1$ and $\theta_2$ were evaluated for the whole episode. The second sample $\theta_2$ achieved an improvement. The evaluation of the third and fourth samples $\theta_3$ and $\theta_4$ was stopped early, because already after $T_3$ and $T_4$, an improvement was ruled out by the early stopping rule (see Sec. \ref{sec:StoppingRule}). The objective function values of $\theta_3$ and $\theta_4$ are sampled from a probabilistic estimate and included in the GP (see Sec. \ref{sec:virtualDataSet}). The GP model of the objective function uses the observed total cost for the complete evaluations. By using the partial information in the GP, the acquisition function is smaller in areas where partial observations have been made, and the parametrization for the next episode is found by maximizing the acquisition function.
Figure 2: Results of the simulation study. For an explanation of the metrics see Sec. \ref{sec:Metrics}.
Figure 3: Three-tank hardware test bed at the model factory of the Institute of Automatic Control, RWTH Aachen University.
Figure 4: The two plots represent the same hardware experiments, showing median scaled regret over experimentation time (top) and number of episodes (bottom). For an explanation of the metrics see Sec. \ref{sec:Metrics}.
