Preferential Bayesian Optimization with Crash Feedback

Johanna Menn, David Stenger, Sebastian Trimpe

Abstract

Bayesian optimization is a popular black-box optimization method for parameter learning in control and robotics. It typically requires an objective function that reflects the user's optimization goal. However, in practical applications, this objective function is often inaccessible due to complex or unmeasurable performance metrics. Preferential Bayesian optimization (PBO) overcomes this limitation by leveraging human feedback through pairwise comparisons, eliminating the need for explicit performance quantification. When applying PBO to hardware systems, such as in quadcopter control, crashes can cause time-consuming experimental resets, wear and tear, or otherwise undesired outcomes. Standard PBO methods cannot incorporate feedback from such crashed experiments, resulting in the exploration of parameters that frequently lead to experimental crashes. We thus introduce CrashPBO, a user-friendly mechanism that enables users to both express preferences and report crashes during the optimization process. Benchmarking on synthetic functions shows that this mechanism reduces crashes by 63% and increases data efficiency. Through experiments on three robotics platforms, we demonstrate the wide applicability and transferability of CrashPBO, highlighting that it provides a flexible, user-friendly framework for parameter learning with human feedback on preferences and crashes.

Paper Structure

This paper contains 16 sections, 9 equations, 6 figures, 3 tables, and 2 algorithms.

Figures (6)

  • Figure 1: CrashPBO enables optimization of robotic tasks directly from human preferences. The human provides feedback on preferences ("Experiment A was better than B") and crashes (i.e., totally undesirable experiments). A crashed experiment is ranked worse than all successful ones, preventing the exploration of unsafe or undesirable regions.
  • Figure 2: Applications of CrashPBO
  • Figure 3: Crash mechanism in PBO: By adding virtual comparisons of the crashed experiment (red) with all successful experiments (black), we ensure that the posterior is worse in the crashed regions. This reduces the likelihood of exploration in the crashed region. Comparisons between data points are indicated by $\mathbf{x}_\mathrm{A} \succ \mathbf{x}_\mathrm{B}$, meaning that $\mathbf{x}_\mathrm{A}$ is preferred over $\mathbf{x}_\mathrm{B}$; a comparison is indicated by a line in the plot.
  • Figure 4: Synthetic results showing average normalized performance and average crashes for within-model comparison and synthetic functions. We compare CrashPBO to the PBO method EUBO, the standard BO method MES, the safe BO method SafeOpt, and random search. CrashPBO performs similarly to MES and EUBO and outperforms safe BO and random search while reducing crashes compared to EUBO in both settings.
  • Figure 5: Experimental outcomes of three DMs tuning the backflip maneuver with subjective preferences. The top row shows the mean of the learned pairwise GP after the optimization, and the bottom row shows the trajectories of all evaluated flips. The black points indicate the shared initial parameters, which correspond to the black trajectories below. The final preferred parameters are marked with a star and appear as the bold trajectory. All other evaluated parameters are shown as points when the flip succeeded and as crosses when the experiment crashed.
  • ...and 1 more figure
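The crash mechanism from Figure 3 can be sketched in a few lines: a crashed experiment is ranked worse than every successful one by appending virtual pairwise comparisons to the preference dataset. This is a minimal illustrative sketch, not the authors' implementation; the function name, data layout, and indices are assumptions for illustration.

```python
def add_crash_comparisons(comparisons, successful, crashed):
    """Append virtual comparisons ranking `crashed` below all successful points.

    comparisons: list of (winner, loser) index pairs from user preferences
    successful:  indices of successful experiments
    crashed:     index of the crashed experiment
    """
    for s in successful:
        # Every successful experiment is preferred over the crashed one,
        # pushing the preference-GP posterior down in the crashed region.
        comparisons.append((s, crashed))
    return comparisons

# Usage: experiments 0-2 succeeded, experiment 3 crashed.
prefs = [(0, 1)]  # the user preferred experiment 0 over experiment 1
prefs = add_crash_comparisons(prefs, successful=[0, 1, 2], crashed=3)
# prefs now also contains (0, 3), (1, 3), and (2, 3)
```

The augmented comparison list can then be fed to any preference-based GP likelihood, so no separate crash model is needed.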

Theorems & Definitions (1)

  • Definition 1