Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement

Nikolaus Feith; Elmar Rueckert

TL;DR

The paper tackles the challenge of integrating human expertise into reinforcement learning in continuous spaces by proposing Interactive Bayesian Optimization (IBO) with a novel Preference Expected Improvement (PEI) acquisition function. IBO combines Gaussian Process-based BO with a user-informed search process and a GUI for policy shaping, so users can interact at both the parameter level and the preference level. On cartpole, reacher, and a Franka Panda robot task, the authors demonstrate that PEI-guided interaction can improve learning efficiency and robustness over standard BO baselines, with mixture interaction strategies offering stable performance across settings. The approach is relevant to real-world robotics and other high-dimensional RL problems where human intuition and preferences can guide efficient exploration.

Abstract

Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes. However, most existing algorithms cannot be applied to real-world scenarios because their state spaces and/or action spaces are limited to discrete values. Furthermore, in all existing methods the interaction is restricted to choosing between multiple proposals. We therefore propose a novel framework based on Bayesian Optimization (BO). Interactive Bayesian Optimization (IBO) enables collaboration between machine learning algorithms and humans. The framework captures user preferences and provides an interface through which users can shape the strategy by hand. Additionally, we incorporate a new acquisition function, Preference Expected Improvement (PEI), which improves the system's efficiency by using a probabilistic model of the user preferences. Our approach is geared towards ensuring that machines can benefit from human expertise, aiming for a more aligned and effective learning process. We apply our method in simulation and to a real-world task using a Franka Panda robot to demonstrate human-robot collaboration.
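For orientation, the sketch below shows the standard Expected Improvement (EI) acquisition function commonly used in Gaussian Process-based BO, which the name PEI alludes to. It is not the authors' PEI: how the preference model enters the acquisition is not given in this summary, so that part is deliberately left out. The function names, the maximization convention, the exploration offset `xi`, and the candidate values in the usage example are all assumptions made for illustration.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Standard EI for maximization (illustrative, not the paper's PEI).

    mu, sigma: GP posterior mean and standard deviation at a candidate point.
    best:      best reward observed so far.
    xi:        optional exploration offset (assumed parameter).
    """
    gap = mu - best - xi
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(gap, 0.0)
    z = gap / sigma
    return gap * normal_cdf(z) + sigma * normal_pdf(z)


if __name__ == "__main__":
    # Hypothetical posterior (mean, std) at three candidate parameter vectors.
    candidates = {"a": (0.8, 0.05), "b": (0.6, 0.40), "c": (0.9, 0.00)}
    best_so_far = 0.85
    scores = {
        name: expected_improvement(mu, sigma, best_so_far)
        for name, (mu, sigma) in candidates.items()
    }
    chosen = max(scores, key=scores.get)
    print(scores, "-> next candidate:", chosen)
```

In a BO loop, the candidate with the highest acquisition value would be evaluated next; an interactive variant would additionally let the user's preferences or hand-shaped parameters influence that choice.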

Paper Structure (16 sections, 8 equations, 4 figures, 1 algorithm)

INTRODUCTION
RELATED WORK
Real time interaction
Episodic interactions
METHODS
Problem Statement
Bayesian Optimization
Gaussian Process
Acquisition Functions
Preference Expected Improvement (PEI)
Interactive Bayesian Optimization
RESULTS
Cartpole Balancing
Reacher
Robotic Task
...and 1 more sections

Figures (4)

Figure 1: Architecture of Interactive Bayesian Optimization. Black arrows indicate the baseline data flow; red arrows are additionally required for PEI. Standard BO, without the interaction and representation model, is marked in blue.
Figure 2: Cartpole results for (a) preference, (b) shaping, and (c) mixture experiments, and (d) final rewards. The baseline is shown in blue, experiments with Random IM in red, Regular IM in orange, and Improvement IM in green. All experiments were performed with 150 episodes and 25 runs; the plots show the current best rewards with a 95% confidence interval.
Figure 3: Reacher results for (a) preference, (b) shaping, and (c) mixture experiments, and (d) final rewards. The baseline is shown in blue, experiments with Random IM in red, Regular IM in orange, and Improvement IM in green. All experiments were performed with 50 episodes and 25 runs; the plots show the current best rewards with a 95% confidence interval.
Figure 4: Robotic experiment results. The baseline is shown in blue, a Preference experiment with the Improvement IM in orange, a Shaping experiment with the Random IM in green, and a Mixture experiment with the Regular IM in red. All experiments were performed with 50 episodes and 10 runs; the plots show the current best rewards with a 95% confidence interval.
