Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement
Nikolaus Feith, Elmar Rueckert
TL;DR
The paper tackles the challenge of integrating human expertise into reinforcement learning in continuous spaces by proposing Interactive Bayesian Optimization (IBO) with a novel Preference Expected Improvement (PEI) acquisition function. IBO combines Gaussian Process-based BO with a user-informed search process and a GUI for policy shaping, enabling interaction at both the parameter and the preference level. In experiments on cartpole, reacher, and a Franka Panda robot task, the authors show that PEI-guided interaction can improve learning efficiency and robustness over standard BO baselines, with mixture interaction strategies performing stably across settings. The approach is relevant to real-world robotics and other high-dimensional RL problems where human intuition and preferences can guide efficient exploration.
Abstract
Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes. However, most existing algorithms cannot be applied to real-world scenarios because their state and/or action spaces are limited to discrete values. Furthermore, in all existing methods, interaction is restricted to choosing between multiple proposals. We therefore propose a novel framework based on Bayesian Optimization (BO). Interactive Bayesian Optimization (IBO) enables collaboration between machine learning algorithms and humans. The framework captures user preferences and provides an interface through which users can shape the policy by hand. Additionally, we introduce a new acquisition function, Preference Expected Improvement (PEI), which uses a probabilistic model of the user's preferences to make the search more efficient. Our goal is to let machines benefit from human expertise, yielding a learning process that is better aligned with the user and more effective. We apply our method to simulated tasks and to a real-world task with a Franka Panda robot to demonstrate human-robot collaboration.
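The abstract does not state the PEI formula. For orientation only, the following is a minimal sketch of the standard Expected Improvement (EI) acquisition function for a Gaussian Process posterior, the criterion that the name Preference Expected Improvement refers to. It is not the authors' PEI and contains no preference model. It assumes maximization, a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate, and an optional exploration margin `xi`. The function names and candidate values are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, f_best: float, xi: float = 0.0) -> float:
    """Standard EI for maximization (NOT the paper's PEI; no preference model).

    mu, sigma: GP posterior mean and standard deviation at a candidate.
    f_best:    best objective value observed so far.
    xi:        hypothetical exploration margin (0 gives plain EI).
    """
    improvement = mu - f_best - xi
    if sigma <= 0.0:
        # No posterior uncertainty: EI reduces to the deterministic improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical candidates: name -> (posterior mean, posterior std).
candidates = {"a": (0.5, 0.1), "b": (0.4, 0.5), "c": (0.6, 0.0)}
f_best = 0.55
scores = {name: expected_improvement(mu, s, f_best) for name, (mu, s) in candidates.items()}
next_candidate = max(scores, key=scores.get)
```

With these illustrative numbers, the uncertain candidate "b" scores highest even though its mean is lowest, which shows how EI trades off exploitation against exploration. Per the abstract, PEI additionally draws on a probabilistic model of the user's preferences; how it does so is not specified here.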
