Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models

Lukas Theiner, Maik Pfefferkorn, Yongpeng Zhao, Sebastian Hirt, Rolf Findeisen

Abstract

Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with two kinds of structure, hierarchical (autoregressive) and non-hierarchical (coregionalization-based), enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the number of experiments that involve the human decision maker while still adapting the driving style to individual preferences.
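As a rough orientation, the two surrogate structures and the two data modalities named in the abstract can be written in their common textbook forms. This is a sketch under assumptions, not the paper's exact formulation: the symbols $f_L$, $f_H$, $\rho$, $\delta$, $B$, $\sigma_n$, and $\sigma_p$ are illustrative names introduced here, and the probit preference likelihood is one standard choice that the abstract does not confirm.

```latex
% Hierarchical (autoregressive) structure: the high-fidelity latent function
% is a scaled copy of the low-fidelity one plus an independent discrepancy.
f_H(\xi) = \rho\, f_L(\xi) + \delta(\xi), \qquad
f_L \sim \mathcal{GP}(0, k_L), \quad \delta \sim \mathcal{GP}(0, k_\delta)

% Non-hierarchical (coregionalization) structure: both fidelities share one
% kernel, coupled through a positive semidefinite matrix B.
\operatorname{cov}\bigl(f_i(\xi), f_j(\xi')\bigr) = B_{ij}\, k(\xi, \xi'),
\qquad i, j \in \{L, H\}

% Low-fidelity numerical observation (assumed Gaussian noise).
y = f_L(\xi) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_n^2)

% High-fidelity pairwise preference (assumed probit likelihood),
% with \Phi the standard normal CDF.
P\bigl(\xi_a \succ \xi_b \mid f_H\bigr)
  = \Phi\!\left(\frac{f_H(\xi_a) - f_H(\xi_b)}{\sqrt{2}\,\sigma_p}\right)
```

In both structures the numerical data inform $f_L$ directly, while preferences constrain only differences of $f_H$, which is why cheap numerical evaluations can reduce the number of comparisons needed from the human decision maker.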

Paper Structure (16 sections, 24 equations, 4 figures)

Figures (4)

  • Figure 1: Structure of the proposed multi-modal multi-fidelity BO framework. Due to the human decision maker, the high-fidelity experiment only returns preferential data. The additional information source provides low-fidelity numerical evaluations.
  • Figure 2: Left: Black and green areas show the $2\sigma$ credible intervals of the heteroscedastic Gaussian processes modeling observed human driver trajectories. The black model, trained on a range of drivers, serves as a low-fidelity model, while the green model, trained on one driver, is utilized to simulate high-fidelity preferences of a passenger for method evaluation. In each comparison, the simulated passenger prefers the trajectory that better aligns with the green model. Right: Overview of the track layout.
  • Figure 3: Regret of the recommended parameters $\xi^*_n$ (obtained by maximizing the posterior mean of the surrogate model) after each episode of the BO procedure. Shaded areas show the total range observed over five trials. Regret is computed using the true objective function $G(\xi)$ underlying the preferences of the simulated passenger.
  • Figure 4: Trajectories sampled during the high-fidelity phases of each method, aggregated across all five experimental trials.
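
The regret measure described in the caption of Figure 3 can be illustrated with a minimal sketch. It assumes that $G(\xi)$ is to be maximized and that the optimum is taken over a finite candidate grid; the toy objective `G`, the grid, and the stand-in posterior means are hypothetical and are not taken from the paper.

```python
import numpy as np


def recommend(posterior_mean, candidates):
    """Return the candidate that maximizes the surrogate's posterior mean."""
    values = np.array([posterior_mean(xi) for xi in candidates])
    return candidates[int(np.argmax(values))]


def simple_regret(true_objective, candidates, recommended):
    """Best achievable value on the candidate set minus the value at the recommendation."""
    best = max(true_objective(xi) for xi in candidates)
    return best - true_objective(recommended)


# Hypothetical true objective (assumption: larger is better, optimum near 0.3).
def G(xi):
    return -(xi - 0.3) ** 2


# Assumption: a finite grid stands in for the continuous parameter space.
candidates = np.linspace(0.0, 1.0, 11)

# Stand-ins for the surrogate's posterior mean after three successive episodes;
# a real run would query the fitted Gaussian process instead.
posterior_means = [
    lambda xi: -(xi - 0.8) ** 2,
    lambda xi: -(xi - 0.5) ** 2,
    lambda xi: -(xi - 0.3) ** 2,
]

regrets = [
    simple_regret(G, candidates, recommend(mean, candidates))
    for mean in posterior_means
]
```

Here `regrets` shrinks as the posterior mean's maximizer approaches the maximizer of `G`, which is the trend a regret plot over episodes is meant to show.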