Exploiting Prior Knowledge in Preferential Learning of Individualized Autonomous Vehicle Driving Styles

Lukas Theiner, Sebastian Hirt, Alexander Steinke, Rolf Findeisen

TL;DR

This work tackles learning the cost function of an MPC-based trajectory planner to match individualized driving styles by leveraging passenger preferences. It introduces prior-knowledge-informed preferential Bayesian optimization (PBO), which integrates a virtual decision maker, built from real driving data via a heteroscedastic Gaussian process, to guide parameter sampling and reduce sample complexity. The method extends PBO to multiple decision makers through a modified probit likelihood, a data generation and selection scheme, and a prior-driven initialization that accelerates convergence. In simulation, the approach converges faster and avoids sampling overly extreme driving styles, producing final trajectories that closely resemble a targeted driver model and improving passenger comfort during learning.
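The probit preference likelihood at the core of PBO can be illustrated with a short sketch. It assumes the common formulation in which a decision maker perceives the latent utilities of two parameter sets corrupted by independent Gaussian noise. Giving each decision maker its own noise level (`NOISE_STD`, with hypothetical values) is only one plausible way to account for multiple decision makers; it is not necessarily the modification used in the paper, and all names here are illustrative.

```python
import math

# Hypothetical per-decision-maker noise levels: the virtual decision maker is
# treated as noisier (less trusted) than the real passenger.
NOISE_STD = {"passenger": 0.1, "virtual": 0.5}


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_probability(f_a: float, f_b: float, noise_std: float) -> float:
    """Probit probability that option a is preferred over option b.

    Each utility is perturbed by i.i.d. Gaussian noise with standard deviation
    `noise_std`, so the noisy utility difference has std sqrt(2) * noise_std.
    """
    return std_normal_cdf((f_a - f_b) / (math.sqrt(2.0) * noise_std))


def log_likelihood(utilities, comparisons) -> float:
    """Sum of log probit likelihoods over (winner, loser, decision_maker) triples."""
    total = 0.0
    for winner, loser, dm in comparisons:
        p = preference_probability(utilities[winner], utilities[loser], NOISE_STD[dm])
        total += math.log(p)
    return total


# Hypothetical latent utilities of two parameter sets A and B.
utilities = {"A": 1.0, "B": 0.8}
print(preference_probability(utilities["A"], utilities["B"], NOISE_STD["passenger"]))  # approx. 0.92
print(preference_probability(utilities["A"], utilities["B"], NOISE_STD["virtual"]))  # approx. 0.61
```

With a larger noise level, the same utility gap yields a probability closer to 0.5, so comparisons from a less trusted decision maker pull the posterior less strongly.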

Abstract

Trajectory planning for automated vehicles commonly employs optimization over a moving horizon (Model Predictive Control), where the cost function critically influences the resulting driving style. However, finding a cost function that yields a driving style preferred by passengers remains an open challenge. We employ preferential Bayesian optimization to learn the cost function by iteratively querying a passenger's preferences. As the dimensionality of the parameter space increases, preference learning approaches may struggle to find a suitable optimum within a limited number of experiments and may expose the passenger to discomfort while exploring the parameter space. We address these challenges by incorporating prior knowledge into the preferential Bayesian optimization framework. Our method constructs a virtual decision maker from real-world human driving data to guide parameter sampling. In a simulation experiment, the prior-knowledge-informed learning procedure converges faster than existing preferential Bayesian optimization approaches and samples fewer inadequate driving styles.
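One way to picture a virtual decision maker built from human driving data is sketched below. It assumes that a heteroscedastic Gaussian process fitted to human velocity profiles has already been evaluated on a common grid of track positions, giving a predictive mean and an input-dependent standard deviation, and that the virtual decision maker prefers whichever candidate profile has the higher Gaussian log-likelihood under that model. The scoring rule, function names, and numbers are hypothetical illustrations, not the paper's definition.

```python
import math
from typing import Sequence


def gaussian_log_likelihood(
    v: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> float:
    """Log-likelihood of velocity profile v under independent N(mean_i, std_i**2).

    `mean` and `std` stand in for the predictive mean and the input-dependent
    (heteroscedastic) standard deviation of a GP fitted to human driving data,
    evaluated at the same track positions as `v`.
    """
    if not (len(v) == len(mean) == len(std)):
        raise ValueError("profiles must be sampled at the same track positions")
    total = 0.0
    for vi, mi, si in zip(v, mean, std):
        total += -0.5 * math.log(2.0 * math.pi * si**2) - 0.5 * ((vi - mi) / si) ** 2
    return total


def virtual_preference(v_a, v_b, mean, std) -> bool:
    """Hypothetical virtual decision maker: True if profile a better matches the human model."""
    return gaussian_log_likelihood(v_a, mean, std) > gaussian_log_likelihood(v_b, mean, std)


# Hypothetical GP summary at four track positions (m/s); the last section has
# high human variability, so deviations there are penalized less.
mean = [10.0, 12.0, 8.0, 11.0]
std = [0.5, 1.0, 0.5, 2.0]

v_a = [10.2, 12.5, 8.1, 12.0]  # small deviations everywhere
v_b = [11.0, 12.0, 9.0, 11.0]  # large deviations where human drivers are consistent
print(virtual_preference(v_a, v_b, mean, std))  # True
```

The heteroscedastic standard deviation is what makes this scoring sensible: a 1 m/s deviation in a section where humans drive consistently costs more than the same deviation in a section where human behavior varies widely.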

Paper Structure

This paper contains 17 sections, 14 equations, and 6 figures.

Figures (6)

  • Figure 1: Preferential Bayesian optimization (PBO) provides parameters $\theta$ to the trajectory planner based on feedback from the primary decision-maker. The proposed prior-knowledge-informed PBO exploits data obtained from a virtual decision-maker, utilizing human driving data and simulations.
  • Figure 2: Left: Heteroscedastic Gaussian process models of five different observed human driving styles. The colored areas show the $2\sigma$ confidence regions. Right: Overview of the track layout.
  • Figure 3: Regret of each queried sample over the course of the PBO iterations when using prior-knowledge-informed PBO (red) and a standard PBO approach (black). Preferred samples are marked with $+$ and less preferred samples with $\circ$; the continuous lines show the simple regret.
  • Figure 4: Comparison of simple regret over the course of the learning process for prior-knowledge-informed PBO (red), standard PBO (black), and GLISp [bemporad2021_glisp, bemporad2021_preferencebasedMPCcalibration] (cyan), averaged over 5 trials each. The error bars show the full observed range. Note that GLISp queries only a single new parameter set per iteration and compares it with the best-so-far, whereas PBO queries a pair of parameter sets. Hence, for the same experimental budget, GLISp is granted 100 iterations, while PBO runs for only 50 iterations.
  • Figure 5: Comparison of queried driving styles over all PBO iterations when using the data-based human driver model as prior knowledge (red) and without prior knowledge (black), aggregated over all 10 experiment trials. Left: velocity profile. Right: normalized lateral and longitudinal accelerations.
  • ...and 1 more figure
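The regret curves in Figures 3 and 4 and the equal-budget comparison described for Figure 4 can be made concrete with a short sketch. It assumes the per-sample regret is $r(\theta_i) = f(\theta^\ast) - f(\theta_i)$ for a utility $f$ that is known in simulation, and that the simple regret is the running minimum of these per-sample values; the function names and regret numbers are hypothetical.

```python
from itertools import accumulate
from typing import List


def simple_regret(sample_regrets: List[float]) -> List[float]:
    """Running minimum of per-sample regrets r_i = f(theta_star) - f(theta_i)."""
    return list(accumulate(sample_regrets, min))


def evaluations_used(iterations: int, new_samples_per_iteration: int) -> int:
    """Number of driving-style evaluations a preference-learning run consumes."""
    return iterations * new_samples_per_iteration


# Hypothetical per-sample regrets in query order.
print(simple_regret([0.9, 0.4, 0.6, 0.1, 0.3]))  # [0.9, 0.4, 0.4, 0.1, 0.1]

# PBO proposes a pair of parameter sets per iteration; GLISp proposes one new
# set and compares it with the best-so-far, so 50 PBO iterations and
# 100 GLISp iterations consume the same budget.
print(evaluations_used(50, 2), evaluations_used(100, 1))  # 100 100
```

Because the simple regret only tracks the best sample so far, it never increases, whereas the per-sample regret (the individual markers in Figure 3) can jump back up whenever an exploratory query lands on a poor driving style.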