Exploiting Prior Knowledge in Preferential Learning of Individualized Autonomous Vehicle Driving Styles
Lukas Theiner, Sebastian Hirt, Alexander Steinke, Rolf Findeisen
TL;DR
This work tackles learning the cost function of an MPC-based trajectory planner to match individualized driving styles by leveraging passenger preferences. It introduces prior-knowledge-informed preferential Bayesian optimization (PBO) that integrates a virtual decision maker, built from real driving data via a heteroscedastic Gaussian process, to guide parameter sampling and reduce sample complexity. The method extends PBO to handle multiple decision makers with a modified probit likelihood, data generation and selection, and a prior-driven initialization to accelerate convergence. In simulation, the approach achieves faster convergence and avoids sampling overly extreme driving styles, producing final trajectories that closely resemble a targeted driver model and improving passenger comfort during learning.
Abstract
Trajectory planning for automated vehicles commonly employs optimization over a moving horizon (Model Predictive Control), where the cost function critically influences the resulting driving style. However, finding a suitable cost function that results in a driving style preferred by passengers remains an ongoing challenge. We employ preferential Bayesian optimization to learn the cost function by iteratively querying a passenger's preference. As the dimensionality of the parameter space increases, preference learning approaches may struggle to find a suitable optimum within a limited number of experiments and may expose the passenger to discomfort while exploring the parameter space. We address these challenges by incorporating prior knowledge into the preferential Bayesian optimization framework. Our method constructs a virtual decision maker from real-world human driving data to guide parameter sampling. In a simulation experiment, the prior-knowledge-informed learning procedure converges faster than existing preferential Bayesian optimization approaches and samples fewer inadequate driving styles.
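As background on how a pairwise preference query enters such a learning procedure, the following minimal sketch shows the standard probit preference likelihood commonly used in preferential Bayesian optimization. It is the textbook single-decision-maker form, not the modified multi-decision-maker likelihood of this work. The function name, the `noise_std` parameter, and the convention that a higher latent utility means "preferred" are assumptions made for illustration.

```python
import math


def probit_preference_likelihood(f_a: float, f_b: float, noise_std: float = 1.0) -> float:
    """Probability that option a is preferred over option b.

    Assumes each latent utility is observed with independent Gaussian noise
    of standard deviation `noise_std` (hypothetical parameter), so the
    difference of the two noisy utilities has standard deviation
    sqrt(2) * noise_std.
    """
    # Standardized utility difference.
    z = (f_a - f_b) / (math.sqrt(2.0) * noise_std)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Equal utilities give an uninformative comparison.
    print(probit_preference_likelihood(0.0, 0.0))
    # A larger utility gap makes the preference more certain.
    print(probit_preference_likelihood(1.0, 0.0) < probit_preference_likelihood(2.0, 0.0))
```

In a preference-learning loop, this probability would be evaluated for each recorded comparison and combined with a Gaussian process prior over the latent utility of the cost function parameters.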
