StyleDrive: Towards Driving-Style Aware Benchmarking of End-To-End Autonomous Driving
Ruiyang Hao, Bowen Jing, Haibao Yu, Zaiqing Nie
TL;DR
StyleDrive addresses the lack of personalization in end-to-end autonomous driving (E2EAD) by introducing a large-scale real-world dataset and a standardized benchmark for driving-style-conditioned planning. The authors combine map topology, dynamic semantics inferred by a fine-tuned vision-language model (VLM), rule-and-distribution-based heuristics, and human-in-the-loop verification to annotate both objective driving behaviors and subjective driving-style preferences. On top of this dataset they build the StyleDrive Benchmark. The benchmark scores policies with a Style-Modulated PDM Score (SM-PDMS), which measures how closely a policy's behavior matches a target driving style while still meeting safety requirements. The authors use it to evaluate several state-of-the-art E2EAD models. Results show that conditioning on driving preferences substantially improves behavioral alignment with human demonstrations, underscoring the value of style-conditioned E2EAD for user trust, safety, and real-world adoption.
Abstract
Personalization, while extensively studied in conventional autonomous driving pipelines, has been largely overlooked in the context of end-to-end autonomous driving (E2EAD), despite its critical role in fostering user trust, safety perception, and real-world adoption. A primary bottleneck is the absence of large-scale real-world datasets that systematically capture driving preferences, severely limiting the development and evaluation of personalized E2EAD models. In this work, we introduce the first large-scale real-world dataset explicitly curated for personalized E2EAD, integrating comprehensive scene topology with rich dynamic context derived from agent dynamics and semantics inferred via a fine-tuned vision-language model (VLM). We propose a hybrid annotation pipeline that combines behavioral analysis, rule-and-distribution-based heuristics, and subjective semantic modeling guided by VLM reasoning, with final refinement through human-in-the-loop verification. Building upon this dataset, we introduce the first standardized benchmark for systematically evaluating personalized E2EAD models. Empirical evaluations on state-of-the-art architectures demonstrate that incorporating personalized driving preferences significantly improves behavioral alignment with human demonstrations.
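The abstract names "rule-and-distribution-based heuristics" without spelling them out. As a rough illustration of the distribution-based half only, the sketch below assigns a coarse style label to one driving clip. It compares a single behavioral feature against the empirical tertiles of that feature over comparable clips. Everything here is a hypothetical assumption, not the authors' pipeline: the feature (for example, mean absolute longitudinal acceleration within one scenario type), the three labels, the tertile cut points, and the function names `style_thresholds` and `label_style`.

```python
from statistics import quantiles
from typing import Sequence


def style_thresholds(reference: Sequence[float]) -> tuple[float, float]:
    """Return the two tertile cut points of a behavioral feature.

    Hypothetical: `reference` holds the same feature measured on comparable
    clips (e.g. the same scenario type), so the thresholds are scenario-relative.
    """
    # n=3 yields two cut points splitting the data into three equal-probability groups
    lo, hi = quantiles(reference, n=3)
    return lo, hi


def label_style(value: float, reference: Sequence[float]) -> str:
    """Map one clip's feature value to a coarse style label (hypothetical scheme)."""
    lo, hi = style_thresholds(reference)
    if value < lo:
        return "conservative"  # less intense than most comparable clips
    if value > hi:
        return "aggressive"  # more intense than most comparable clips
    return "normal"


if __name__ == "__main__":
    # Made-up feature values for illustration only; not data from the paper.
    reference = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    for v in (0.8, 2.0, 2.9):
        print(v, label_style(v, reference))
```

The thresholds come from the data distribution rather than fixed constants, so the same code adapts to scenario types with very different typical behavior. A full rule-and-distribution scheme would presumably layer explicit rules and human verification on top of this, as the abstract describes.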
