ComDrive: Comfort-Oriented End-to-End Autonomous Driving

Junming Wang, Xingyu Zhang, Zebin Xing, Songen Gu, Xiaoyang Guo, Yang Hu, Ziying Song, Qian Zhang, Xiaoxiao Long, Wei Yin

TL;DR

ComDrive proposes a comfort-oriented end-to-end autonomous driving pipeline that uses sparse perception to build 3D scene representations, a diffusion-based motion planner to generate temporally consistent multi-modal trajectories, and a dual-stream adaptive trajectory scorer that blends rule-based safety with Vision-Language Model guidance to select the most comfortable trajectories. It addresses temporal inconsistency and passenger discomfort observed in prior systems and introduces a universal end-to-end driving comfort metric. Experimental results on nuScenes and a real-world dataset demonstrate state-of-the-art comfort and safety, with faster end-to-end planning and substantial collision reductions. The work also showcases adaptive driving styles via Llama 3.2V prompts, highlighting a practical path toward more comfortable autonomous driving without extensive fine-tuning.

Abstract

We propose ComDrive, the first comfort-oriented end-to-end autonomous driving system that generates temporally consistent and comfortable trajectories. Recent studies have demonstrated that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select safe trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers tend to produce temporally inconsistent and uncomfortable trajectories. To address these issues, ComDrive first extracts 3D spatial representations through sparse perception, which then serve as conditional inputs. These inputs are used by a Conditional Denoising Diffusion Probabilistic Model (DDPM)-based motion planner to generate temporally consistent multi-modal trajectories. A dual-stream adaptive trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle. Experiments demonstrate that ComDrive achieves state-of-the-art performance in both comfort and safety, outperforming UniAD by 17% in driving comfort and reducing collision rates by 25% compared to SparseDrive. More results are available on our project page: https://jmwang0117.github.io/ComDrive/.
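
To make the conditional DDPM step in the abstract more concrete, the following is a minimal sketch of standard DDPM ancestral sampling applied to a batch of candidate trajectories. It is not the authors' planner: the linear noise schedule, the number of steps, the trajectory shape, and `toy_noise_predictor` (a hypothetical stand-in for a learned, condition-aware denoising network) are all assumptions made for illustration.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (an assumed choice; the paper's schedule is not given here).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(x_t, t, cond):
    # Hypothetical placeholder for a trained network eps_theta(x_t, t, cond).
    # It simply treats the offset from the condition as noise; illustrative only.
    return x_t - cond

def sample_trajectories(cond, num_modes=6, num_steps=50, seed=0,
                        eps_model=toy_noise_predictor):
    # cond: (T, 2) array standing in for the conditioning signal.
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start every mode from Gaussian noise.
    x = rng.standard_normal((num_modes,) + cond.shape)
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, cond)
        # Standard DDPM posterior mean given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x  # (num_modes, T, 2) candidate trajectories
```

In a real system the condition would carry the sparse 3D scene representation and the historical predicted trajectories, and the different random seeds per mode would yield the multi-modal candidates passed on to the scorer.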

Paper Structure

This paper contains 18 sections, 12 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Architecture and Performance Evaluation of ComDrive.
  • Figure 2: Overview of our proposed framework. ComDrive first extracts features from multi-view images using an off-the-shelf visual encoder, then sparsely perceives dynamic and static elements to generate a 3D representation. These representations and the historical predicted trajectories serve as conditions for the diffusion model, which generates temporally consistent multi-modal trajectories. Finally, the trajectory scorer selects the most comfortable trajectory from these candidates to control the vehicle.
  • Figure 3: Overview of the Dual-Stream Adaptive Trajectory Scorer (DATS). The system integrates a Rule-Based Scorer with a VLM-Guided Dynamic Weight Adjuster for adaptive and interpretable trajectory scoring.
  • Figure 4: Qualitative results of Llama 3.2V on nuScenes. We show the questions (Q), context (C), and answers (A). Incorporating surround-view images and textual data enables fine-tuning of driving styles via targeted weight modifications within the rule-based scorer.
  • Figure 5: Qualitative results on the nuScenes dataset. ComDrive exhibits strong temporal consistency.
  • ...and 2 more figures
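
The dual-stream scoring idea described in the Figure 3 and Figure 4 captions (a rule-based scorer whose weights are adjusted by a VLM) can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the two rule terms (`comfort` from mean jerk, `progress` from displacement), the time step `dt`, the style names, and `stub_weight_adjuster` (a hard-coded placeholder for the VLM stream) are all hypothetical.

```python
import numpy as np

def rule_scores(traj, dt=0.5):
    # traj: (T, 2) waypoints with T >= 4. Returns hypothetical rule terms; higher is better.
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    return {
        # Lower mean jerk magnitude is treated as more comfortable.
        "comfort": -float(np.mean(np.linalg.norm(jerk, axis=1))),
        # Straight-line displacement from first to last waypoint.
        "progress": float(np.linalg.norm(traj[-1] - traj[0])),
    }

def stub_weight_adjuster(style):
    # Placeholder for the VLM-guided stream: maps a style prompt to weight multipliers.
    presets = {
        "comfortable": {"comfort": 2.0, "progress": 0.5},
        "assertive": {"comfort": 0.5, "progress": 2.0},
    }
    return presets[style]

def select_trajectory(candidates, style, base_weights=None):
    base = base_weights or {"comfort": 1.0, "progress": 1.0}
    scale = stub_weight_adjuster(style)
    weights = {k: base[k] * scale[k] for k in base}
    # Weighted sum of the rule terms for each candidate trajectory.
    totals = [sum(weights[k] * v for k, v in rule_scores(c).items())
              for c in candidates]
    return int(np.argmax(totals)), totals

# Usage: a smooth, slower candidate and a slightly jerkier, faster one.
smooth = np.stack([np.arange(6, dtype=float), np.zeros(6)], axis=1)
fast = np.stack([np.array([0.0, 1.9, 4.0, 5.9, 8.0, 9.9]), np.zeros(6)], axis=1)
best_comfortable, _ = select_trajectory([smooth, fast], "comfortable")
best_assertive, _ = select_trajectory([smooth, fast], "assertive")
```

Keeping the rule terms fixed and changing only their weights is what makes the selection interpretable: the same candidates are ranked differently purely because the style prompt shifts the balance between terms.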