GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, Wei Yin

TL;DR

GoalFlow tackles multimodal trajectory generation for end-to-end autonomous driving by constraining diffusion-based generation with goal points and employing Flow Matching for efficient, high-quality trajectories. It introduces a goal-point vocabulary with a dual-score mechanism (distance to ground truth and drivable-area validity) and uses Rectified Flow conditioned on BEV scene representations to produce multiple candidate trajectories, which are then scored to select the best. The approach achieves state-of-the-art results on Navsim/OpenScene (PDMS 90.3) and remains robust with a single denoising step, highlighting practical deployment potential. The work advances end-to-end driving by integrating goal-guided generation, efficient flow-based modeling, and rigorous trajectory scoring to ensure safety and performance.

Abstract

We propose GoalFlow, an end-to-end autonomous driving method for generating high-quality multimodal trajectories. In autonomous driving scenarios, there is rarely a single suitable trajectory. Recent methods have increasingly focused on modeling multimodal trajectory distributions. However, they suffer from trajectory selection complexity and reduced trajectory quality due to high trajectory divergence and inconsistencies between guidance and scene information. To address these issues, we introduce GoalFlow, a novel method that effectively constrains the generative process to produce high-quality, multimodal trajectories. To resolve the trajectory divergence problem inherent in diffusion-based methods, GoalFlow constrains the generated trajectories by introducing a goal point. GoalFlow establishes a novel scoring mechanism that selects the most appropriate goal point from the candidate points based on scene information. Furthermore, GoalFlow employs an efficient generative method, Flow Matching, to generate multimodal trajectories, and incorporates a refined scoring mechanism to select the optimal trajectory from the candidates. Our experimental results on Navsim (Dauner et al., 2024) demonstrate that GoalFlow achieves state-of-the-art performance, delivering robust multimodal trajectories for autonomous driving. GoalFlow achieves a PDMS of 90.3, significantly surpassing other methods. Unlike other diffusion-policy-based methods, our approach requires only a single denoising step to obtain excellent performance. The code is available at https://github.com/YvanYin/GoalFlow.
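To make the single-denoising-step claim concrete, the following is a minimal sketch of the generic Rectified Flow formulation, not the GoalFlow implementation. It assumes the standard straight-line interpolation between Gaussian noise and a target trajectory, and it replaces the learned, scene-conditioned network with a hypothetical `oracle_velocity` function. All function names, the toy 8-waypoint trajectory, and the array shapes are illustrative assumptions.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Rectified Flow regression target: constant velocity along that path."""
    return x1 - x0


def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1
        x = x + dt * velocity_fn(x, t)
    return x


# Toy data (assumption): one trajectory of 8 (x, y) waypoints going straight.
rng = np.random.default_rng(0)
target = np.stack([np.linspace(0.0, 20.0, 8), np.zeros(8)], axis=1)
noise = rng.standard_normal(target.shape)


def oracle_velocity(x, t):
    """Hypothetical stand-in for the trained network: the exact velocity
    that moves x toward `target` along a straight line."""
    return (target - x) / (1.0 - t)


one_step = euler_sample(oracle_velocity, noise, num_steps=1)
four_steps = euler_sample(oracle_velocity, noise, num_steps=4)
```

When the velocity field is perfectly straight, as in this toy oracle, one Euler step and four Euler steps land on the same trajectory. A trained network only approximates this, so the sketch illustrates why few-step sampling can work, not how well it works in practice.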

Paper Structure

This paper contains 20 sections, 20 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Comparison of recent multimodal trajectory generation paradigms. A standalone generative model often produces highly diverse trajectories with no clear boundaries between different modalities. In contrast, the Goal-Driven Generation Model leverages the strong guidance of goal points, effectively distinguishing multiple modalities by utilizing different goal points.
  • Figure 2: Overview of the GoalFlow architecture. GoalFlow consists of three modules. The Perception Module is responsible for integrating scene information into a BEV feature $F_{bev}$, the Goal Point Construction Module selects the optimal goal point from Goal Point Vocabulary $\mathbb{V}$ as guidance information, and the Trajectory Planning Module generates the trajectories by denoising from the Gaussian distribution to the target distribution. Finally, the Trajectory Scorer selects the optimal trajectory from the candidates.
  • Figure 4: The network architecture used in Rectified Flow.
  • Figure 5: Visualization of trajectories. $\times$ indicates that the trajectory results in a collision or goes beyond the drivable area, while ✓ represents a safe trajectory. The orange points are generated by the Goal Constructor, while the blue and yellow points correspond to samples from the vocabulary. The results highlight that GoalFlow generates higher-quality trajectories compared to the other two methods.
  • Figure 7: Visualization of trajectories. We visualize four scenarios: going straight, turning left, turning right, and yielding. For each scenario, 128 trajectories were generated using GoalFlow.
  • ...and 3 more figures
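
The goal-point selection step described in the Figure 2 caption and the TL;DR can be sketched as follows. This is an illustrative sketch only: it assumes each point in the vocabulary already has a predicted distance score and a predicted drivable-area score in (0, 1], and it combines them with a weighted sum of logarithms. The combination rule, the weights `w_dis` and `w_dac`, the function name, and the toy numbers are all assumptions, not details taken from the paper.

```python
import numpy as np


def select_goal_point(vocabulary, dis_score, dac_score, w_dis=1.0, w_dac=1.0):
    """Return the vocabulary point with the highest combined score.

    vocabulary: (N, 2) candidate goal points in the ego frame
    dis_score:  (N,) predicted closeness to the true endpoint, in (0, 1]
    dac_score:  (N,) predicted drivable-area validity, in (0, 1]
    The weighted log-sum below is an assumed combination rule.
    """
    eps = 1e-9  # avoids log(0)
    combined = w_dis * np.log(dis_score + eps) + w_dac * np.log(dac_score + eps)
    best = int(np.argmax(combined))
    return vocabulary[best], best


# Toy vocabulary of three candidate goal points (hypothetical values).
vocabulary = np.array([[10.0, 0.0], [10.0, 4.0], [8.0, -3.0]])
dis_score = np.array([0.5, 0.6, 0.1])   # candidate 1 looks closest...
dac_score = np.array([0.9, 0.1, 0.9])   # ...but is likely off the drivable area
goal, idx = select_goal_point(vocabulary, dis_score, dac_score)
```

In this toy case the second candidate has the best distance score but is rejected because of its low drivable-area score, so the first candidate is selected. That is the behavior a dual-score mechanism is meant to provide.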