Adaptive Time Step Flow Matching for Autonomous Driving Motion Planning

Ananya Trivedi, Anjian Li, Mohamed Elnoor, Yusuf Umut Ciftci, Avinash Singh, Jovin D'sa, Sangjae Bae, David Isele, Taskin Padir, Faizan M. Tariq

TL;DR

This work tackles real-time motion planning for autonomous driving in urban settings by introducing a variance-adaptive flow-matching framework conditioned on rich scene context. A Motion Transformer Encoder supplies scene context to a velocity-field predictor, while a lightweight variance head adaptively selects the number of integration steps, enabling real-time inference without retraining. A convex quadratic program refines the ego trajectory to satisfy dynamic constraints with minimal overhead, improving ride comfort and feasibility. Trained on the Waymo Open Motion Dataset, the approach achieves better trajectory smoothness and constraint adherence than diffusion, consistency, and transformer baselines. It runs at about 20 Hz on an RTX 3070 and handles diverse maneuvers without scenario-specific tuning.
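
The velocity-field predictor is trained with conditional flow matching. This summary does not say which probability path the paper uses. The sketch below therefore assumes the common linear path x_t = (1 - t) x0 + t x1, which runs from a Gaussian noise sample x0 at t = 0 to an expert trajectory x1 at t = 1. On this path the target velocity is the constant x1 - x0.

The names interpolate, target_velocity, cfm_loss and velocity_fn are hypothetical. velocity_fn stands in for the learned network conditioned on the encoder's scene context.

```python
import random


def interpolate(x0, x1, t):
    """Point on the assumed linear path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Time derivative of the linear path: u = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def cfm_loss(velocity_fn, x1, context, t=None, rng=random):
    """Single-sample conditional flow matching loss (hypothetical sketch).

    x1:          expert trajectory, flattened waypoints.
    context:     scene encoding passed through to velocity_fn.
    velocity_fn: stand-in for the learned predictor v(x_t, t, context).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample at t = 0
    if t is None:
        t = rng.random()                    # t ~ U[0, 1)
    x_t = interpolate(x0, x1, t)
    u = target_velocity(x0, x1)
    v = velocity_fn(x_t, t, context)
    # Mean squared error between predicted and target velocity
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u)) / len(u)
```

Here is one way to check the sketch. Suppose a velocity_fn is told the expert trajectory through its context and returns (x1 - x_t) / (1 - t). That function achieves zero loss. A real network only sees the scene context, so it learns the conditional expectation of this target instead.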

Abstract

Autonomous driving requires reasoning about interactions with surrounding traffic. A prevailing approach is large-scale imitation learning on expert driving datasets, aimed at generalizing across diverse real-world scenarios. For online trajectory generation, such methods must operate at real-time rates. Diffusion models require hundreds of denoising steps at inference, resulting in high latency. Consistency models mitigate this issue but rely on carefully tuned noise schedules to capture the multimodal action distributions common in autonomous driving, and adapting the schedule typically requires expensive retraining. To address these limitations, we propose a framework based on conditional flow matching that jointly predicts future motions of surrounding agents and plans the ego trajectory in real time. We train a lightweight variance estimator that selects the number of inference steps online, so runtime and imitation-learning performance can be balanced without retraining. To further enhance ride quality, we introduce a trajectory post-processing step, cast as a convex quadratic program, that adds negligible computational overhead. Trained on the Waymo Open Motion Dataset, the framework performs maneuvers such as lane changes, cruise control, and unprotected left turns without requiring scenario-specific tuning. Our method maintains a 20 Hz update rate on an NVIDIA RTX 3070 GPU, making it suitable for online deployment. Compared with transformer, diffusion, and consistency-model baselines, it achieves smoother trajectories and better adherence to dynamic constraints. Experiment videos and code implementations can be found at https://flow-matching-self-driving.github.io/.
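
The quadratic-program post-processing step can be pictured with a simplified stand-in. This summary does not give the paper's cost or constraints, so the sketch below assumes a simple smoothing objective. It stays close to the network's waypoints while penalising squared second differences, a discrete acceleration proxy. The first waypoint is pinned to the current state.

Because this version has only an equality constraint, its optimum solves a linear system. smooth_trajectory_qp, roughness and the weight lam are illustrative names, not the authors' implementation.

```python
def second_difference_rows(n):
    """Rows of the (n - 2) x n second-difference operator D2."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def smooth_trajectory_qp(ref, lam=5.0):
    """Minimise ||x - ref||^2 + lam * ||D2 x||^2 subject to x[0] = ref[0]."""
    n = len(ref)
    if n < 3:
        return list(ref)
    D = second_difference_rows(n)
    # Stationarity for the free waypoints: (I + lam * D2^T D2) x = ref
    A = [[(1.0 if i == j else 0.0)
          + lam * sum(D[k][i] * D[k][j] for k in range(n - 2))
          for j in range(n)] for i in range(n)]
    b = list(ref)
    # Replace row 0 by the equality constraint x[0] = ref[0] (current ego state)
    A[0] = [1.0] + [0.0] * (n - 1)
    b[0] = ref[0]
    return solve_linear(A, b)


def roughness(x):
    """Sum of squared second differences (discrete acceleration proxy)."""
    return sum((x[i] - 2 * x[i + 1] + x[i + 2]) ** 2 for i in range(len(x) - 2))
```

For planar waypoints, call the function once per coordinate. Adding inequality constraints, such as acceleration or curvature bounds, turns this into a general QP. That version needs a dedicated solver (for example OSQP) rather than a single linear solve.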

Paper Structure

This paper contains 15 sections, 9 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Our approach encodes past motion, map layout, and the desired pose using a transformer-based encoder. The resulting representation is passed to a flow matching network with an adaptive number of integration steps, followed by a lightweight post-processing step to generate a motion plan for the ego vehicle and future behavior predictions for surrounding agents.
  • Figure 2: (a) The ego takes a sharp right exit. (b) From the same initial pose, the goal is changed to a left lane change. The policy adapts and produces smooth, lane-aligned trajectories.
  • Figure 3: Example maneuvers handled by the proposed method: (a) approaching stop-and-go traffic, (b) changing multiple lanes in front of traffic to reach the goal, and (c) yielding to oncoming vehicles before executing an unprotected left turn. The bottom row shows the corresponding ego speed profiles. Videos of these and additional maneuvers can be found at https://flow-matching-self-driving.github.io/
  • Figure 4: Collision rate over the planning horizon. Collisions are rare early in the horizon and become more frequent toward its end. Because the plan is continuously replanned, the early time steps matter most for safety.
  • Figure 5: Normalized integration time for different numbers of function evaluations (NFE). The non-uniform progression reflects variance-based step-size adjustment, unlike uniform fixed-step Euler integration.
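
Figure 5 contrasts a variance-dependent integration grid with uniform fixed-step Euler. The exact rule the variance head uses is not described here, so the sketch below is a hypothetical scheme.

- A stand-in variance_fn(t) sets the number of steps from its integral over [0, 1], clamped to the range n_min to n_max.
- Step boundaries are placed at equal shares of cumulative variance, which gives short steps where the predicted variance is high.
- Each Euler step costs one velocity evaluation, so NFE equals len(times) - 1.

```python
import math


def variance_adaptive_schedule(variance_fn, n_min=2, n_max=10, tol=0.05, grid=1000):
    """Hypothetical step schedule driven by a predicted variance profile.

    variance_fn(t) stands in for the learned variance head and must be > 0
    on [0, 1]. Returns times 0 = t_0 < t_1 < ... < t_K = 1 (NFE = K).
    """
    ts = [i / grid for i in range(grid + 1)]
    vs = [variance_fn(t) for t in ts]
    # Cumulative trapezoidal integral of the variance over [0, 1]
    cum = [0.0]
    for i in range(grid):
        cum.append(cum[-1] + 0.5 * (vs[i] + vs[i + 1]) / grid)
    total = cum[-1]
    # More total uncertainty -> more steps, clamped to a runtime budget
    n_steps = min(n_max, max(n_min, math.ceil(total / tol)))
    # Boundaries at equal shares of cumulative variance: short steps where
    # variance is high, long steps where it is low
    times, j = [0.0], 0
    for k in range(1, n_steps):
        target = total * k / n_steps
        while cum[j + 1] < target:
            j += 1
        frac = (target - cum[j]) / (cum[j + 1] - cum[j])
        times.append(ts[j] + frac / grid)
    times.append(1.0)
    return times


def euler_integrate(velocity_fn, x0, times):
    """Explicit Euler on an arbitrary (possibly non-uniform) time grid."""
    x = list(x0)
    for t0, t1 in zip(times[:-1], times[1:]):
        v = velocity_fn(x, t0)  # one function evaluation per step
        x = [xi + (t1 - t0) * vi for xi, vi in zip(x, v)]
    return x
```

With a constant variance profile, this scheme reduces to the uniform grid of fixed-step Euler integration. The tol parameter trades runtime against integration accuracy without retraining the velocity network.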