Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling

Tianyi Tan, Yinan Zheng, Ruiming Liang, Zexu Wang, Kexin Zheng, Jinliang Zheng, Jianxiong Li, Xianyuan Zhan, Jingjing Liu

TL;DR

Flow Planner tackles interactive driving planning with three integrated advances: fine-grained trajectory tokenization to preserve local interactions, a spatiotemporal fusion backbone that unifies heterogeneous scene inputs, and flow matching with classifier-free guidance to capture multi-modal, condition-dependent behaviors. The method achieves state-of-the-art closed-loop performance among learning-based planners on nuPlan and interPlan benchmarks, particularly excelling in scenarios with dense interactions. The work emphasizes practical stability through consistency losses, ego-centric preprocessing, and adaptive attention, while acknowledging inference-speed limitations and proposing RL-assisted future work. Overall, Flow Planner advances interactive behavior understanding in autonomous driving by marrying expressive tokenization with principled, guided generation.

Abstract

Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, a brute-force implementation that simply stacks transformer blocks lacks a dedicated mechanism for modeling the interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data further exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to reduce the complexity of modeling the whole trajectory. With a carefully designed architecture, we achieve efficient temporal and spatial fusion of planning and scene information to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost for interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and the challenging interactive interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
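
The two mechanisms named in the abstract, overlapping trajectory segments and flow matching with classifier-free guidance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the segment length and overlap, the guidance convention `v_uncond + w * (v_cond - v_uncond)`, the Euler integrator, and the toy velocity field are all assumptions chosen for illustration, and the paper's actual formulation may differ.

```python
from typing import Callable, List, Optional, Sequence


def split_overlapping(traj: Sequence, seg_len: int, overlap: int) -> List[list]:
    """Split a trajectory into fixed-length segments sharing `overlap` points.

    Hypothetical tokenization: each segment would become one token.
    """
    if not 0 <= overlap < seg_len:
        raise ValueError("need 0 <= overlap < seg_len")
    stride = seg_len - overlap
    return [list(traj[s:s + seg_len])
            for s in range(0, len(traj) - seg_len + 1, stride)]


def guided_velocity(v_cond: Sequence[float], v_uncond: Sequence[float],
                    w: float) -> List[float]:
    """Classifier-free guidance (one common convention, assumed here):
    w = 0 gives the unconditional velocity, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional prediction."""
    return [u + w * (c - u) for c, u in zip(v_cond, v_uncond)]


VelocityFn = Callable[[List[float], float, Optional[List[float]]], List[float]]


def sample(x0: Sequence[float], velocity_fn: VelocityFn,
           cond: List[float], w: float, steps: int) -> List[float]:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler,
    querying the model with and without the condition at every step."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = guided_velocity(velocity_fn(x, t, cond),
                            velocity_fn(x, t, None), w)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: List[float], t: float,
                 cond: Optional[List[float]]) -> List[float]:
    """Stand-in for a trained network: points straight at `cond`
    (or at the origin when the condition is dropped)."""
    target = cond if cond is not None else [0.0] * len(x)
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


# Usage: tokenize a 10-point trajectory, then draw one guided sample.
segments = split_overlapping(list(range(10)), seg_len=4, overlap=2)
plan = sample([0.5, -0.5], toy_velocity, cond=[1.0, 2.0], w=1.0, steps=10)
```

With the toy field, `w = 1.0` drives the sample to the conditioned target, while a larger `w` pushes it further along the direction that separates the conditional and unconditional predictions, which is the basic effect guidance is meant to provide.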

Paper Structure

This paper contains 17 sections, 10 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Overview of the Flow Planner framework.
  • Figure 2: A typical out-of-distribution scenario in the interPlan benchmark: nudging around crashed vehicles. Flow Planner demonstrates strong scene understanding and generation adaptability in a situation that is totally unseen in the training data.
  • Figure 3: Visualization of interaction behaviors. Two challenging closed-loop test scenarios with distinctive interactions: (a) a lane change and (b) an unprotected left turn. The trajectories shown include the ego vehicle's planned future, the ego history, and the neighbor history.
  • Figure 4: Illustration of the influence of token number on trajectory quality.
  • Figure 5: Ablation on the number of trajectory segments on nuPlan Val14 Benchmark.
  • ...and 3 more figures