DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving

Xiaolu Liu, Yicong Li, Song Wang, Junbo Chen, Angela Yao, Jianke Zhu

Abstract

Recently, world models have been incorporated into autonomous driving systems to improve planning reliability. Existing approaches typically predict future states through appearance generation or deterministic regression, which limits their ability to capture trajectory-conditioned scene evolution and leads to unreliable action planning. To address this, we propose DynFlowDrive, a latent world model that leverages flow-based dynamics to model the transition of world states under different driving actions. By adopting the rectified flow formulation, the model learns a velocity field that describes how the scene state changes under a given driving action, enabling progressive prediction of future latent states. Building upon this, we further introduce a stability-aware multi-mode trajectory selection strategy that evaluates candidate trajectories according to the stability of the scene transitions they induce. Extensive experiments on the nuScenes and NavSim benchmarks demonstrate consistent improvements across diverse driving frameworks without introducing additional inference overhead. Source code will be available at https://github.com/xiaolul2/DynFlowDrive.
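
To make the rectified flow idea in the abstract concrete, the following is a minimal sketch of the generic formulation, not the authors' implementation. It assumes the standard rectified flow setup: a straight-line interpolation between the current latent `z0` and the future latent `z1`, a regression target equal to the displacement `z1 - z0`, and fixed-step Euler integration for progressive prediction. The function names, the `velocity_fn(z, t, action)` signature, and the toy latents are all hypothetical and chosen for illustration only.

```python
def interpolate(z0, z1, t):
    """Point on the straight path between z0 and z1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def target_velocity(z0, z1):
    """Rectified flow regression target: the constant displacement z1 - z0."""
    return [b - a for a, b in zip(z0, z1)]


def flow_matching_loss(velocity_fn, z0, z1, action, t):
    """Mean squared error between predicted and target velocity at time t."""
    zt = interpolate(z0, z1, t)
    predicted = velocity_fn(zt, t, action)
    target = target_velocity(z0, z1)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_rollout(velocity_fn, z0, action, num_steps=4):
    """Integrate dz/dt = v(z, t, action) from t = 0 to t = 1 with Euler steps.

    Returns every intermediate state, so the list has num_steps + 1 entries.
    """
    dt = 1.0 / num_steps
    z = list(z0)
    states = [z]
    for k in range(num_steps):
        v = velocity_fn(z, k * dt, action)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
        states.append(z)
    return states


if __name__ == "__main__":
    # Toy assumption: the action directly encodes the latent displacement,
    # so a velocity field that returns the action is exact.
    def oracle_velocity(z, t, action):
        return list(action)

    z0, action = [0.0, 1.0], [1.0, -2.0]
    z1 = [a + b for a, b in zip(z0, action)]
    print(flow_matching_loss(oracle_velocity, z0, z1, action, t=0.3))
    print(euler_rollout(oracle_velocity, z0, action)[-1])
```

In a learned model, `velocity_fn` would be a neural network conditioned on the candidate trajectory, and the intermediate states returned by `euler_rollout` correspond to the progressive latent predictions the abstract describes.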

Paper Structure (13 sections, 15 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: (a) Comparisons of perception-based and latent world model-based approaches on nuScenes and NavSim benchmarks. (b) Planning visualization on the front view and bird's-eye-view (BEV) space. Our DynFlowDrive achieves comparable performance.
  • Figure 2: Comparison between (a) the existing static world model and (b) the dynamic latent world model of our DynFlowDrive. Instead of statically regressing next-frame latents, we propose dynamic modeling that learns a continuous velocity field $v_{\theta}$ to capture the evolution of world transitions.
  • Figure 3: Overview of DynFlowDrive. Given current observations, multi-mode trajectories are first generated by the standard planning module. A flow-based dynamic latent world model is incorporated to simulate the progressive future evolution in latent space. The resulting dynamics are used by a stability-aware multi-mode selection module, which assesses each trajectory based on reconstruction quality and flow-based stability, enabling reliable supervision and improved planning robustness.
  • Figure 4: The architecture of our dynamic latent world model, in which the velocity field $v_{\theta}$ is learned to capture the trajectory-conditioned dynamic transitions in the latent space.
  • Figure 5: Stability-aware Multi-mode Selection. During training, the score head is supervised by the stability criterion. During inference, the trajectory mode with the highest score is selected.
  • ...and 1 more figure
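
The selection step described in the captions of Figures 3 and 5 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's criterion: the captions say only that trajectories are assessed by reconstruction quality and flow-based stability, and that the highest-scoring mode is chosen at inference. The specific stability measure here (mean squared change between consecutive velocities along a rollout), the weighting `weight`, and all function names are hypothetical.

```python
def velocity_variation(velocities):
    """Hypothetical instability measure (an assumption, not from the paper).

    Mean over consecutive velocity pairs of their squared difference.
    A perfectly straight flow has zero variation.
    """
    if len(velocities) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(velocities, velocities[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(velocities) - 1)


def stability_score(recon_error, velocities, weight=1.0):
    """Higher is better: low reconstruction error and low velocity variation."""
    return -(recon_error + weight * velocity_variation(velocities))


def select_mode(scores):
    """Index of the candidate trajectory with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two toy candidates: (reconstruction error, velocities along the rollout).
    candidates = [
        (0.2, [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]),  # steady flow
        (0.1, [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]),  # oscillating flow
    ]
    scores = [stability_score(err, vels) for err, vels in candidates]
    print(scores, select_mode(scores))
```

In this toy setting, the steady candidate wins despite its slightly higher reconstruction error, which illustrates how a stability term can change the ranking. In the trained system, a score head would predict such scores directly, so inference would reduce to the `select_mode` step.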