FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving

Mingzhe Guo, Yixiang Yang, Chuanrong Han, Rufeng Zhang, Shirui Li, Ji Wan, Zhipeng Zhang

Abstract

Effective environment modeling is the foundation of autonomous driving, underpinning tasks from perception to planning. However, current paradigms often give inadequate consideration to how ego motion feeds back into the observation, which leads to an incomplete understanding of the driving process and consequently limits planning capability. To address this issue, we introduce a novel ego-scene interactive modeling paradigm. Inspired by human recognition, the paradigm represents ego-scene interaction as the scene flow relative to the ego-vehicle. This formulation allows ego-motion feedback to be modeled within a feature learning pattern, so that existing log-replay datasets can be used instead of scenario simulations. Specifically, we propose FlowAD, a general flow-based framework for autonomous driving. Within it, an ego-guided scene partition first constructs basic flow units to quantify scene flow. The ego-vehicle's forward direction and steering velocity directly shape the partition, so that the partition reflects ego motion. Then, based on the flow units, spatial and temporal flow predictions are performed to model the dynamics of scene flow, encompassing both spatial displacement and temporal variation. Finally, a task-aware enhancement exploits the learned spatio-temporal flow dynamics to benefit diverse tasks through object-level and region-level strategies. We also propose a novel Frames before Correct Planning (FCP) metric to assess scene understanding capability. Experiments in both open-loop and closed-loop evaluations demonstrate FlowAD's generality and effectiveness across perception, end-to-end planning, and VLM analysis. Notably, FlowAD reduces the collision rate by 19% relative to SparseDrive and improves FCP by 1.39 frames (60%) on nuScenes, and it achieves a driving score of 51.77 on Bench2Drive. Code, models, and configurations will be released.

Paper Structure (40 sections, 10 equations, 14 figures, 20 tables)

Figures (14)

  • Figure 1: (a) A vanilla autonomous driving system that performs isolated inference at each timestamp. (b) A temporal autonomous driving system that integrates historical observations, yet only incompletely captures the feedback of previous ego planning. (c) Our ego-scene interactive system, which leverages previous planning to inform future observations and thereby benefits comprehension of the dynamic driving process.
  • Figure 2: Illustration of Ego-guided Scene Partition. The feedback of ego motion is reflected in the Starting Point of Partition (forward direction) and the Dynamic Adjustment of Partition Size (velocity). Multi-level Partition and Local Aggregation then divide the scene into basic flow units and fuse local messages.
  • Figure 3: (a) The pipeline of Spatial Flow Prediction. (b) The pipeline of Temporal Flow Prediction.
  • Figure 4: The architecture of FlowAD. In the input stage, image features of the multi-view videos are extracted by the backbone network. The ego-scene interactive modeling then introduces ego-motion feedback and builds the spatio-temporal scene flow feature. Finally, the flow feature serves the downstream tasks through the task-aware enhancement.
  • Figure 5: Qualitative results of multi-view object detection on nuScenes. The baseline is SparseBEV.
  • ...and 9 more figures