End-to-End Autonomous Driving through V2X Cooperation

Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie

TL;DR

UniV2X tackles vehicle-infrastructure cooperative autonomous driving (VICAD) by formulating end-to-end planning as a cooperative task that fuses ego-vehicle and infrastructure data under bandwidth constraints. It introduces a sparse-dense hybrid transmission scheme and rotation-aware, flow-assisted cross-view fusion to jointly optimize perception, mapping, occupancy prediction, and planning, yielding a planning-focused network that remains reliable and interpretable. Results on the DAIR-V2X dataset show safer planning (lower collision and off-road rates) and gains across perception, mapping, and occupancy tasks, together with substantially lower transmission costs; V2X-Sim results further corroborate the approach. The work demonstrates the practical potential of end-to-end VICAD with robust data transmission, cross-view synchronization, and unified planning.

Abstract

Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules rather than using end-to-end learning to optimize final planning performance, which leaves the potential of the data underutilized. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) it is effective at simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance; 2) it is transmission-friendly under practical, limited communication conditions; 3) it provides reliable data fusion, since the hybrid data is interpretable. We implement UniV2X and reproduce several benchmark methods on DAIR-V2X, a challenging real-world cooperative driving dataset. Experimental results demonstrate that UniV2X significantly enhances planning performance as well as the performance of all intermediate outputs. The project is available at https://github.com/AIR-THU/UniV2X.

Paper Structure (48 sections, 7 equations, 7 figures, 13 tables)

Figures (7)

  • Figure 1: (a) VICAD: Infrastructure sensors mounted at height have a broad field of perception (yu2022dair; yu2023v2x; yang2023bevheight), which can cover the blind spots and long-range regions of a single vehicle. (b) Performance Enhancement: Compared with the No Fusion solution, UniV2X achieves significant gains in various tasks, such as detection (+13%), mapping (+11.4%), occupancy prediction (+5.7%), and collision rate (-0.5%).
  • Figure 2: Pipeline of Unified Autonomous Driving through V2X Cooperation (UniV2X). UniV2X aims to connect and jointly optimize all essential modules across diverse views for enhanced planning performance. Cross-view data interaction bolsters pivotal components in autonomous driving such as agent perception, online mapping, and occupancy prediction. Additional flow prediction reduces the cost of transmitting the occupied probability map. Cross-view data fusion involves temporal and spatial synchronization, cross-view data matching and fusion, and data adaptation.
  • Figure 3: Object orientation is explicitly encoded in BEV feature maps (a) and bounding boxes (b), but only implicitly embedded in query features (c), which makes cross-view rotation alignment a challenge during spatial synchronization.
  • Figure 4: Reliability on Data Corruption.
  • Figure 5: Visualization Example: Turn Left.
  • ...and 2 more figures
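
To make the contrast in the Figure 3 caption concrete, the sketch below covers only the explicit case (b): when orientation is stored as a yaw angle on a bounding box, moving the box from the infrastructure frame to the ego frame is a closed-form rigid transform. The function name `infra_to_ego` and the parameters `tx`, `ty`, `theta` are hypothetical and chosen for illustration; this is not taken from the UniV2X code. Query features (c) carry no explicit angle to rotate in this way, which is the difficulty the caption points at.

```python
import math


def infra_to_ego(x, y, yaw, tx, ty, theta):
    """Map a 2D box pose from the infrastructure frame to the ego frame.

    (tx, ty, theta) is an assumed relative pose: the infrastructure frame
    is rotated by theta and shifted by (tx, ty) with respect to the ego frame.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the box centre, then translate it into the ego frame.
    x_ego = c * x - s * y + tx
    y_ego = s * x + c * y + ty
    # The heading simply shifts by theta; wrap the result into [-pi, pi).
    yaw_ego = (yaw + theta + math.pi) % (2 * math.pi) - math.pi
    return x_ego, y_ego, yaw_ego


if __name__ == "__main__":
    # A box 1 m ahead of the roadside sensor, seen from an ego frame that is
    # rotated by 90 degrees and offset by (2, 3) metres.
    print(infra_to_ego(1.0, 0.0, 0.0, 2.0, 3.0, math.pi / 2))
```

The wrap on the last step keeps headings comparable after alignment, so two boxes that describe the same object from different views end up with nearby yaw values rather than values that differ by a full turn.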