V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality

Hao Xiang, Zhaoliang Zheng, Xin Xia, Seth Z. Zhao, Letian Gao, Zewei Zhou, Tianhui Cai, Yun Zhang, Jiaqi Ma

TL;DR

V2X-ReaLO tackles the gap between offline/simulation studies and real-world online cooperative perception by delivering an open ROS-based framework that supports early, late, and intermediate fusion, and by introducing an online benchmark dataset derived from V2X-Real with dynamic, synchronized ROS bags. The framework enables real-time transmission, synchronization, and fusion of intermediate neural features (via a Transmission Encoder/Decoder and Feature Bank) under realistic bandwidth and latency constraints, demonstrating the feasibility of online intermediate fusion in urban deployments. Comprehensive online experiments across V2V, V2I, and I2I modes show that intermediate fusion methods can outperform traditional fusion approaches in real-world conditions, though performance degrades for small or highly mobile objects and when computational overhead is high. The online dataset and benchmarks support real-time evaluation of perception accuracy and communication latency, lowering the barrier to online cooperative perception research and accelerating progress toward deployable V2X systems.

Abstract

Cooperative perception enabled by Vehicle-to-Everything (V2X) communication holds significant promise for enhancing the perception capabilities of autonomous vehicles, allowing them to overcome occlusions and extend their field of view. However, existing research predominantly relies on simulated environments or static datasets, leaving the feasibility and effectiveness of V2X cooperative perception, especially for intermediate fusion, largely unexplored in real-world scenarios. In this work, we introduce V2X-ReaLO, an open online cooperative perception framework deployed on real vehicles and smart infrastructure that integrates early, late, and intermediate fusion methods within a unified pipeline and provides the first practical demonstration of online intermediate fusion's feasibility and performance under genuine real-world conditions. Additionally, we present an open benchmark dataset specifically designed to assess the performance of online cooperative perception systems. This new dataset extends the V2X-Real dataset to dynamic, synchronized ROS bags and provides 25,028 test frames with 6,850 annotated key frames in challenging urban scenarios. By enabling real-time assessments of perception accuracy and communication latency under dynamic conditions, V2X-ReaLO sets a new benchmark for advancing and optimizing cooperative perception systems in real-world applications. The code and datasets will be released to further advance the field.

Paper Structure

This paper contains 15 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Online testing system for real-time cooperative perception. (a) Smart intersection. (b) Roadside unit equipped with Ouster OS1-128/64 LiDAR and Garmin GPS. (c) Connected vehicle retrofitted with a 128-channel Robosense LiDAR and Gongji GPS. Wireless V2X communication uses a Wi-Fi network.
  • Figure 2: Framework of online intermediate fusion. Each agent processes LiDAR point clouds to generate Bird’s Eye View (BEV) features, which are compressed by a Transmission Encoder and transmitted over the Wi-Fi network as ROS messages. On the receiving side, these features are reconstructed by a Transmission Decoder, stored in a Feature Bank, and retrieved by the ego agent for fusion with its own BEV representation. Finally, the fused features are passed to a Detection Head to produce perception outputs. The entire pipeline operates in ROS, with additional support modules (e.g., time synchronization, online localization) ensuring seamless multi-agent coordination.
  • Figure 3: Pipeline of transmission modules. (a) Transmission Encoder compresses the BEV feature, transfers it from the GPU to the host, serializes the data, and packages it into a ROS message. (b) Transmission Decoder performs the reverse operations by parsing the ROS message, deserializing and transferring the data back to the GPU, and finally decompressing the feature.
  • Figure 4: Qualitative results for Intermediate Fusion, deployed and running in real time on an in-vehicle computing platform in collaboration with an RSU.
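
The transmission path described in the captions for Figures 2 and 3 (compress, serialize, package into a message, then reverse the steps and store the result in a Feature Bank) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: `FeatureMsg` is a plain dataclass standing in for a ROS message, float16 packing plus `zlib` stands in for the compression step (the source does not say which compression scheme is used), the GPU-to-host transfer is omitted, and the `max_age` staleness threshold in `FeatureBank` is a hypothetical parameter.

```python
import struct
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FeatureMsg:
    """Hypothetical stand-in for a ROS message carrying one BEV feature."""
    agent_id: str
    stamp: float                  # sensor timestamp in seconds
    shape: Tuple[int, int, int]   # (channels, height, width)
    payload: bytes                # compressed, serialized feature values


def encode_feature(agent_id: str, stamp: float,
                   shape: Tuple[int, int, int],
                   values: List[float]) -> FeatureMsg:
    """Sender side: serialize to float16, compress, and package."""
    c, h, w = shape
    if len(values) != c * h * w:
        raise ValueError("values do not match shape")
    # 'e' is the struct format code for IEEE 754 half precision.
    raw = struct.pack(f"<{len(values)}e", *values)
    return FeatureMsg(agent_id, stamp, shape, zlib.compress(raw))


def decode_feature(msg: FeatureMsg) -> List[float]:
    """Receiver side: decompress and deserialize back to a flat list."""
    c, h, w = msg.shape
    raw = zlib.decompress(msg.payload)
    return list(struct.unpack(f"<{c * h * w}e", raw))


@dataclass
class FeatureBank:
    """Keeps the newest decoded feature per remote agent (assumed policy)."""
    max_age: float = 0.2  # hypothetical staleness limit in seconds
    _latest: Dict[str, Tuple[float, List[float]]] = field(default_factory=dict)

    def put(self, msg: FeatureMsg) -> None:
        # Ignore messages that arrive out of order with an older timestamp.
        prev = self._latest.get(msg.agent_id)
        if prev is None or msg.stamp >= prev[0]:
            self._latest[msg.agent_id] = (msg.stamp, decode_feature(msg))

    def get(self, ego_stamp: float) -> Dict[str, List[float]]:
        # Return only features close enough in time to the ego frame.
        return {
            agent: feat
            for agent, (stamp, feat) in self._latest.items()
            if abs(ego_stamp - stamp) <= self.max_age
        }


# Usage: a roadside unit sends one feature, the ego vehicle retrieves it.
values = [0.0, 0.5, 1.0, -2.0] * 6           # 2 x 3 x 4 = 24 values
msg = encode_feature("rsu_1", 10.0, (2, 3, 4), values)
bank = FeatureBank()
bank.put(msg)
fresh = bank.get(ego_stamp=10.05)             # within max_age, so included
stale = bank.get(ego_stamp=10.5)              # too old, so excluded
```

The fusion step and the Detection Head are left out on purpose; the sketch only covers how a feature might travel between agents and how the ego agent could select time-aligned features before fusing them with its own BEV representation.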