V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality
Hao Xiang, Zhaoliang Zheng, Xin Xia, Seth Z. Zhao, Letian Gao, Zewei Zhou, Tianhui Cai, Yun Zhang, Jiaqi Ma
TL;DR
V2X-ReaLO addresses the gap between offline or simulation-based studies and real-world online cooperative perception. It delivers an open ROS-based framework that supports early, late, and intermediate fusion, and introduces an online benchmark dataset derived from V2X-Real as dynamic, synchronized ROS bags. The framework enables real-time transmission, synchronization, and fusion of intermediate neural features (via a Transmission Encoder/Decoder and a Feature Bank) under realistic bandwidth and latency constraints, demonstrating that online intermediate fusion is feasible in urban deployments. Online experiments across V2V, V2I, and I2I modes show that intermediate fusion methods can outperform traditional fusion approaches in real-world conditions, though performance degrades for small or highly mobile objects and when computational overhead is high. The online dataset and benchmarks support real-time evaluation of perception accuracy and communication latency, lowering the barrier to online cooperative perception research and accelerating progress toward deployable V2X systems.
Abstract
Cooperative perception enabled by Vehicle-to-Everything (V2X) communication holds significant promise for enhancing the perception capabilities of autonomous vehicles, allowing them to overcome occlusions and extend their field of view. However, existing research predominantly relies on simulated environments or static datasets, leaving the feasibility and effectiveness of V2X cooperative perception, especially for intermediate fusion, largely unexplored in real-world scenarios. In this work, we introduce V2X-ReaLO, an open online cooperative perception framework deployed on real vehicles and smart infrastructure. It integrates early, late, and intermediate fusion methods within a unified pipeline and provides the first practical demonstration of the feasibility and performance of online intermediate fusion under genuine real-world conditions. Additionally, we present an open benchmark dataset specifically designed to assess the performance of online cooperative perception systems. This new dataset extends the V2X-Real dataset to dynamic, synchronized ROS bags and provides 25,028 test frames with 6,850 annotated key frames in challenging urban scenarios. By enabling real-time assessment of perception accuracy and communication latency under dynamic conditions, V2X-ReaLO sets a new benchmark for advancing and optimizing cooperative perception systems in real-world applications. The code and datasets will be released to further advance the field.
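
To make the synchronization idea concrete, the following is a minimal Python sketch of how a feature buffer in the spirit of the Feature Bank might pair incoming collaborator features with an ego frame by timestamp. It is an illustration under stated assumptions, not the V2X-ReaLO implementation: the names `FeatureMsg`, `FeatureBank`, `max_age_s`, and `capacity`, the nearest-timestamp matching rule, and the 0.2 s default tolerance are all hypothetical and do not come from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class FeatureMsg:
    """Hypothetical container for one encoded feature map from one agent."""
    agent_id: str   # e.g. another vehicle or a roadside unit
    stamp: float    # sensor timestamp in seconds
    payload: bytes  # encoded (compressed) intermediate features


class FeatureBank:
    """Assumed design: one bounded buffer per agent, matched by nearest timestamp."""

    def __init__(self, max_age_s: float = 0.2, capacity: int = 10) -> None:
        self.max_age_s = max_age_s  # largest tolerated time offset (assumption)
        self.capacity = capacity    # messages kept per agent (assumption)
        self._buffers: Dict[str, Deque[FeatureMsg]] = {}

    def push(self, msg: FeatureMsg) -> None:
        # Oldest messages are dropped automatically once the deque is full.
        buf = self._buffers.setdefault(msg.agent_id, deque(maxlen=self.capacity))
        buf.append(msg)

    def query(self, ego_stamp: float) -> List[FeatureMsg]:
        """Return, per agent, the message closest in time to the ego frame,
        skipping agents whose best match is outside the tolerance."""
        matched: List[FeatureMsg] = []
        for buf in self._buffers.values():
            best = min(buf, key=lambda m: abs(m.stamp - ego_stamp), default=None)
            if best is not None and abs(best.stamp - ego_stamp) <= self.max_age_s:
                matched.append(best)
        return matched
```

In such a design, the ego agent would call `query` with its own frame timestamp just before fusion, and fuse only the features that are returned; agents whose latest features are too stale are simply left out of that frame. The tolerance trades off how many collaborators contribute against how much temporal misalignment the fusion model has to absorb.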
