CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios
Hangyu Li, Bofeng Cao, Zhaohui Liang, Wuzhen Li, Juyoung Oh, Yuxuan Chen, Shixiao Liang, Hang Zhou, Chengyuan Ma, Jiaxi Liu, Zheng Li, Peng Zhang, KeKe Long, Maolin Liu, Jackson Jiang, Chunlei Yu, Shengxiang Liu, Hongkai Yu, Xiaopeng Li
TL;DR
This work tackles the scarcity of real-world V2V cooperative perception (CP) data in Complex Adverse Traffic Scenarios (CATS) by introducing CATS-V2V. This large-scale dataset was collected with two hardware-synchronized vehicles under ten weather/lighting conditions at ten locations. It provides 60K LiDAR frames at 10 Hz, 1.26M multi-view camera images at 30 Hz, and 750K RTK/IMU records, together with time-consistent 3D bounding boxes and HD maps that enable cross-vehicle BEV and mapping tasks. The authors also propose a target-based temporal alignment method that precisely aligns objects across the high-frequency sensor modalities. It outperforms stamp- and frame-based approaches in both qualitative and quantitative evaluations. The dataset supports a broad range of tasks (detection, tracking, localization, SLAM, depth estimation, view synthesis, and domain adaptation) and ships with data-conversion tools. Its aim is to advance real-world V2V CP research under challenging conditions and to drive practical CP deployment.
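To make "stamp-based" alignment concrete, the sketch below shows a generic nearest-timestamp association between a 10 Hz LiDAR stream and a 30 Hz camera stream. It also shows the apparent position error caused by any residual time offset for a moving object. This is an illustrative baseline under stated assumptions, not the CATS-V2V target-based method. The function names `nearest_stamp_match` and `displacement_from_offset`, the 5 ms camera phase offset, and the 20 m/s object speed are all hypothetical.

```python
from bisect import bisect_left


def nearest_stamp_match(lidar_ts, camera_ts):
    """Generic stamp-based association (hypothetical helper, not the paper's method).

    For each LiDAR timestamp, return (index of the nearest camera frame,
    signed offset camera_time - lidar_time in seconds). Both lists must be
    sorted ascending, and camera_ts must be non-empty.
    """
    if not camera_ts:
        raise ValueError("camera_ts must be non-empty")
    matches = []
    for t in lidar_ts:
        i = bisect_left(camera_ts, t)
        # Candidates: the last camera frame before t and the first at or after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(camera_ts)]
        j = min(candidates, key=lambda k: abs(camera_ts[k] - t))
        matches.append((j, camera_ts[j] - t))
    return matches


def displacement_from_offset(speed_mps, offset_s):
    """Apparent position error (m) of an object moving at speed_mps
    when two sensor samples are offset_s seconds apart."""
    return abs(speed_mps * offset_s)


# Hypothetical streams: 10 Hz LiDAR starting at t = 0, 30 Hz camera with a 5 ms phase offset.
lidar = [k * 0.1 for k in range(5)]
camera = [0.005 + k / 30 for k in range(15)]

for idx, off in nearest_stamp_match(lidar, camera):
    print(idx, round(off * 1000, 3), "ms")

# Worst-case offset for an unsynchronized 30 Hz camera is half a period (1/60 s).
print(round(displacement_from_offset(20.0, 1 / 60), 3), "m")
```

Even with a perfect nearest-timestamp match, an object moving at 20 m/s can appear about a third of a meter off when the offset is half a camera period. Stamp-level matching alone leaves this kind of residual error, and object-level (target-based) alignment is intended to reduce it.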
Abstract
Vehicle-to-Vehicle (V2V) cooperative perception has great potential to enhance autonomous driving performance by overcoming perception limitations in complex adverse traffic scenarios (CATS). Meanwhile, data serves as the fundamental infrastructure for modern autonomous driving AI. However, because data collection requirements are stringent, existing datasets focus primarily on ordinary traffic scenarios, which limits the benefits of cooperative perception. To address this challenge, we introduce CATS-V2V, the first real-world dataset for V2V cooperative perception under complex adverse traffic scenarios. The dataset was collected by two vehicles with hardware time synchronization and covers 10 weather and lighting conditions across 10 diverse locations. The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images, along with 750K anonymized yet high-precision RTK-fixed GNSS and IMU records. Correspondingly, we provide time-consistent 3D bounding box annotations for objects, as well as static scenes, which together form a 4D BEV representation. On this basis, we propose a target-based temporal alignment method that ensures all objects are precisely aligned across all sensor modalities. We hope that CATS-V2V, the largest-scale, most supportive, and highest-quality dataset of its kind to date, will benefit the autonomous driving community in related tasks.
