CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios

Hangyu Li, Bofeng Cao, Zhaohui Liang, Wuzhen Li, Juyoung Oh, Yuxuan Chen, Shixiao Liang, Hang Zhou, Chengyuan Ma, Jiaxi Liu, Zheng Li, Peng Zhang, KeKe Long, Maolin Liu, Jackson Jiang, Chunlei Yu, Shengxiang Liu, Hongkai Yu, Xiaopeng Li

TL;DR

This work tackles the scarcity of real-world V2V cooperative perception (CP) data in Complex Adverse Traffic Scenarios (CATS) by introducing CATS-V2V, a large-scale dataset collected with two hardware-synchronized vehicles under ten weather and lighting conditions at ten locations. It provides 60K LiDAR frames at 10 Hz, 1.26M multi-view camera images at 30 Hz, and 750K RTK/IMU records, with time-consistent 3D bounding boxes and HD maps, enabling cross-vehicle BEV and mapping tasks. A target-based temporal alignment method is proposed to achieve precise cross-modal object alignment across high-frequency sensors, outperforming stamp- and frame-based approaches in qualitative and quantitative evaluations. The dataset supports a broad range of tasks (detection, tracking, localization, SLAM, depth, view synthesis, and domain adaptation) and comes with data-conversion tools, aiming to advance real-world V2V CP research under challenging conditions and drive practical CP deployments.

Abstract

Vehicle-to-Vehicle (V2V) cooperative perception has great potential to enhance autonomous driving performance by overcoming perception limitations in complex adverse traffic scenarios (CATS). Meanwhile, data serves as the fundamental infrastructure for modern autonomous driving AI. However, due to stringent data collection requirements, existing datasets focus primarily on ordinary traffic scenarios, constraining the benefits of cooperative perception. To address this challenge, we introduce CATS-V2V, the first-of-its-kind real-world dataset for V2V cooperative perception under complex adverse traffic scenarios. The dataset was collected by two hardware time-synchronized vehicles, covering 10 weather and lighting conditions across 10 diverse locations. The 100-clip dataset includes 60K frames of 10 Hz LiDAR point clouds and 1.26M multi-view 30 Hz camera images, along with 750K anonymized yet high-precision RTK-fixed GNSS and IMU records. Correspondingly, we provide time-consistent 3D bounding box annotations for objects, as well as static scenes to construct a 4D BEV representation. On this basis, we propose a target-based temporal alignment method, ensuring that all objects are precisely aligned across all sensor modalities. We hope that CATS-V2V, the largest-scale, most supportive, and highest-quality dataset of its kind to date, will benefit the autonomous driving community in related tasks.
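The headline counts in the abstract can be cross-checked against each other with a little arithmetic. The sketch below is only a consistency check, not a description of the dataset's actual layout. It assumes one LiDAR stream per vehicle and equal-length clips, neither of which is stated in the abstract, and it takes the seven cameras per vehicle from the Figure 1 caption. Under those assumptions the LiDAR count implies clips of about 30 seconds, and that duration reproduces the 1.26M camera images exactly.

```python
# Counts quoted in the abstract and the Figure 1 caption.
NUM_VEHICLES = 2
NUM_CLIPS = 100
LIDAR_HZ = 10
CAMERA_HZ = 30
CAMERAS_PER_VEHICLE = 7      # "seven camera views of each vehicle" (Figure 1)
LIDAR_FRAMES = 60_000
CAMERA_IMAGES = 1_260_000
GNSS_IMU_RECORDS = 750_000


def implied_clip_seconds() -> float:
    """Clip length implied by the LiDAR frame count.

    Assumption (not stated in the source): one LiDAR stream per vehicle
    and clips of equal length.
    """
    frames_per_second_overall = NUM_VEHICLES * LIDAR_HZ
    return LIDAR_FRAMES / (frames_per_second_overall * NUM_CLIPS)


def expected_camera_images(clip_seconds: float) -> float:
    """Camera image count expected for a given clip length."""
    images_per_second = NUM_VEHICLES * CAMERAS_PER_VEHICLE * CAMERA_HZ
    return images_per_second * clip_seconds * NUM_CLIPS


def implied_gnss_imu_rate(clip_seconds: float) -> float:
    """Combined GNSS/IMU records per vehicle per second.

    This is an inferred aggregate; the abstract does not give the
    individual GNSS or IMU rates.
    """
    vehicle_seconds = NUM_VEHICLES * clip_seconds * NUM_CLIPS
    return GNSS_IMU_RECORDS / vehicle_seconds


if __name__ == "__main__":
    seconds = implied_clip_seconds()
    print("implied clip length (s):", seconds)
    print("expected camera images:", expected_camera_images(seconds))
    print("reported camera images:", CAMERA_IMAGES)
    print("implied GNSS/IMU records per vehicle-second:",
          implied_gnss_imu_rate(seconds))
```

The fact that the LiDAR and camera totals agree suggests the reported counts describe the same recording time, but the per-clip duration and the combined GNSS/IMU rate remain inferences rather than figures given by the authors.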

Paper Structure

This paper contains 20 sections, 1 equation, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: One frame of the CATS-V2V dataset. The middle-upper panel shows the combined point cloud and 3D bounding box annotations, while the middle-lower panel presents the HD map and BEV annotations. On the sides are the seven camera views of each vehicle with projected annotations.
  • Figure 2: 10 CATS scenarios at 10 locations: (a) Clear day at an arterial road; (b) Rainy day at a highway; (c) Snowy day at a highway; (d) Dawn at an arterial road; (e) Dusk with direct sunlight at an arterial road; (f) Clear night at a collector road; (g) Rainy night at an arterial road; (h) Snowy night at an arterial road; (i) Foggy day at a local street; (j) Overcast day with work zone at an arterial road.
  • Figure 3: Sensor configurations of our two data-collecting vehicles.
  • Figure 4: Time synchronization topology for all sensors and module controllers of both vehicles.
  • Figure 5: Comparison between two deskewed point clouds registered (a) with initial estimation; (b) after GICP refinement.
  • ...and 1 more figure