RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

TL;DR

This work tackles the need for area-coverage roadside perception by introducing RCooper, the first real-world, large-scale dataset for roadside cooperative perception across intersection and corridor scenes. It provides 50k images and 30k LiDAR point clouds with 3D bounding boxes and trajectories for ten classes, enabling two core tasks: 3D cooperative detection and 3D cooperative tracking. The authors establish benchmarks using multiple fusion strategies (No, Late, Early, and Intermediate) and demonstrate the benefits and challenges of cross-infrastructure cooperation, particularly data heterogeneity from mixed LiDAR types. The dataset and benchmarks aim to advance practical, infrastructure-based perception for autonomous driving and traffic management, with public code and data accessible to spur future research on unified roadside representations and end-to-end perception pipelines.
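To make the difference between two of the fusion strategies named above concrete, here is a minimal sketch, not the benchmark implementation from the paper. It assumes a simplified 2D setting: each sensor has a known pose (yaw, translation) in a shared frame, and a detection is a hypothetical `(x, y, score)` tuple. The function names `to_shared_frame`, `early_fusion`, and `late_fusion` are illustrative only, and the detector itself is omitted.

```python
import math

def to_shared_frame(points, yaw, tx, ty):
    """Rotate 2D points by yaw (radians) and translate them into the shared frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def early_fusion(clouds, poses):
    """Early fusion (sketch): merge raw points from all sensors before detection."""
    merged = []
    for points, (yaw, tx, ty) in zip(clouds, poses):
        merged.extend(to_shared_frame(points, yaw, tx, ty))
    return merged

def late_fusion(detections, min_dist=1.0):
    """Late fusion (sketch): merge per-sensor detections already expressed in the
    shared frame, keeping the highest-scoring one among centers closer than min_dist."""
    kept = []
    for x, y, score in sorted(detections, key=lambda d: -d[2]):
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept

# Two sensors observe a point 1 m ahead of themselves; the second sensor sits at
# (10, 0) and is rotated by 90 degrees relative to the shared frame.
clouds = [[(1.0, 0.0)], [(1.0, 0.0)]]
poses = [(0.0, 0.0, 0.0), (math.pi / 2, 10.0, 0.0)]
merged = early_fusion(clouds, poses)

# Two sensors report the same object near (5, 5); a third detection is far away.
dets = [(5.0, 5.0, 0.9), (5.3, 5.1, 0.6), (20.0, 1.0, 0.7)]
fused = late_fusion(dets)
```

In this sketch, early fusion shares raw data and leaves detection to a single downstream model, while late fusion shares only per-sensor results and resolves duplicates afterwards. Intermediate fusion, which exchanges learned features, is not shown.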

Abstract

The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches focus only on single-infrastructure sensor systems, which cannot provide a comprehensive understanding of a traffic area because of their limited sensing range and blind spots. To achieve high-quality roadside perception, we need Roadside Cooperative Perception (RCooper) to realize practical area-coverage roadside perception for restricted traffic areas. RCooper has its own domain-specific challenges, but further exploration is hindered by the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to foster research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, covering two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks demonstrate the effectiveness of roadside cooperative perception and indicate directions for further research. Code and dataset can be accessed at: https://github.com/AIR-THU/DAIR-RCooper.

Paper Structure (38 sections, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Roadside Cooperative Perception (RCooper) is expected to achieve practical area-coverage roadside perception for restricted traffic areas, which would further promote both autonomous driving and traffic management. The complex roadside system is boiled down to two typical roadside settings, i.e., (a) intersection RCooper scenes and (b) corridor RCooper scenes.
  • Figure 2: Independent roadside 3D perception (red point clouds) is limited by sensing range and blind spots. (a) The infrastructure-side cooperation can effectively extend the sensing range to cover the whole corridor scene, and the observation from multiple views can weaken the impact of occlusion in the complex intersection scene. (b) The area under the infrastructure is the camera's blind spot, which is perceptible from the adjacent infrastructure's camera.
  • Figure 3: RCooper scenes can be modeled as a structurally stable graph, where the infrastructures and connections are regarded as nodes and edges. Two basic graph units, the line segment and the loop, correspond to the corridor and intersection in actual scenes.
  • Figure 4: Diagram of the infrastructure-side sensor system. For intersection scenes, we adopt a hybrid scheme for LiDAR-based systems where 2 LiDAR groups (80-beam + 32-beam) and 2 MEMS LiDARs are utilized, because the integration of multiline and MEMS LiDAR is gaining traction for its cost-effectiveness compared to using multiline LiDAR alone. For corridor scenes, each sensor agent includes a LiDAR group to cover the region.
  • Figure 5: The distribution of semantic classes.
  • ...and 4 more figures