Rethinking the Role of Infrastructure in Collaborative Perception

Hyunchul Bae, Minhee Kang, Minwoo Song, Heejin Ahn

TL;DR

This work rethinks infrastructure within collaborative perception by quantitatively comparing vehicle-centric CP with infra-centric CP and demonstrating that infrastructure data can significantly boost 3D object detection accuracy. It systematically analyzes metadata-sharing and intermediate fusion architectures, evaluating across V2XSet and V2X-Sim datasets with multiple models (V2X-ViT, Where2comm, ParCon) and varying noise conditions. The study finds that infrastructure-enhanced CP improves detection (up to $10.30\%$ in perfect conditions and notable gains under noise) and that infra-centric CP offers superior noise robustness and high accuracy in structured scenarios like intersections (up to $46.47\%$ advantage in certain comparisons). The authors advocate a context-dependent CP strategy that leverages the strengths of both ego vehicle and infrastructure, while outlining limitations and future work toward Infra-to-Infra (I2I) and multi-infrastructure integration to broaden applicability.

Abstract

Collaborative Perception (CP) is a process in which an ego agent receives and fuses sensor information from surrounding vehicles and infrastructure to enhance its perception capability. To evaluate the need for infrastructure equipped with sensors, extensive and quantitative analysis of the role of infrastructure data in CP is crucial, yet remains underexplored. To address this gap, we first quantitatively assess the importance of infrastructure data in existing vehicle-centric CP, where the ego agent is a vehicle. Furthermore, we compare vehicle-centric CP with infra-centric CP, where the ego agent is now the infrastructure, to evaluate the effectiveness of each approach. Our results demonstrate that incorporating infrastructure data improves 3D detection accuracy by up to 10.30%, and infra-centric CP shows enhanced noise robustness and increases accuracy by up to 46.47% compared with vehicle-centric CP.

Paper Structure

This paper contains 28 sections, 9 figures, 4 tables.

Figures (9)

  • Figure 1: About this paper. We present quantitative analyses of the importance of infrastructure data in (a) vehicle-centric CP and also in (b) infra-centric CP.
  • Figure 2: Intermediate fusion of collaborative perception. We overview the most representative CP structure, consisting of metadata sharing, feature extraction, compression and sharing, feature fusion, and detection head.
  • Figure 3: Effective case of infrastructure data (Scene #4). The point clouds from the vehicle's LiDAR are indicated as white dots, and the point clouds from the infrastructure's LiDAR are indicated as yellow dots. The green bounding boxes are ground truth objects within the ego agent's detection range. E means the ego agent and A means an aux agent.
  • Figure 4: Ineffective case of infrastructure data (Scene #3). The point clouds from the vehicle's LiDAR are indicated as white dots, and the point clouds from the infrastructure's LiDAR are indicated as yellow dots. The green bounding boxes are ground truth objects within the ego agent's detection range. E means the ego agent and A means an aux agent.
  • Figure 5: Comparison between Shapes of the Detection Range of the Ego Agent. The point clouds from the vehicle's LiDAR are indicated as white dots, and the point clouds from the infrastructure's LiDAR are indicated as yellow dots. The green bounding boxes are ground truth objects in the ego agent's detection range. E means the ego agent and A means aux agents. The white dotted line indicates the rectangle-shaped detection range, and the yellow dotted line indicates the square-shaped detection range.
  • ...and 4 more figures
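
The intermediate-fusion stages named in the Figure 2 caption can be illustrated with a toy sketch. This is not the paper's implementation or any of the evaluated models. It assumes that metadata sharing exchanges 2D poses `(x, y, yaw)`, that "features" are a plain bird's-eye-view occupancy grid over a square detection range, and that fusion is an element-wise maximum. The compression and detection-head stages are omitted, and all function names are hypothetical.

```python
import math


def to_ego_frame(points, aux_pose, ego_pose):
    """Map 2D points from an aux agent's local frame into the ego's local frame.

    Poses are (x, y, yaw) in a shared world frame. Treating these poses as
    the shared metadata is an assumption made for this sketch.
    """
    ax, ay, ayaw = aux_pose
    ex, ey, eyaw = ego_pose
    out = []
    for px, py in points:
        # Aux local frame -> world frame.
        wx = ax + px * math.cos(ayaw) - py * math.sin(ayaw)
        wy = ay + px * math.sin(ayaw) + py * math.cos(ayaw)
        # World frame -> ego local frame (inverse rotation by the ego yaw).
        dx, dy = wx - ex, wy - ey
        out.append((dx * math.cos(eyaw) + dy * math.sin(eyaw),
                    -dx * math.sin(eyaw) + dy * math.cos(eyaw)))
    return out


def bev_features(points, half_range=4.0, cell=1.0):
    """Toy feature extraction: occupancy grid over a square range around the ego."""
    n = int(2 * half_range / cell)
    grid = [[0.0] * n for _ in range(n)]
    for x, y in points:
        col = math.floor((x + half_range) / cell)
        row = math.floor((y + half_range) / cell)
        if 0 <= row < n and 0 <= col < n:  # drop points outside the range
            grid[row][col] = 1.0
    return grid


def fuse_max(ego_grid, aux_grid):
    """Toy feature fusion: element-wise maximum of two aligned grids."""
    return [[max(a, b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(ego_grid, aux_grid)]


def occupied_cells(grid):
    """Count grid cells that contain at least one point."""
    return sum(1 for row in grid for v in row if v > 0)


# Vehicle-centric example: the ego is a vehicle, the aux agent is infrastructure.
ego_pose = (0.0, 0.0, 0.0)
infra_pose = (10.0, 0.0, math.pi)   # faces back toward the ego
ego_points = [(1.5, 0.5)]           # object seen by the ego
infra_points = [(7.5, 0.5)]         # object seen only by the infrastructure

infra_in_ego = to_ego_frame(infra_points, infra_pose, ego_pose)
ego_grid = bev_features(ego_points)
fused = fuse_max(ego_grid, bev_features(infra_in_ego))
print(occupied_cells(ego_grid), occupied_cells(fused))
```

Swapping the roles of `ego_pose` and `infra_pose` in the calls would give the infra-centric arrangement in the same toy setting, where the infrastructure is the ego agent and the vehicle is the aux agent.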