RC-GeoCP: Geometric Consensus for Radar-Camera Collaborative Perception

Xiaokai Bai, Lianqing Zheng, Runwei Guan, Siyuan Cao, Huiliang Shen

TL;DR

RC-GeoCP is the first framework to explore the fusion of 4D radar and images in collaborative perception (CP). It establishes a radar-anchored geometric consensus and sets up the first unified radar-camera CP benchmark on V2X-Radar and V2X-R, where it achieves state-of-the-art performance with significantly reduced communication overhead.

Abstract

Collaborative perception (CP) enhances scene understanding through multi-agent information sharing. While LiDAR-centric systems offer precise geometry, their high cost and degraded performance in adverse weather necessitate multi-modal alternatives. Although cameras provide dense visual semantics and 4D radar provides robust spatial measurements, the synergy between the two remains underexplored in collaborative settings. This work introduces RC-GeoCP, the first framework to explore the fusion of 4D radar and images in CP. To resolve misalignment caused by depth ambiguity and spatial dispersion across agents, RC-GeoCP establishes a radar-anchored geometric consensus. Specifically, Geometric Structure Rectification (GSR) aligns visual semantics with radar-derived geometry to generate spatially grounded, geometry-consistent representations. Uncertainty-Aware Communication (UAC) formulates selective transmission as a conditional entropy reduction process that prioritizes informative features based on inter-agent disagreement. Finally, the Consensus-Driven Assembler (CDA) aggregates multi-agent information via shared geometric anchors to form a globally coherent representation. We establish the first unified radar-camera CP benchmark on V2X-Radar and V2X-R, demonstrating state-of-the-art performance with significantly reduced communication overhead. Code will be released soon.
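The abstract frames UAC as selective transmission driven by conditional entropy reduction and inter-agent disagreement, but gives no formula here. The sketch below is a hypothetical stand-in, not the paper's method. It assumes each agent holds a per-cell class distribution over a shared BEV grid and that the ego's distributions are available to the collaborator. Each cell is scored by KL(collaborator || ego), the information gained if the ego updated its belief to the collaborator's, and only the top `budget_ratio` fraction of cells is kept. The names `select_cells`, `kl_divergence`, and `budget_ratio` are illustrative.

```python
import math
from typing import List, Sequence

EPS = 1e-9  # guards against log(0) when the ego assigns zero probability


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0.0)


def select_cells(
    ego_probs: List[List[float]],
    col_probs: List[List[float]],
    budget_ratio: float,
) -> List[int]:
    """Return indices of the BEV cells a collaborator would transmit (hypothetical).

    Score = KL(collaborator || ego): large when the collaborator is confident
    and the ego's belief disagrees with it or is uncertain, i.e. the cell is
    informative for the ego. Only the top `budget_ratio` fraction is kept.
    """
    if len(ego_probs) != len(col_probs):
        raise ValueError("ego and collaborator maps must have the same number of cells")
    scores = [kl_divergence(c, e) for e, c in zip(ego_probs, col_probs)]
    k = max(1, int(round(budget_ratio * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Cell 0: ego unsure, collaborator confident. Cell 1: agents agree.
# Cell 2: both unsure. Cell 3: agents disagree.
ego = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
col = [[0.95, 0.05], [0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
print(select_cells(ego, col, budget_ratio=0.5))  # [0, 3]
```

Cells where the agents already agree score zero, so a 50% budget spends bandwidth only on the cell the ego is unsure about and the cell where the two agents disagree.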

Paper Structure

This paper contains 16 sections, 13 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Performance comparison on V2X-Radar (columns 1 and 3) and V2X-R (columns 2 and 4). Columns 1 and 2 show that radar-camera (R+C) fusion significantly outperforms single-modality baselines. Columns 3 and 4 show that our proposed RC-GeoCP consistently surpasses comparable radar-camera fusion methods across various collaborative perception frameworks.
  • Figure 2: Overview of RC-GeoCP. First, Geometric Structure Rectification (GSR) addresses visual feature dispersion by aligning camera-derived semantics with radar-based spatial cues. Then, Uncertainty-Aware Communication (UAC) selects informative tokens and repackages them for efficient transmission. Finally, the Consensus-Driven Assembler (CDA) combines data from multiple agents by leveraging radar-derived geometric consensus, ensuring spatially consistent multi-modal collaborative perception.
  • Figure 3: Performance-communication comparison on the validation sets of the V2X-Radar (top) and V2X-R (bottom) datasets. Communication cost is represented by the diameter of the blobs.
  • Figure 4: Performance comparison with baselines under pose-error and time-delay settings on the validation sets of the V2X-Radar and V2X-R datasets.
  • Figure 5: Visualization results on (a) V2X-Radar and (b) V2X-R. Each subfigure corresponds to a single frame, with bounding boxes annotated. Zoom in for a better view.
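The Figure 2 caption says GSR counters visual feature dispersion by aligning camera-derived semantics with radar-based spatial cues, but this summary does not give the mechanism. As a minimal sketch, assuming a lift-style camera branch that predicts a categorical depth distribution per pixel, one common way to ground that distribution in radar geometry is to multiply it by a Gaussian prior centered on the range of a radar point projected onto the same pixel, then renormalize. This is a swapped-in illustration, not the paper's GSR; `radar_depth_prior`, `rectify_depth`, `sigma`, and `alpha` are hypothetical names.

```python
import math
from typing import List, Optional


def radar_depth_prior(bin_centers: List[float], radar_depth: float, sigma: float) -> List[float]:
    """Normalized Gaussian weights over depth bins, centered on the radar range (hypothetical)."""
    w = [math.exp(-0.5 * ((d - radar_depth) / sigma) ** 2) for d in bin_centers]
    z = sum(w)
    return [v / z for v in w]


def rectify_depth(
    cam_depth: List[float],
    bin_centers: List[float],
    radar_depth: Optional[float],
    sigma: float = 1.0,
    alpha: float = 0.8,
) -> List[float]:
    """Sharpen a camera depth distribution with a radar range measurement (hypothetical).

    cam_depth: per-pixel categorical depth distribution from the image branch.
    radar_depth: range in meters of a radar point projected onto this pixel, or None.
    alpha: trust in radar; the remaining (1 - alpha) mass is spread uniformly so a
        spurious radar return cannot zero out the camera's own estimate.
    """
    if radar_depth is None:
        return list(cam_depth)  # no radar support at this pixel: keep the camera estimate
    n = len(bin_centers)
    prior = radar_depth_prior(bin_centers, radar_depth, sigma)
    mixed = [alpha * p + (1.0 - alpha) / n for p in prior]
    post = [c * m for c, m in zip(cam_depth, mixed)]  # Bayesian product of the two cues
    z = sum(post)
    return [v / z for v in post]


bins = [5.0, 10.0, 15.0, 20.0]
cam = [0.7, 0.1, 0.1, 0.1]  # camera is confident in the wrong (5 m) bin
out = rectify_depth(cam, bins, radar_depth=15.0, sigma=2.0)
print([round(p, 3) for p in out])  # the 15 m bin now carries the most mass
```

The uniform floor controlled by `alpha` is a design choice for robustness: radar returns are sparse and sometimes spurious, so the prior reshapes the camera's depth estimate rather than replacing it.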