CoopDiff: A Diffusion-Guided Approach for Cooperation under Corruptions

Gong Chen, Chaokun Zhang, Pengcheng Lv

TL;DR

Benefiting from the inherent denoising properties of diffusion, CoopDiff consistently outperforms prior methods across all degradation types, lowers the relative corruption error, and offers a tunable balance between precision and inference efficiency.

Abstract

Cooperative perception lets agents share information to expand coverage and improve scene understanding. However, in real-world scenarios, diverse and unpredictable corruptions undermine its robustness and generalization. To address these challenges, we introduce CoopDiff, a diffusion-based cooperative perception framework that mitigates corruptions via a denoising mechanism. CoopDiff adopts a teacher-student paradigm: the Quality-Aware Teacher performs voxel-level early fusion with Quality of Interest weighting and semantic guidance, then produces clean supervision features via a diffusion denoiser. The Dual-Branch Diffusion Student first separates ego and cooperative streams in encoding to reconstruct the teacher's clean targets. An Ego-Guided Cross-Attention mechanism then facilitates balanced decoding under degradation by adaptively integrating ego and cooperative features. We evaluate CoopDiff on two constructed multi-degradation benchmarks, OPV2Vn and DAIR-V2Xn, each incorporating six corruption types, including environmental and sensor-level distortions. Benefiting from the inherent denoising properties of diffusion, CoopDiff consistently outperforms prior methods across all degradation types and lowers the relative corruption error. Furthermore, it offers a tunable balance between precision and inference efficiency.
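The abstract does not spell out how the Ego-Guided Cross-Attention is computed. As a rough illustration only, the sketch below assumes plain scaled dot-product cross-attention in which ego tokens serve as queries and cooperative tokens serve as keys and values, with a residual connection that preserves the ego stream. The function name `ego_guided_cross_attention`, the identity projections, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ego_guided_cross_attention(ego_tokens, coop_tokens):
    """Assumed form: ego tokens are queries, cooperative tokens are keys/values.

    Learned query/key/value projections are omitted (treated as identity)
    to keep the sketch short; a real model would learn them.
    """
    dim = len(ego_tokens[0])
    fused, weights = [], []
    for q in ego_tokens:
        # Similarity of this ego token to every cooperative token.
        scores = [dot(q, k) / math.sqrt(dim) for k in coop_tokens]
        attn = softmax(scores)
        # Attention-weighted sum of cooperative tokens.
        context = [
            sum(a * v[d] for a, v in zip(attn, coop_tokens)) for d in range(dim)
        ]
        # Residual connection keeps the ego feature as the base signal.
        fused.append([qd + cd for qd, cd in zip(q, context)])
        weights.append(attn)
    return fused, weights


# Toy example: 2 ego tokens and 3 cooperative tokens, feature dimension 2.
ego = [[1.0, 0.0], [0.0, 1.0]]
coop = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, weights = ego_guided_cross_attention(ego, coop)
```

In this toy setup each ego token attends most strongly to the cooperative token it aligns with, which is the intuition behind letting the ego stream steer how much cooperative evidence is integrated.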

Paper Structure (12 sections, 14 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Overall performance comparison of average results for eight state-of-the-art methods across six corruption types.
  • Figure 2: Overview of the proposed CoopDiff, which employs a Teacher-Student paradigm. The Quality-Aware Teacher model $\mathcal{D}_{\Psi}^{\text{tea}}$ uses an early-fusion strategy to process multi-agent inputs and generate a clean target feature map. The Dual-Branch Denoising Student $\mathcal{D}_{\theta}^{\text{stu}}$ is trained to reconstruct the target by leveraging local and cooperative conditions.
  • Figure 3: Overview of the Quality-Aware Early Fusion Teacher. Multi-agent features are first fused via Quality of Interest (QoI) weighting and semantic guidance, after which the GCM-based diffusion network denoises the input. The right panel shows the architecture of the GCM.
  • Figure 4: Architecture of the Cooperative Deformable Attention (CDA) module.
  • Figure 5: Robustness of our proposed method and the compared methods to varying levels of corruption.
  • ...and 2 more figures