Diff-KD: Diffusion-based Knowledge Distillation for Collaborative Perception under Corruptions

Pengcheng Lyu, Chaokun Zhang, Gong Chen, Tao Tang, Zhaoxiang Luo

Abstract

Multi-agent collaborative perception enables autonomous systems to overcome individual sensing limits through collective intelligence. However, real-world sensor and communication corruptions severely undermine this advantage. Crucially, existing approaches treat corruptions as static perturbations or passively conform to corrupted inputs, failing to actively recover the underlying clean semantics. To address this limitation, we introduce Diff-KD, a framework that integrates diffusion-based generative refinement into teacher-student knowledge distillation for robust collaborative perception. Diff-KD features two core components: (i) Progressive Knowledge Distillation (PKD), which treats local feature restoration as a conditional diffusion process to recover global semantics from corrupted observations; and (ii) Adaptive Gated Fusion (AGF), which dynamically weights neighbors based on ego reliability during fusion. Evaluated on OPV2V and DAIR-V2X under seven corruption types, Diff-KD achieves state-of-the-art performance in both detection accuracy and calibration robustness.

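The paper's exact PKD formulation (denoiser network, noise schedule, sampler) is not reproduced on this page, but the abstract's idea of treating local feature restoration as a conditional diffusion process can be sketched. The PyTorch toy below is an illustrative assumption, not the authors' implementation: `FeatureDenoiser`, `restore`, and all hyperparameters are invented names, and a deterministic DDIM-style sampler is assumed. Each reverse step is conditioned on the corrupted local feature, so the trajectory is pulled toward the underlying clean semantics rather than toward the corruption.

```python
import torch
import torch.nn as nn


class FeatureDenoiser(nn.Module):
    """Hypothetical conditional denoiser: predicts the noise in x_t, given the
    corrupted local BEV feature as conditioning (channel concatenation)."""

    def __init__(self, channels: int, num_steps: int = 50):
        super().__init__()
        self.t_embed = nn.Embedding(num_steps, channels)  # learned timestep embedding
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x_t, corrupt_feat, t):
        h = torch.cat([x_t, corrupt_feat], dim=1)
        return self.net(h) + self.t_embed(t).view(-1, x_t.size(1), 1, 1)


@torch.no_grad()
def restore(denoiser, corrupt_feat, alphas_cumprod):
    """Deterministic DDIM-style reverse process (eta = 0): start from pure
    noise and walk back to t = 0, conditioning every denoising step on the
    corrupted observation."""
    x_t = torch.randn_like(corrupt_feat)
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.ones(())
        t_idx = torch.full((corrupt_feat.size(0),), t, dtype=torch.long)
        eps = denoiser(x_t, corrupt_feat, t_idx)               # predicted noise
        x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # predicted clean feature
        x_t = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps   # re-noise to level t-1
    return x_t


# Usage: restore a corrupted 64-channel BEV feature map in 50 steps.
steps = 50
betas = torch.linspace(1e-4, 2e-2, steps)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
denoiser = FeatureDenoiser(channels=64, num_steps=steps)
restored = restore(denoiser, torch.randn(2, 64, 32, 32), alphas_cumprod)
```

In a PKD-style setup, the restored map would then be aligned with the teacher's clean feature before fusion; the sampler is kept deterministic here so that, under these assumptions, restoration is reproducible across training steps.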

Paper Structure

This paper contains 10 sections, 15 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Overall architecture of Diff-KD. The framework comprises a teacher model (input: holistic-view point cloud) and a student model (input: multi-agent local point clouds). Knowledge transfer is achieved via Progressive Knowledge Distillation (PKD), consisting of (a) active, diffusion-based feature restoration before fusion and (b) alignment of features and predictions after fusion. The student fuses multi-agent features with an (c) Adaptive Gated Fusion (AGF) module built on a (d) Lightweight Gated Modulation (LGM) block; the same LGM architecture is also applied in the teacher model to enhance its global features. Dashed components are active only during training. (A toy sketch of AGF and LGM appears after this figure list.)
  • Figure 2: mRCE (mean relative corruption error) comparison across methods on OPV2V and DAIR-V2X.
  • Figure 3: Detection performance under increasing pose noise on OPV2V. Loc and Head denote the standard deviations of localization error (in meters) and heading error (in radians), respectively.
  • Figure 4: Qualitative comparison of detection results under a clean condition and three realistic sensor corruptions: (1) clean, (2) echo, (3) motion blur, and (4) crosstalk. Green boxes denote ground-truth annotations, and red boxes indicate model predictions.
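To make the AGF/LGM interplay described in the Figure 1 caption concrete, here is a minimal PyTorch sketch under assumed interfaces. The module names, the sigmoid-gate layout of `LGM`, the `reliability` head, and the mean aggregation of neighbor features are illustrative guesses, not the paper's design; the only property deliberately mirrored from the caption is that fusion weights depend on the ego feature's reliability.

```python
import torch
import torch.nn as nn


class LGM(nn.Module):
    """Illustrative Lightweight Gated Modulation block: a cheap element-wise
    sigmoid gate that re-weights a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        return feat * self.gate(feat)  # element-wise gating


class AdaptiveGatedFusion(nn.Module):
    """Sketch of AGF: estimate a per-location reliability map for the ego
    feature, then blend ego and neighbor features so that unreliable ego
    regions lean more heavily on the neighbors."""

    def __init__(self, channels: int):
        super().__init__()
        self.lgm = LGM(channels)
        self.reliability = nn.Sequential(  # ego reliability map in [0, 1]
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, ego_feat, neighbor_feats):
        # neighbor_feats: list of (B, C, H, W) maps warped into the ego frame.
        r = self.reliability(ego_feat)                       # (B, 1, H, W)
        neighbor = torch.stack(neighbor_feats, 0).mean(0)    # simple aggregate
        fused = r * ego_feat + (1.0 - r) * neighbor          # reliability-weighted blend
        return self.lgm(fused)


# Usage with one ego agent and two neighbors on a 64-channel BEV grid.
agf = AdaptiveGatedFusion(channels=64)
ego = torch.randn(1, 64, 32, 32)
out = agf(ego, [torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)])
```

Where the reliability map r is low (e.g., regions hit by echo or crosstalk corruption), the fused feature is dominated by neighbor observations; where r is high, the ego view is preserved.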