IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion

Shashank Mishra, Karan Patil, Didier Stricker, Jason Rambach

TL;DR

IMKD tackles the challenge of cost-efficient 3D object detection with camera–radar fusion by introducing intensity-aware multi-level knowledge distillation. It uses LiDAR as a privileged guide during training and performs distillation at multiple stages, including merged BEV features and intensity-guided fusion, to preserve sensor-specific strengths while amplifying complementarities. The approach reports state-of-the-art performance among KD-based camera–radar methods on nuScenes, with strong robustness to weather and temporal degradation. This work demonstrates that reliability-guided supervision in a fused space can substantially improve cross-modal perception without LiDAR at inference time, offering practical benefits for scalable autonomous systems.

Abstract

High-performance Radar-Camera 3D object detection can be achieved by leveraging knowledge distillation without using LiDAR at inference time. However, existing distillation methods typically transfer modality-specific features directly to each sensor, which can distort their unique characteristics and degrade their individual strengths. To address this, we introduce IMKD, a radar-camera fusion framework based on multi-level knowledge distillation that preserves each sensor's intrinsic characteristics while amplifying their complementary strengths. IMKD applies a three-stage, intensity-aware distillation strategy to enrich the fused representation across the architecture: (1) LiDAR-to-Radar intensity-aware feature distillation to enhance radar representations with fine-grained structural cues, (2) LiDAR-to-Fused feature intensity-guided distillation to selectively highlight useful geometry and depth information at the fusion level, fostering complementarity between the modalities rather than forcing them to align, and (3) a Camera-Radar intensity-guided fusion mechanism that facilitates effective feature alignment and calibration. Extensive experiments on the nuScenes benchmark show that IMKD reaches 67.0% NDS and 61.0% mAP, outperforming all prior distillation-based radar-camera fusion methods. Our code and models are available at https://github.com/dfki-av/IMKD/.
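The abstract does not give the loss formulas, so the following is only a minimal sketch of what an intensity-aware feature distillation term could look like, not the authors' implementation. It assumes the teacher (LiDAR) and student (radar or fused) BEV feature maps share the shape (C, H, W), and that a per-cell LiDAR intensity map of shape (H, W) is available. The function name `intensity_weighted_distill_loss`, the max-normalization of intensity, and the weighted mean-squared-error form are all hypothetical choices made for illustration.

```python
import numpy as np

def intensity_weighted_distill_loss(student_feat, teacher_feat, intensity, eps=1e-6):
    """Hypothetical intensity-weighted feature distillation loss.

    student_feat, teacher_feat: arrays of shape (C, H, W) (assumed BEV features).
    intensity: non-negative array of shape (H, W) (assumed LiDAR intensity per cell).
    """
    # Normalize intensity to [0, 1] so it acts as a per-cell reliability weight.
    w = intensity / (intensity.max() + eps)
    # Squared feature error per BEV cell, averaged over channels.
    per_cell = ((student_feat - teacher_feat) ** 2).mean(axis=0)
    # Weighted average: cells with low LiDAR intensity contribute little,
    # so the student is not forced to imitate unreliable teacher regions.
    return float((w * per_cell).sum() / (w.sum() + eps))

# Illustrative usage with random tensors (shapes are assumptions).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8, 8))
student = teacher + 0.1 * rng.normal(size=(4, 8, 8))
intensity = rng.uniform(size=(8, 8))
loss = intensity_weighted_distill_loss(student, teacher, intensity)
```

Under these assumptions, a cell with zero intensity has no influence on the loss, which mirrors the stated goal of highlighting useful geometry selectively instead of forcing full alignment between modalities.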

Paper Structure

This paper contains 48 sections, 20 equations, 8 figures, and 18 tables.

Figures (8)

  • Figure 1: Comparison of KD methods grouped by distillation target. IMKD’s intensity-based knowledge distillation achieves the highest performance.
  • Figure 2: Overview of the proposed Intensity-Aware Multi-Level Knowledge Distillation.
  • Figure 3: A weighted LiDAR feature map, generated from intensity, is merged with the radar feature map to compute the intensity-guided feature map loss, preserving radar features and avoiding low-intensity LiDAR regions.
  • Figure 4: Comparison of distillation targets: individual modality KD yields extra false detections (white circles) and poor orientation, while merged feature KD aligns better with ground truth.
  • Figure 5: Sensitivity of mAP and NDS to individual loss weights $\lambda$. Each subplot reports an illustrative sweep over $\lambda \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6\}$; dashed vertical lines mark the chosen operating points ($\lambda=0.3$ for most terms, $\alpha=0.5$ for alignment). The curves indicate that performance is stable near the chosen weights and degrades when weights deviate substantially.
  • ...and 3 more figures