SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection

Ruoyu Xu, Zhiyu Xiang, Chenwei Zhang, Hanzhi Zhong, Xijun Zhao, Ruina Dang, Peng Xu, Tianyu Pu, Eryun Liu

TL;DR

This work tackles the challenge of 4D radar-based 3D object detection by proposing SCKD, a semi-supervised cross-modality knowledge distillation framework that uses a Lidar–Radar bi-modality fusion teacher to guide a radar-only student. The method introduces an adaptive fusion-based teacher, two feature-distillation paths (LRFD and FRFD), and semi-supervised output distillation (SSOD), enabling substantial performance gains without relying on ground-truth labels for training the student. Experiments on VoD and ZJUODset show that SCKD significantly improves radar-only detection, outperforming state-of-the-art radar-based methods and approaching fusion-based performance while preserving real-time efficiency, especially when unlabeled data are available. The results demonstrate the practical value of leveraging unlabeled data and modality-aligned distillation to enhance radar robustness under adverse weather and long-range detection scenarios.

Abstract

3D object detection is one of the fundamental perception tasks for autonomous vehicles. Fulfilling this task with a 4D millimeter-wave radar is very attractive, since the sensor can acquire 3D point clouds similar to Lidar while maintaining robust measurements under adverse weather. However, due to the high sparsity and noise of radar point clouds, the performance of existing methods is still much lower than expected. In this paper, we propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection. It is characterized by its ability to learn features from a Lidar-radar-fused teacher network with semi-supervised distillation. We first propose an adaptive fusion module in the teacher network to boost its performance. Then, two feature distillation modules are designed to facilitate the cross-modality knowledge transfer. Finally, a semi-supervised output distillation is proposed to increase the effectiveness and flexibility of the distillation framework. With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline and outperforms state-of-the-art methods on the VoD dataset. The experiment on ZJUODset also shows a 5.12% mAP improvement at the moderate difficulty level over the baseline when extra unlabeled data are available. Code is available at https://github.com/Ruoyu-Xu/SCKD.
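
To make the idea of combining feature distillation with semi-supervised output distillation more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the mean-squared-error feature loss, the binary cross-entropy output loss, and the weights `w_feat` and `w_out` are all assumptions chosen for illustration, and the paper's specific modules (LRFD, FRFD, SSOD) are not reproduced here. The only point it shows is that the distillation terms need just teacher outputs, so a ground-truth loss can be added when labels exist and omitted for unlabeled frames.

```python
import math
from typing import List, Optional

Grid = List[List[float]]  # toy single-channel BEV feature map (hypothetical)


def feature_distill_loss(student: Grid, teacher: Grid) -> float:
    """Mean squared error between student and teacher feature maps (assumed loss)."""
    total, count = 0.0, 0
    for s_row, t_row in zip(student, teacher):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


def output_distill_loss(student_scores: List[float],
                        teacher_scores: List[float]) -> float:
    """Binary cross-entropy of student scores against teacher soft targets (assumed loss)."""
    eps = 1e-7  # clamp to avoid log(0)
    total = 0.0
    for s, t in zip(student_scores, teacher_scores):
        s = min(max(s, eps), 1.0 - eps)
        total += -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))
    return total / len(student_scores)


def total_loss(student_feat: Grid, teacher_feat: Grid,
               student_scores: List[float], teacher_scores: List[float],
               gt_loss: Optional[float] = None,
               w_feat: float = 1.0, w_out: float = 1.0) -> float:
    """Distillation terms always apply; the ground-truth term only if labels exist."""
    loss = (w_feat * feature_distill_loss(student_feat, teacher_feat)
            + w_out * output_distill_loss(student_scores, teacher_scores))
    if gt_loss is not None:  # labeled frame: add the supervised detection loss
        loss += gt_loss
    return loss


# Unlabeled frame: only the teacher's features and scores supervise the student.
unlabeled = total_loss([[0.2, 0.4]], [[0.3, 0.5]], [0.6, 0.1], [0.9, 0.0])
# Labeled frame: the same terms plus a (hypothetical) precomputed detection loss.
labeled = total_loss([[0.2, 0.4]], [[0.3, 0.5]], [0.6, 0.1], [0.9, 0.0], gt_loss=0.8)
```

In this sketch the labeled and unlabeled cases share one code path, which is one simple way to mix both kinds of data in a single training batch; whether SCKD organizes its losses this way is not stated in the text above.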

Paper Structure

This paper contains 28 sections, 15 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Comparison of the current mainstream cross-modality knowledge distillation approaches, (a): BEVDistill [chen2022bevdistill], MonoDistill [chong2022monodistill] and (b): DistillBEV [wang2023distillbev], UniDistill [zhou2023unidistill], RadarDistill [bang2024radardistill], with our SCKD (c).
  • Figure 2: Overview of our SCKD Framework. The solid and dashed lines represent the data flow and calculation of the distillation loss respectively. In the inference stage, only the student network is involved.
  • Figure 3: The structure of the Adaptive Fusion module.
  • Figure 4: Comparison of the feature maps. Column (a) shows the input radar point clouds of two scenes annotated with ground truth, while (b) and (c) show the corresponding heatmaps obtained by SECOND and our method, respectively. The grey and red ellipses mark the background and foreground regions, respectively, for comparison.
  • Figure 5: Qualitative results on the VoD dataset. (a) shows the scene image, while (b) and (c) show the detection results of radar-based SECOND and our method SCKD, respectively. Lidar and radar points are marked in orange and green, respectively, while the predicted and ground-truth bounding boxes are colored red and blue, respectively.