SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection
Ruoyu Xu, Zhiyu Xiang, Chenwei Zhang, Hanzhi Zhong, Xijun Zhao, Ruina Dang, Peng Xu, Tianyu Pu, Eryun Liu
TL;DR
This work tackles the challenge of 4D radar-based 3D object detection by proposing SCKD, a semi-supervised cross-modality knowledge distillation framework that uses a Lidar–Radar bi-modality fusion teacher to guide a radar-only student. The method introduces an adaptive fusion-based teacher, two feature-distillation paths (LRFD and FRFD), and semi-supervised output distillation (SSOD), enabling substantial performance gains without relying on ground-truth labels for training the student. Experiments on VoD and ZJUODset show that SCKD significantly improves radar-only detection, outperforming state-of-the-art radar-based methods and approaching fusion-based performance while preserving real-time efficiency, especially when unlabeled data are available. The results demonstrate the practical value of leveraging unlabeled data and modality-aligned distillation to enhance radar robustness under adverse weather and long-range detection scenarios.
Abstract
3D object detection is one of the fundamental perception tasks for autonomous vehicles. Performing this task with a 4D millimeter-wave radar is very attractive, since the sensor acquires 3D point clouds similar to those of Lidar while maintaining robust measurements under adverse weather. However, due to the high sparsity and noise of radar point clouds, the performance of existing methods is still much lower than expected. In this paper, we propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection. SCKD enables a radar-only student network to learn features from a Lidar-radar fused teacher network through semi-supervised distillation. We first propose an adaptive fusion module in the teacher network to boost its performance. Then, two feature distillation modules are designed to facilitate the cross-modality knowledge transfer. Finally, a semi-supervised output distillation is proposed to increase the effectiveness and flexibility of the distillation framework. With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline and outperforms state-of-the-art methods on the VoD dataset. The experiment on ZJUODset also shows a 5.12% mAP improvement at the moderate difficulty level over the baseline when extra unlabeled data are available. Code is available at https://github.com/Ruoyu-Xu/SCKD.
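The general shape of such a training objective can be illustrated with a minimal sketch. It combines a feature-distillation term (student features pulled toward teacher features) with an output-distillation term (teacher predictions used as soft targets), and adds a supervised term only when labels exist, which is what lets unlabeled frames contribute. This is not the authors' implementation: the function names, the use of mean squared error, the flat feature vectors, and the weights `feat_weight` and `out_weight` are all assumptions made for illustration, and the actual losses are defined in the linked repository.

```python
from typing import Optional, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    labels: Optional[Sequence[float]] = None,
    feat_weight: float = 1.0,  # hypothetical weight, not from the paper
    out_weight: float = 1.0,   # hypothetical weight, not from the paper
) -> float:
    """Generic teacher-student loss (illustrative assumption, not SCKD itself).

    - Feature term: student features are pulled toward teacher features.
    - Output term: teacher predictions serve as soft targets, so the
      sample contributes to training even when no labels exist.
    - Supervised term: added only when ground-truth labels are given.
    """
    loss = feat_weight * mse(student_feat, teacher_feat)
    loss += out_weight * mse(student_out, teacher_out)
    if labels is not None:
        loss += mse(student_out, labels)
    return loss


# Unlabeled sample: only the two distillation terms apply.
unlabeled = distillation_loss([1.0, 2.0], [1.0, 4.0], [0.5], [1.0])

# Labeled sample: the supervised term is added on top.
labeled = distillation_loss([1.0, 2.0], [1.0, 4.0], [0.5], [1.0], labels=[0.0])
```

In this toy setup the unlabeled sample still yields a nonzero loss, because the teacher's features and predictions stand in for the missing annotations.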
