LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera
Yukai Ma, Jianbiao Mei, Xuemeng Yang, Licheng Wen, Weihua Xu, Jiangning Zhang, Botian Shi, Yong Liu, Xingxing Zuo
TL;DR
The paper tackles semantic scene completion (SSC) under adverse weather by leveraging radar. It proposes LiCROcc, a BEV-based fusion framework with a radar-based student and a LiDAR-camera teacher. It introduces fusion-based cross-modal knowledge distillation comprising Cross-Modal Residual Distillation (CMRD), BEV Relation Distillation (BRD), and Predictive Distribution Distillation (PDD) to transfer rich semantic and geometric cues from LiDAR-camera fusion to radar. The approach yields substantial gains on nuScenes-Occupancy, with radar-only and radar-camera variants approaching LiDAR-camera performance and demonstrating robustness to weather and lighting. This work advances practical, weather-resilient SSC for autonomous driving and establishes radar-centered benchmarks and KD strategies for multi-modal 3D perception.
Abstract
Semantic Scene Completion (SSC) is pivotal in autonomous driving perception and is frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robustness. Radar, increasingly utilized for 3D target detection, is gradually replacing LiDAR in autonomous driving applications, offering a robust sensing alternative. In this paper, we focus on the potential of 3D radar in semantic scene completion, pioneering cross-modal refinement techniques that improve robustness against weather and illumination changes and enhance SSC performance. Regarding model architecture, we propose a three-stage tight fusion approach on BEV to realize a fusion framework for point clouds and images. On this foundation, we design three cross-modal distillation modules: CMRD, BRD, and PDD. Our approach enhances performance in both the radar-only (R-LiCROcc) and radar-camera (RC-LiCROcc) settings by distilling into them the rich semantic and structural information of the fused LiDAR-camera features. Finally, our LC-Fusion (teacher model), R-LiCROcc, and RC-LiCROcc achieve the best performance on the nuScenes-Occupancy dataset, with mIoU exceeding the baseline by 22.9%, 44.1%, and 15.5%, respectively. The project page is available at https://hr-zju.github.io/LiCROcc/.
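To make the teacher-student idea concrete, the following is a minimal sketch of a generic predictive-distribution distillation loss: a per-voxel KL divergence between the class distributions of a teacher and a student. It is not the paper's PDD formulation, which the abstract does not specify. The function names, the `temperature` parameter, the grid shape, and the class count of 17 are all illustrative assumptions.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-voxel KL(teacher || student) over the last (class) axis.

    Generic knowledge-distillation loss, used here only as an
    illustration; the temperature is a hypothetical hyperparameter.
    """
    p = softmax(teacher_logits / temperature)  # teacher distribution
    q = softmax(student_logits / temperature)  # student distribution
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())


# Toy voxel grid: (X, Y, Z, classes). Shape and class count are assumptions.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 4, 2, 17))  # stands in for LiDAR-camera teacher logits
student = rng.normal(size=(4, 4, 2, 17))  # stands in for radar student logits

loss_random = kl_distillation_loss(teacher, student)
loss_matched = kl_distillation_loss(teacher, teacher)
print(f"mismatched: {loss_random:.4f}, matched: {loss_matched:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise, so minimizing it pulls the radar student's per-voxel predictions toward those of the fused teacher.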
