4D-ROLLS: 4D Radar Occupancy Learning via LiDAR Supervision
Ruihan Liu, Xiaoyi Wu, Xijun Chen, Liang Hu, Yunjiang Lou
TL;DR
This work tackles occupancy estimation in 3D scenes using 4D radar, which is robust in adverse weather but suffers from sparsity and noise. It introduces 4D-ROLLS, a weakly supervised framework that learns radar occupancy from LiDAR-derived pseudo-labels, including occupancy queries and LiDAR height maps, and uses a TPV-based encoding with a height-constrained loss to align radar outputs with LiDAR occupancy. Training proceeds in two stages: initial LiDAR-guided learning, followed by fine-tuning against the LiDAR-derived occupancy map. The resulting model yields robust occupancy estimates and transfers to BEV segmentation and point cloud occupancy prediction, and it can also be trained across datasets. It runs in real time (≈30 Hz) on a consumer-grade GPU and remains robust in degraded environments, making it a promising all-weather perception solution for autonomous systems.
Abstract
A comprehensive understanding of 3D scenes is essential for autonomous vehicles (AVs). Among the various perception tasks, occupancy estimation plays a central role by providing a general representation of drivable and occupied space. However, most existing occupancy estimation methods rely on LiDAR or cameras, which perform poorly in degraded environments such as smoke, rain, snow, and fog. In this paper, we propose 4D-ROLLS, the first weakly supervised occupancy estimation method for 4D radar that uses the LiDAR point cloud as the supervisory signal. Specifically, we introduce a method for generating pseudo-LiDAR labels, including occupancy queries and LiDAR height maps, which serve as multi-stage supervision for training the 4D radar occupancy estimation model. The model is then aligned with the occupancy map produced by LiDAR to fine-tune its occupancy estimation accuracy. Extensive comparative experiments validate the exceptional performance of 4D-ROLLS, and its robustness in degraded environments and effectiveness in cross-dataset training are demonstrated qualitatively. The model also transfers seamlessly to the downstream tasks of BEV segmentation and point cloud occupancy prediction, highlighting its potential for broader applications. The lightweight network enables the 4D-ROLLS model to run inference at about 30 Hz on a 4060 GPU. The code of 4D-ROLLS will be made available at https://github.com/CLASS-Lab/4D-ROLLS.
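
To make the idea of a LiDAR height map pseudo-label more concrete, the following is a minimal sketch of one plausible way to derive it from a LiDAR point cloud: project the points onto a bird's-eye-view (BEV) grid and keep the maximum height per cell. It is an illustration under stated assumptions, not the 4D-ROLLS implementation. The function name `lidar_height_map`, the grid extents, the cell size, and the choice of maximum height as the per-cell statistic are all hypothetical.

```python
import numpy as np


def lidar_height_map(points, x_range=(0.0, 51.2), y_range=(-25.6, 25.6), cell=0.4):
    """Rasterize LiDAR points of shape (N, 3) into a BEV maximum-height map.

    All default values are illustrative assumptions, not values from the paper.
    Returns (height, occupied): height holds the maximum z per cell (0.0 where
    no point falls), and occupied is a boolean mask of cells hit by a point.
    """
    points = np.asarray(points, dtype=np.float64)
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))

    # Map metric x/y coordinates to integer grid indices.
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(np.int64)

    # Discard points that fall outside the grid.
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    # Unbuffered per-cell maximum, so repeated indices are handled correctly.
    height = np.full((nx, ny), -np.inf)
    np.maximum.at(height, (ix[inside], iy[inside]), points[inside, 2])

    occupied = np.isfinite(height)
    height[~occupied] = 0.0  # placeholder value for empty cells
    return height, occupied
```

In a weakly supervised setup of this kind, the `occupied` mask could indicate which cells carry a valid label, so that a height-related loss is evaluated only where LiDAR actually observed the scene. Whether 4D-ROLLS masks its loss in this way is not stated in the abstract.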
