SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels
Qiucheng Yu, Yuan Xie, Xin Tan
TL;DR
The paper tackles inefficiencies and bias in vision-based 3D occupancy prediction by identifying inter-class long-tail and geometric distribution patterns in voxel space. It introduces SHTOcc, which combines sparse head-tail voxel construction with attention-guided head voxel selection and robust tail voxel sampling, plus a decoupled decoder with label smoothing to reduce head-class bias and boost tail-class accuracy. Empirical results across SemanticKITTI SSC, nuScenes-Occupancy, Occ3D-nuScenes, and LiDAR segmentation show substantial memory and latency reductions (e.g., up to ~58.6% faster inference and ~42.2% memory savings) and consistent mIoU gains (~0.2–0.7 points) when integrating SHTOcc with popular backbones. The approach is plug-and-play and offers practical improvements for real-time 3D perception in autonomous driving.
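The label smoothing mentioned above can be illustrated with the standard uniform formulation, in which a fraction eps of the probability mass is spread evenly over all classes. This is a minimal sketch, not the paper's implementation: the function name `smooth_labels`, the parameter `eps`, and the choice of uniform smoothing are assumptions made for illustration, and the summary does not say which variant SHTOcc's decoder uses.

```python
def smooth_labels(class_id, num_classes, eps=0.1):
    """Return a smoothed one-hot target for a single voxel.

    Hypothetical helper (not from the paper): standard uniform label
    smoothing, target = (1 - eps) * one_hot + eps / num_classes.
    """
    if not 0 <= class_id < num_classes:
        raise ValueError("class_id must be in [0, num_classes)")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be in [0, 1]")
    # Every class receives an equal share of the smoothing mass.
    target = [eps / num_classes] * num_classes
    # The true class keeps the remaining probability mass.
    target[class_id] += 1.0 - eps
    return target
```

With a softened target, a frequent head class is no longer pushed toward probability 1, which is one common way to temper over-confident predictions on dominant classes.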
Abstract
3D occupancy prediction has attracted considerable attention in autonomous driving because of its strong geometric perception and object recognition capabilities. However, existing methods have not examined the underlying distribution patterns of voxels, which leads to unsatisfactory results. This paper first explores the inter-class distribution and the geometric distribution of voxels, and then addresses both the long-tail problem caused by the former and the poor performance caused by the latter. Specifically, it proposes SHTOcc (Sparse Head-Tail Occupancy), which uses sparse head-tail voxel construction to identify and balance key voxels in the head and tail classes, and uses decoupled learning to reduce the model's bias toward the dominant (head) classes and strengthen its focus on the tail classes. Experiments on multiple baselines show significant improvements: SHTOcc reduces GPU memory usage by 42.2%, increases inference speed by 58.6%, and improves accuracy by about 7%, verifying its effectiveness and efficiency. The code is available at https://github.com/ge95net/SHTOcc
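The idea of sparse head-tail voxel construction can be sketched roughly as follows: keep only the highest-scoring voxels from the head classes, and sample a small number of voxels from every tail class so that rare classes stay represented. This is an illustrative sketch under stated assumptions, not the authors' code. The function `select_sparse_voxels`, its parameters `head_ratio` and `tail_per_class`, the use of a per-voxel score as a stand-in for attention, and the availability of per-voxel class labels (as during training) are all hypothetical.

```python
import random
from collections import defaultdict


def select_sparse_voxels(scores, labels, head_classes,
                         head_ratio=0.25, tail_per_class=2, seed=0):
    """Return (head_indices, tail_indices) of the voxels to keep.

    Hypothetical helper (not from the paper).
    scores:       per-voxel importance score, standing in for attention
    labels:       per-voxel class id (assumed available, e.g. in training)
    head_classes: set of class ids treated as head (dominant) classes
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    rng = random.Random(seed)

    # Split voxel indices into head voxels and per-class tail voxels.
    head_idx = []
    tail_by_class = defaultdict(list)
    for i, c in enumerate(labels):
        if c in head_classes:
            head_idx.append(i)
        else:
            tail_by_class[c].append(i)

    # Head: keep only the top fraction of voxels ranked by score.
    k = max(1, int(head_ratio * len(head_idx))) if head_idx else 0
    keep_head = sorted(head_idx, key=lambda i: scores[i], reverse=True)[:k]

    # Tail: sample up to tail_per_class voxels from each tail class.
    keep_tail = []
    for c in sorted(tail_by_class):
        idx = tail_by_class[c]
        keep_tail.extend(rng.sample(idx, min(tail_per_class, len(idx))))

    return sorted(keep_head), sorted(keep_tail)
```

For example, with `labels = [0, 0, 0, 0, 1, 1, 1, 2]`, `head_classes = {0}`, and `head_ratio = 0.5`, the two highest-scoring class-0 voxels are kept, along with up to two voxels from class 1 and the single voxel from class 2. Only this reduced set would then be passed to later, more expensive stages, which is the intuition behind the reported memory and latency savings.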
