OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries
Yuhang Lu, Xinge Zhu, Tai Wang, Yuexin Ma
TL;DR
This paper tackles the inefficiency of dense voxel-based 3D occupancy prediction by introducing OctreeOcc, a multi-granularity octree-based framework that adaptively partitions space to match object sizes and scene details. It combines semantic-guided initialization with an iterative structure rectification mechanism, and employs deformable attention-based octree encoding to fuse temporal and multi-view features. OctreeOcc achieves state-of-the-art results on Occ3D-nuScenes and SemanticKITTI while reducing computational overhead by 15–24% compared to dense-grid methods. The work demonstrates that octree structures can be learned from images for 3D occupancy tasks and provides thorough ablations on initialization, rectification, and octree depth. Overall, the approach offers a scalable, accurate solution for holistic 3D scene understanding in autonomous systems.
Abstract
Occupancy prediction has garnered increasing attention in recent years for its fine-grained understanding of 3D scenes. Traditional approaches typically rely on dense, regular grid representations, which often lead to excessive computational demands and a loss of spatial detail for small objects. This paper introduces OctreeOcc, an innovative 3D occupancy prediction framework that leverages the octree representation to adaptively capture valuable information in 3D, offering variable granularity to accommodate object shapes and semantic regions of varying sizes and complexities. In particular, we incorporate image semantic information to improve the accuracy of initial octree structures and design an effective rectification mechanism to refine the octree structure iteratively. Our extensive evaluations show that OctreeOcc not only surpasses state-of-the-art methods in occupancy prediction, but also achieves a 15%-24% reduction in computational overhead compared to dense-grid-based methods.
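To make the variable-granularity idea concrete, the following is a minimal sketch of how an octree can cover a scene with far fewer cells than a dense grid. It is not the authors' implementation: the function name `count_octree_leaves` is hypothetical, and the sketch assumes the occupancy grid is already known, whereas OctreeOcc predicts and rectifies the octree structure from images.

```python
import numpy as np

def count_octree_leaves(occ, max_depth):
    """Count leaves when a cube is split only if its contents are mixed.

    Hypothetical helper for illustration; `occ` is a cubic boolean array
    whose side length is a power of two.
    """
    # A uniform cube (all occupied or all empty) is kept as one coarse leaf.
    if max_depth == 0 or occ.min() == occ.max():
        return 1
    h = occ.shape[0] // 2
    halves = (slice(0, h), slice(h, None))
    total = 0
    # Recurse into the eight octants of the cube.
    for x in halves:
        for y in halves:
            for z in halves:
                total += count_octree_leaves(occ[x, y, z], max_depth - 1)
    return total

# Toy scene (an assumption for illustration): an 8x8x8 grid with a
# single occupied voxel standing in for a small object.
grid = np.zeros((8, 8, 8), dtype=bool)
grid[0, 0, 0] = True

dense_cells = grid.size                        # 8 * 8 * 8 = 512
octree_leaves = count_octree_leaves(grid, 3)   # 7 + 7 + 8 = 22
print(dense_cells, octree_leaves)
```

In this toy case the large empty region is covered by a handful of coarse leaves, and only the neighborhood of the small object is refined to the finest resolution. The savings here are specific to this toy scene and are unrelated to the 15%-24% figure reported above.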
