Fast Occupancy Network
Mingjie Lu, Yuanxian Huang, Ji Liu, Xingliang Huang, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum
TL;DR
The paper tackles the high computational cost of 3D occupancy networks for autonomous driving by turning a BEV-based representation into a 3D voxel prediction using a lightweight deformable 2D convolution for lifting. It introduces a Partial Voxel FPN to efficiently fuse multi-scale voxel features and adds a perspective-view segmentation branch that improves accuracy while adding no cost at inference. Empirical results on OpenOcc and SemanticKITTI show the approach achieves state-of-the-art or near-state-of-the-art accuracy while delivering up to roughly 3× faster inference on common backbones, and it remains easily transferable to existing BEV models. This work offers a practical pathway to deploying accurate 3D occupancy prediction in real-time perception pipelines for autonomous driving.
Abstract
The Occupancy Network has recently attracted much attention in autonomous driving. Unlike monocular 3D detection and recent bird's eye view (BEV) models, which predict 3D bounding boxes of obstacles, an Occupancy Network predicts the category of each voxel in a specified 3D space around the ego vehicle, transforming the 3D detection task into a 3D voxel segmentation task. This formulation is better suited to handling obstacles that fall outside the known categories and to providing a fine-grained 3D representation. However, existing methods usually require far more computation resources than previous methods, which hinders the deployment of Occupancy Network solutions in intelligent driving systems. To address this problem, we analyze the bottleneck of Occupancy Network inference cost and present a simple and fast Occupancy Network model. It adopts a deformable 2D convolutional layer to lift the BEV feature to a 3D voxel feature and uses an efficient voxel feature pyramid network (FPN) module to improve performance at little computational cost. Further, we present a 2D segmentation branch in perspective view, placed after the feature extractor, that improves accuracy and is cost-free during the inference phase. Experimental results demonstrate that our method consistently outperforms existing methods in both accuracy and inference speed: with a ResNet50 backbone, it surpasses the recent state-of-the-art (SOTA) OCCNet by 1.7% with about a 3X inference speedup. Furthermore, our method can be easily applied to existing BEV models to transform them into Occupancy Network models.
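To make the idea of lifting a 2D BEV feature to a 3D voxel feature more concrete, the sketch below shows only the reshape step that a 2D-to-3D lift can rely on: a 2D layer emits C*Z channels per BEV cell, and those channels are regrouped into C feature channels over Z height bins. This is an illustration under stated assumptions, not the authors' implementation. The function name `lift_bev_to_voxel`, the channel layout `c * Z + z`, and the toy sizes are all hypothetical, and the deformable 2D convolution that would produce the C*Z channels is not reproduced here.

```python
def lift_bev_to_voxel(bev, num_channels, num_heights):
    """Regroup a BEV map with C*Z channels into a voxel grid [C][Z][H][W].

    bev is a nested list indexed as bev[channel][y][x]. The channel axis is
    ASSUMED to be laid out as c * num_heights + z (hypothetical layout).
    """
    if len(bev) != num_channels * num_heights:
        raise ValueError("expected C*Z channels in the BEV feature map")
    return [
        [bev[c * num_heights + z] for z in range(num_heights)]
        for c in range(num_channels)
    ]


# Toy example: C=2 feature channels, Z=3 height bins, on a 2x2 BEV grid.
# Channel k of the BEV map is filled with the value k so the regrouping
# is easy to follow.
C, Z, H, W = 2, 3, 2, 2
bev = [[[float(k)] * W for _ in range(H)] for k in range(C * Z)]
voxel = lift_bev_to_voxel(bev, C, Z)
```

The appeal of this kind of lift is that all heavy computation stays in 2D; the step into 3D is a regrouping of channels rather than a 3D convolution over the full voxel grid.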
