Fast Occupancy Network

Mingjie Lu, Yuanxian Huang, Ji Liu, Xingliang Huang, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum

TL;DR

The paper tackles the high computational cost of 3D occupancy networks for autonomous driving by turning a BEV-based representation into a 3D voxel prediction using a lightweight deformable 2D convolution for lifting. It introduces a Partial Voxel FPN to efficiently fuse multi-scale voxel features and adds a cost-free perspective-view segmentation branch to bolster accuracy during training. Empirical results on OpenOcc and SemanticKITTI show the approach achieves state-of-the-art or near-state-of-the-art accuracy while delivering up to roughly 3× faster inference on common backbones, and it remains easily transferable to existing BEV models. This work offers a practical pathway to deploy accurate 3D occupancy in real-time perception pipelines for autonomous driving.

Abstract

Occupancy Networks have recently attracted much attention in autonomous driving. Unlike monocular 3D detection and recent bird's eye view (BEV) models, which predict 3D bounding boxes of obstacles, an Occupancy Network predicts the category of each voxel in a specified 3D space around the ego vehicle, transforming the 3D detection task into a 3D voxel segmentation task. This formulation is better at handling obstacles that fall outside the predefined categories and provides a fine-grained 3D representation. However, existing methods usually require far more computational resources than previous methods, which hinders the deployment of Occupancy Network solutions in intelligent driving systems. To address this problem, we analyze the bottleneck of Occupancy Network inference cost and present a simple and fast Occupancy Network model, which adopts a deformable 2D convolutional layer to lift BEV features to 3D voxel features and uses an efficient voxel feature pyramid network (FPN) module to improve performance at little computational cost. Further, we add a 2D segmentation branch in perspective view after the feature extractors, which improves accuracy while adding no cost during the inference phase. Experimental results demonstrate that our method consistently outperforms existing methods in both accuracy and inference speed: with a ResNet50 backbone, it surpasses the recent state-of-the-art (SOTA) OCCNet by 1.7% with about a 3X inference speedup. Furthermore, our method can be easily applied to existing BEV models to transform them into Occupancy Network models.
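
The lifting step named in the abstract can be illustrated with a minimal sketch. It assumes the lifter expands the BEV channels and then reshapes them into height bins; the abstract does not spell out this layout, so treat it as an assumption. A plain 1x1 convolution stands in for the deformable 2D convolution, and the function name `lift_bev_to_voxel` and all shapes are hypothetical.

```python
import numpy as np


def lift_bev_to_voxel(bev, weight, bias, num_z):
    """Lift a BEV feature map (C_in, H, W) to a voxel feature (C_out, Z, H, W).

    Assumption: a plain 1x1 convolution replaces the deformable 2D
    convolution, and the channel-to-height reshape is an illustrative guess.
    """
    c_in, h, w = bev.shape
    c_total = weight.shape[0]
    if weight.shape != (c_total, c_in):
        raise ValueError("weight must have shape (C_out * Z, C_in)")
    if c_total % num_z != 0:
        raise ValueError("output channels must be divisible by num_z")
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    expanded = np.einsum("oc,chw->ohw", weight, bev) + bias[:, None, None]
    # Split the expanded channel axis into (feature channels, height bins).
    return expanded.reshape(c_total // num_z, num_z, h, w)


rng = np.random.default_rng(0)
bev = rng.standard_normal((8, 4, 5))       # C_in=8, H=4, W=5
weight = rng.standard_normal((6 * 3, 8))   # C_out=6, Z=3
bias = np.zeros(6 * 3)
voxel = lift_bev_to_voxel(bev, weight, bias, num_z=3)
print(voxel.shape)  # (6, 3, 4, 5)
```

Because the lift uses only 2D operations on the BEV grid, its cost scales with H x W rather than with the full voxel volume, which is in line with the efficiency motivation stated in the abstract.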

Paper Structure

This paper contains 16 sections, 3 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Overview of our Fast Occupancy Network pipeline.
  • Figure 2: The structure of the BEV lifter module.
  • Figure 3: Overview of our Partial Voxel FPN.
  • Figure 4: Perspective view auxiliary FPN and loss. This structure is used only during training and does not affect inference speed.
  • Figure 5: Perspective view supervision visualization.