DA-Occ: Direction-Aware 2D Convolution for Efficient and Geometry-Preserving 3D Occupancy Prediction
Yuchen Zhou, Yan Luo, Xiaogang Wang, Xingjian Gu, Mingzhou Lu
TL;DR
DA-Occ addresses real-time, geometry-preserving 3D occupancy prediction for autonomous driving with a pure 2D pipeline that retains vertical geometry through height-aware voxel slicing and Direction-Aware Convolution (DAC). The method combines a Direction-Aware Geometric Encoder built on DepthNet and HeightNet, a 2D-to-3D view transformation inspired by Lift-Splat-Shoot, and a Direction-Aware Geometric Decoder that fuses height-based and BEV-based features. Key contributions include height-aware projection, DAC for vertical and horizontal feature extraction, and a joint BEV-height fusion that preserves vertical cues while maintaining efficiency. The approach yields a favorable accuracy–efficiency balance (39.3% mIoU at 27.7 FPS on Occ3D-nuScenes, with a high RT-mIoU), and the 14.8 FPS reported in edge-device simulations suggests practical deployment potential for resource-constrained autonomous systems.
Abstract
Efficient and high-accuracy 3D occupancy prediction is crucial for ensuring the performance of autonomous driving (AD) systems. However, many existing methods trade accuracy against efficiency. Some achieve high precision at the cost of slow inference, while others adopt purely bird's-eye-view (BEV)-based 2D representations to accelerate processing, inevitably sacrificing vertical cues and compromising geometric integrity. To overcome these limitations, we propose a pure 2D framework that achieves efficient 3D occupancy prediction while preserving geometric integrity. Unlike conventional Lift-Splat-Shoot (LSS) methods that rely solely on depth scores to lift 2D features into 3D space, our approach additionally introduces a height-score projection to encode vertical geometric structure. We further employ direction-aware convolution to extract geometric features along both vertical and horizontal orientations. On the Occ3D-nuScenes benchmark, the proposed method achieves an mIoU of 39.3% and an inference speed of 27.7 FPS, effectively balancing accuracy and efficiency. In simulations on edge devices, the inference speed reaches 14.8 FPS, further demonstrating the method's applicability for real-time deployment in resource-constrained environments.
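To make the difference between depth-only lifting and the additional height-score projection concrete, the following is a minimal per-pixel sketch, not the authors' implementation. It assumes the depth and height scores are softmax distributions over discrete bins and that they are combined multiplicatively with the 2D feature; the abstract does not specify how the two scores are combined, so this is one plausible reading. The function names `lift_depth_only` and `lift_depth_and_height` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw logits into scores that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_depth_only(feature, depth_logits):
    """LSS-style lift for one pixel: outer product of depth scores and feature.

    Returns a nested list indexed as [depth_bin][channel].
    """
    depth_scores = softmax(depth_logits)
    return [[d * f for f in feature] for d in depth_scores]


def lift_depth_and_height(feature, depth_logits, height_logits):
    """ASSUMED variant: also weight the feature by per-pixel height scores.

    Returns a nested list indexed as [depth_bin][height_bin][channel].
    The multiplicative combination is an illustrative assumption.
    """
    depth_scores = softmax(depth_logits)
    height_scores = softmax(height_logits)
    return [
        [[d * h * f for f in feature] for h in height_scores]
        for d in depth_scores
    ]


if __name__ == "__main__":
    feature = [1.0, 2.0]              # C = 2 channels for a single pixel
    depth_logits = [0.0, 0.0]         # D = 2 depth bins, uniform scores
    height_logits = [0.0, 0.0, 0.0]   # Z = 3 height bins, uniform scores

    lifted = lift_depth_and_height(feature, depth_logits, height_logits)
    print(len(lifted), len(lifted[0]), len(lifted[0][0]))
```

Under this assumption, summing the lifted tensor over the height bins recovers the depth-only lift, because the height scores sum to 1. The height scores therefore redistribute each pixel's feature along the vertical axis without changing its total contribution, which is the sense in which vertical cues are kept rather than collapsed into a single BEV cell.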
