Omnidirectional Depth-Aided Occupancy Prediction based on Cylindrical Voxel for Autonomous Driving
Chaofan Wu, Jiaheng Li, Jinghao Cao, Ming Li, Yongkang Feng, Jiayu Wu, Shuwen Xu, Zihang Gao, Sidan Du, Yang Li
TL;DR
This paper tackles accurate 3D occupancy prediction for autonomous driving when geometric priors are scarce. It uses omnidirectional depth as a geometric prior and introduces a cylindrical (polar) voxel representation that matches the geometry of surround-view cameras. The proposed OmniDepth-Occ is a Sketch-Coloring framework: depth priors constrain a transformer-based occupancy predictor, 2D image features are mapped onto the cylindrical voxel grid via cross-attention, and a 2D semantic head and temporal fusion provide additional support. To fill the data gap, the authors build a virtual six-fisheye-camera dataset in CARLA that provides a large set of fisheye RGB images together with semantic labels defined on cylindrical voxels. Results show significant improvements over baselines, especially in near-field regions. These contributions enable denser, more accurate 3D scene understanding for navigation and obstacle avoidance, with potential extensions to 3D reconstruction and object detection.
Abstract
Accurate 3D perception is essential for autonomous driving. Traditional methods often struggle with geometric ambiguity because they lack geometric priors. To address this, we use omnidirectional depth estimation to introduce a geometric prior. Building on this depth information, we propose OmniDepth-Occ, a Sketch-Coloring framework. In addition, our approach introduces a cylindrical voxel representation based on polar coordinates to better align with the radial nature of panoramic camera views. To address the lack of fisheye camera datasets for autonomous driving tasks, we also build a virtual scene dataset captured by six fisheye cameras, with twice the data volume of SemanticKITTI. Experimental results demonstrate that our Sketch-Coloring network significantly enhances 3D perception performance.
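The cylindrical voxel representation can be illustrated with a minimal sketch that maps a 3D point in the ego frame to a (radial, azimuthal, height) voxel index. The grid extents and bin counts below (`R_MAX`, `N_RHO`, `N_PHI`, `Z_MIN`, `Z_MAX`, `N_Z`) and all helper names are hypothetical choices for illustration, not values or code from the paper. `occupancy_from_points` also assumes the estimated depth has already been back-projected into ego-frame points with a fisheye camera model, which is not shown here.

```python
import math
from typing import Iterable, Optional, Set, Tuple

# Hypothetical grid parameters for illustration only (not from the paper).
R_MAX = 51.2              # outer radius of the grid, metres
N_RHO = 128               # number of radial rings
N_PHI = 360               # number of azimuth sectors (1 degree each)
Z_MIN, Z_MAX = -2.0, 4.4  # vertical extent, metres
N_Z = 16                  # number of height bins

Voxel = Tuple[int, int, int]


def cartesian_to_cyl_voxel(x: float, y: float, z: float) -> Optional[Voxel]:
    """Map an ego-frame point to a (rho, phi, z) voxel index, or None if outside the grid."""
    rho = math.hypot(x, y)                        # radial distance in the ground plane
    if rho >= R_MAX or not (Z_MIN <= z < Z_MAX):
        return None
    phi = math.atan2(y, x) % (2 * math.pi)        # azimuth wrapped to [0, 2*pi)
    i = min(int(rho / R_MAX * N_RHO), N_RHO - 1)
    j = int(phi / (2 * math.pi) * N_PHI) % N_PHI  # modulo guards the 2*pi rounding edge
    k = min(int((z - Z_MIN) / (Z_MAX - Z_MIN) * N_Z), N_Z - 1)
    return i, j, k


def cell_volume(i: int) -> float:
    """Volume of a cell in radial ring i: 0.5 * (r1^2 - r0^2) * d_phi * d_z."""
    d_rho = R_MAX / N_RHO
    r0, r1 = i * d_rho, (i + 1) * d_rho
    d_phi = 2 * math.pi / N_PHI
    d_z = (Z_MAX - Z_MIN) / N_Z
    return 0.5 * (r1 ** 2 - r0 ** 2) * d_phi * d_z


def occupancy_from_points(points: Iterable[Tuple[float, float, float]]) -> Set[Voxel]:
    """Mark every voxel hit by at least one point (points assumed back-projected from depth)."""
    occupied: Set[Voxel] = set()
    for x, y, z in points:
        idx = cartesian_to_cyl_voxel(x, y, z)
        if idx is not None:
            occupied.add(idx)
    return occupied


if __name__ == "__main__":
    pts = [(10.3, 0.0, 1.0), (10.31, 0.0, 1.01), (60.0, 0.0, 0.0)]
    print(occupancy_from_points(pts))                 # {(25, 0, 7)}; the 60 m point is dropped
    print(round(cell_volume(10) / cell_volume(0), 6))  # 21.0
```

Because a cell spanning the radial interval [ρ_i, ρ_{i+1}) covers an area proportional to ρ_{i+1}^2 − ρ_i^2, its volume grows linearly with the ring index (ring i is 2i+1 times the innermost ring). A cylindrical grid therefore concentrates its resolution near the vehicle, whereas a Cartesian grid keeps it uniform everywhere.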
