PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction
Yujing Xue, Jiaxiang Liu, Jiawei Du, Joey Tianyi Zhou
TL;DR
The paper tackles dense 3D semantic occupancy prediction using polar coordinate representations, which suffer from feature distortion and non-uniform voxel distribution. It introduces the Polar Voxel Occupancy Predictor (PVP), which combines Global Representation Propagation (GRP) and Plane Decomposed Convolution (PD-Conv) to address distortion and enable effective global feature propagation in polar volumes. PVP uses a dual-backbone architecture (a 3D PD-Conv backbone and a 2D image-to-3D backbone) with multimodal fusion, GRP-based long-range attention, and a polar-aware head that converts predictions to Cartesian voxels. It achieves substantial improvements on the OpenOccupancy benchmark across input modalities, including LiDAR-only and LiDAR+image setups. The results demonstrate the viability of polar representations for dense 3D occupancy prediction and suggest that robust autonomous-driving perception is achievable despite the distortion inherent to polar grids.
Abstract
Recently, polar coordinate-based representations have shown promise for 3D perceptual tasks. Compared to Cartesian methods, polar grids provide a viable alternative, offering better detail preservation in nearby spaces while covering larger areas. However, they suffer from feature distortion because the grid divides space non-uniformly. To address this issue, we introduce the Polar Voxel Occupancy Predictor (PVP), a novel 3D multi-modal predictor that operates in polar coordinates. PVP features two key design elements to overcome distortion: a Global Represent Propagation (GRP) module that integrates global spatial data into 3D volumes, and a Plane Decomposed Convolution (PD-Conv) that reduces the handling of 3D distortion to 2D convolutions. These innovations enable PVP to outperform existing methods, achieving significant improvements in mIoU and IoU metrics on the OpenOccupancy dataset.
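
To make the non-uniform division concrete, the following is a minimal sketch of cylindrical (polar) voxelization with uniform bins in radius, angle, and height. It is not the PVP implementation: the function names, the grid sizes, and the uniform radial binning are all assumptions chosen for illustration. It shows that, under this binning, cell volume grows with distance from the sensor, so nearby space is sampled finely and distant space coarsely.

```python
import numpy as np

# Hypothetical helpers for illustration only; not taken from the PVP paper.

def cartesian_to_polar_index(points, r_max, n_r, n_theta, z_min, z_max, n_z):
    """Map (N, 3) Cartesian points to integer (r, theta, z) voxel indices.

    Returns the index array and a boolean mask of points inside the grid.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)                      # radial distance in the x-y plane
    theta = np.arctan2(y, x)                # azimuth in [-pi, pi]
    i_r = np.floor(r / r_max * n_r).astype(int)
    # Wrap the angular index so theta == pi falls into the first bin.
    i_t = np.floor((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    i_z = np.floor((z - z_min) / (z_max - z_min) * n_z).astype(int)
    valid = (i_r < n_r) & (i_z >= 0) & (i_z < n_z)
    return np.stack([i_r, i_t, i_z], axis=1), valid


def polar_cell_volumes(r_max, n_r, n_theta, z_min, z_max, n_z):
    """Volume of one cell in each radial ring, assuming uniform bins."""
    edges = np.linspace(0.0, r_max, n_r + 1)
    d_theta = 2 * np.pi / n_theta
    d_z = (z_max - z_min) / n_z
    # Annular sector volume: 0.5 * (r_outer^2 - r_inner^2) * d_theta * d_z
    return 0.5 * (edges[1:] ** 2 - edges[:-1] ** 2) * d_theta * d_z


if __name__ == "__main__":
    grid = dict(r_max=50.0, n_r=50, n_theta=360, z_min=-5.0, z_max=3.0, n_z=8)
    pts = np.array([[1.5, 0.01, 0.2], [30.0, -30.0, 1.0], [60.0, 0.0, 0.0]])
    idx, valid = cartesian_to_polar_index(pts, **grid)
    vol = polar_cell_volumes(**grid)
    print(idx, valid)
    print("outermost / innermost cell volume:", vol[-1] / vol[0])
```

With uniform radial bins, a cell in ring k is (2k + 1) times larger than a cell in the innermost ring (k = 0), so a convolution kernel of fixed size covers very different physical extents near and far from the sensor. This is one way to picture the distortion the abstract refers to.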
