TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction
Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall
TL;DR
TFusionOcc tackles robust 3D semantic occupancy prediction for autonomous driving by introducing an object-centric, multi-stage multi-sensor fusion framework that leverages the Student's t-distribution and the T-Mixture model (TMM). The method uses deformable superquadric primitives to flexibly capture geometry, a skeleton-merge scheme to fuse LiDAR and surround-view camera data, and a Transformer-based refinement stage that produces dense 3D occupancy via splatting. Key contributions include the MGCAFusion module, deformable T-primitives (including inverse-warp variants), and a depth-guided 3D deformable attention mechanism. The full framework achieves SOTA results on nuScenes and demonstrates strong robustness on nuScenes-C under various camera and LiDAR corruptions. The approach offers improved geometric detail, robustness to outliers, and practical scalability for edge deployment, with extensive ablations and an efficiency analysis supporting its effectiveness.
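The Student's t-distribution replaces the Gaussian's exponential decay with a polynomial one, so its tails are heavier. This is the standard reason t-based mixtures tolerate outliers better, and as the degrees of freedom nu grow, the t kernel approaches the Gaussian. The sketch below only illustrates how a set of t-shaped primitives could be splatted into per-voxel semantic scores. It assumes axis-aligned (diagonal) scales, an unnormalized kernel, and a hypothetical `TPrimitive` record with `mean`, `scale`, `nu`, `opacity`, and `semantics` fields. None of these names or choices are taken from TFusionOcc.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mahalanobis_sq(x: Vec3, mean: Vec3, scale: Vec3) -> float:
    """Squared Mahalanobis distance, assuming an axis-aligned (diagonal) scale."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, scale))


def gaussian_kernel(delta: float) -> float:
    """Unnormalized Gaussian kernel exp(-delta / 2), for comparison."""
    return math.exp(-0.5 * delta)


def student_t_kernel(delta: float, nu: float, dim: int = 3) -> float:
    """Unnormalized multivariate Student's t kernel (1 + delta/nu)^(-(nu+dim)/2).

    It decays polynomially in delta, so it has heavier tails than the Gaussian,
    and it converges to exp(-delta / 2) as nu -> infinity.
    """
    return (1.0 + delta / nu) ** (-(nu + dim) / 2.0)


@dataclass
class TPrimitive:
    """Hypothetical t-shaped primitive (illustrative fields, not the paper's)."""
    mean: Vec3
    scale: Vec3
    nu: float                  # degrees of freedom; small nu -> heavier tails
    opacity: float
    semantics: Sequence[float]  # per-class scores carried by the primitive


def splat(prims: List[TPrimitive], point: Vec3, num_classes: int) -> List[float]:
    """Sum opacity-weighted t-kernel contributions of all primitives at one voxel center."""
    logits = [0.0] * num_classes
    for p in prims:
        w = p.opacity * student_t_kernel(mahalanobis_sq(point, p.mean, p.scale), p.nu)
        for c in range(num_classes):
            logits[c] += w * p.semantics[c]
    return logits


if __name__ == "__main__":
    # Same squared distance (4 standard deviations): the t kernel decays far less.
    d = 16.0
    print(f"gaussian={gaussian_kernel(d):.2e}  student_t(nu=3)={student_t_kernel(d, 3.0):.2e}")
```

In a real occupancy head, `splat` would be evaluated at every voxel center (typically only against nearby primitives), and the class with the largest accumulated score would label the voxel.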
Abstract
3D semantic occupancy prediction enables autonomous vehicles (AVs) to perceive the fine-grained geometric and semantic structure of their surroundings from onboard sensors, which is essential for safe decision-making and navigation. Recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, the intermediate representations used by existing methods rely heavily on 3D voxel volumes or sets of 3D Gaussians, which limits their ability to capture fine-grained geometric details of the 3D driving environment both efficiently and effectively. This paper introduces TFusionOcc, a novel object-centric multi-sensor fusion framework for 3D semantic occupancy prediction. By leveraging multi-stage multi-sensor fusion, the Student's t-distribution, and the T-Mixture model (TMM), together with more geometrically flexible primitives such as the deformable superquadric (a superquadric with inverse warp), the proposed method achieves state-of-the-art (SOTA) performance on the nuScenes benchmark. In addition, extensive experiments on the nuScenes-C dataset demonstrate the robustness of the proposed method under various camera and LiDAR corruption scenarios. The code will be available at: https://github.com/DanielMing123/TFusionOcc
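For readers unfamiliar with superquadrics, the sketch below evaluates Barr's standard inside-outside function. Its value is below 1 inside the shape, equal to 1 on the surface, and above 1 outside, and the exponents eps1 and eps2 morph the primitive between box-like, ellipsoidal, and pinched shapes. To illustrate the idea of an inverse warp, it assumes a hypothetical linear taper along the local z-axis: a query point is mapped back through the inverse taper into the canonical frame before the function is evaluated. The taper, the rotation-free pose, and all function names are illustrative assumptions, not the deformation actually used in TFusionOcc.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def superquadric_F(p: Vec3, size: Vec3, eps1: float, eps2: float) -> float:
    """Barr's inside-outside function: < 1 inside, == 1 on the surface, > 1 outside."""
    x, y, z = p
    a1, a2, a3 = size
    xy = (abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy + abs(z / a3) ** (2.0 / eps1)


def inverse_taper(p: Vec3, size: Vec3, kx: float, ky: float) -> Vec3:
    """Undo a hypothetical linear taper along z (assumes |kx|, |ky| < 1 and |z| <= a3).

    Forward warp: x' = x * (1 + kx * z / a3), y' = y * (1 + ky * z / a3), z' = z.
    """
    x, y, z = p
    a3 = size[2]
    return (x / (1.0 + kx * z / a3), y / (1.0 + ky * z / a3), z)


def inside_deformed(point: Vec3, center: Vec3, size: Vec3, eps1: float, eps2: float,
                    kx: float = 0.0, ky: float = 0.0) -> bool:
    """Test a world point against a tapered superquadric (translation only, no rotation)."""
    local = (point[0] - center[0], point[1] - center[1], point[2] - center[2])
    # The taper leaves z unchanged, so |z| > a3 is outside; this also avoids dividing by zero.
    if abs(local[2]) > size[2]:
        return False
    canonical = inverse_taper(local, size, kx, ky)  # pull the point back to the canonical frame
    return superquadric_F(canonical, size, eps1, eps2) <= 1.0


if __name__ == "__main__":
    q = (0.8, 0.0, 0.7)
    print(inside_deformed(q, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1.0, 1.0))
    print(inside_deformed(q, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1.0, 1.0, kx=0.5))
```

Evaluating the inverse warp rather than the forward warp is convenient because occupancy queries start from fixed voxel centers. Each center is pulled back into the primitive's canonical frame, where the closed-form inside-outside test applies directly.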
