GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting
Qianpu Sun, Changyong Shu, Sifan Zhou, Runxi Cheng, Yongxian Wei, Zichen Yu, Dawei Yang, Sirui Han, Yuan Chun
TL;DR
GSRender introduces a weakly supervised 3D occupancy estimation framework that models scenes as a deduplicated set of 3D Gaussians. By combining a Gaussian-based head, a Gaussian rasterizer, and a Ray Compensation (RC) module that leverages adjacent frames, the method reduces duplicated predictions and better handles dynamic objects while using minimal 2D supervision. The approach achieves state-of-the-art RayIoU among 2D weakly supervised methods on Occ3D-nuScenes and narrows the gap to 3D-supervised approaches, demonstrating strong practical potential for outdoor occupancy perception. Ablation studies validate the necessity of Gaussian property shifts and the RC module, and additional experiments on sampling strategies and frame-interval effects illuminate the design choices.
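For readers unfamiliar with how a Gaussian rasterizer turns 3D Gaussians into 2D renderings that can be supervised, the following is a minimal sketch of standard front-to-back alpha compositing from 3D Gaussian Splatting. It assumes that each Gaussian carries per-class semantic scores. It is an illustration under assumptions, not GSRender's implementation. The `ProjectedGaussian` container, its field names, and `render_pixel` are hypothetical, and the projection of 3D covariances to 2D is assumed to have been done already. The 0.99 opacity cap, the 1/255 skip threshold, and the early-termination cutoff loosely follow the values used in the reference 3DGS rasterizer.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ProjectedGaussian:
    """Hypothetical minimal form of a 3D Gaussian already projected to the image plane."""
    mean2d: Tuple[float, float]            # (u, v) pixel center of the projected Gaussian
    inv_cov2d: Tuple[float, float, float]  # inverse 2x2 covariance [[a, b], [b, c]] as (a, b, c)
    depth: float                           # camera-space depth, used for sorting
    opacity: float                         # base opacity in [0, 1]
    semantics: Sequence[float]             # per-class scores carried by this Gaussian (assumption)


def render_pixel(gaussians: List[ProjectedGaussian], u: float, v: float, num_classes: int):
    """Front-to-back alpha compositing of semantics and depth at pixel (u, v).

    Returns (rendered semantics, rendered depth, remaining transmittance).
    """
    out_sem = [0.0] * num_classes
    out_depth = 0.0
    transmittance = 1.0
    # Composite nearest Gaussians first.
    for g in sorted(gaussians, key=lambda g: g.depth):
        du, dv = u - g.mean2d[0], v - g.mean2d[1]
        a, b, c = g.inv_cov2d
        # Exponent of the 2D Gaussian: minus half the squared Mahalanobis distance.
        power = -0.5 * (a * du * du + 2.0 * b * du * dv + c * dv * dv)
        alpha = min(0.99, g.opacity * math.exp(power))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution at this pixel
        weight = transmittance * alpha
        for k in range(num_classes):
            out_sem[k] += weight * g.semantics[k]
        out_depth += weight * g.depth
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; farther Gaussians are hidden
    return out_sem, out_depth, transmittance
```

Consider two Gaussians centered on the same pixel. The first has opacity 0.6, depth 5, and class 0. The second has opacity 0.5, depth 10, and class 1. The near Gaussian receives weight 0.6, and the far one receives 0.4 × 0.5 = 0.2. The rendered semantics are therefore about [0.6, 0.2], and the rendered depth is about 5.0. Each Gaussian is evaluated once per pixel it overlaps, so there is no per-ray sample count to tune.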
Abstract
Weakly supervised 3D occupancy perception is crucial for vision-based autonomous driving in outdoor environments. Previous NeRF-based methods struggle to balance the number of samples taken along each ray. Too many samples reduce efficiency, while too few compromise accuracy, causing the mean Intersection over Union (mIoU) to vary by 5-10 points. Furthermore, even with surround-view image inputs, only a single image is rendered from each viewpoint at any given moment. This limitation leads to duplicated predictions, which significantly limits the practicality of these methods. Yet this issue has largely been overlooked in existing research. To address it, we propose GSRender, which uses 3D Gaussian Splatting for weakly supervised occupancy estimation and thereby simplifies the sampling process. Additionally, we introduce the Ray Compensation module, which reduces duplicated predictions by compensating for features from adjacent frames. Finally, we redesign the dynamic loss to remove the influence of dynamic objects from adjacent frames. Extensive experiments show that our approach achieves SOTA results in RayIoU (+6.0) while also narrowing the gap with 3D-supervised methods. This work lays a solid foundation for weakly supervised occupancy perception. The code is available at https://github.com/Jasper-sudo-Sun/GSRender.
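The sample-count dilemma of NeRF-style rendering can be illustrated with a toy one-dimensional example. This is not the paper's experiment. The density field is a hypothetical thin occupied slab between depths 10.3 and 10.5 on a 40-unit ray. The values and the function names `density` and `render_depth` are also hypothetical. Real methods typically use stratified or hierarchical sampling rather than fixed bin midpoints. The quadrature itself is the standard one: each sample's opacity is alpha_i = 1 - exp(-sigma_i * delta_i), and these opacities are accumulated front to back.

```python
import math


def density(t: float) -> float:
    """Hypothetical toy density: a thin occupied slab between t = 10.3 and t = 10.5."""
    return 50.0 if 10.3 <= t <= 10.5 else 0.0


def render_depth(num_samples: int, near: float = 0.0, far: float = 40.0):
    """NeRF-style volume rendering with uniform bin-midpoint samples.

    Returns (expected depth, accumulated opacity) along the ray.
    """
    delta = (far - near) / num_samples  # bin width
    transmittance, depth, acc = 1.0, 0.0, 0.0
    for i in range(num_samples):
        t = near + (i + 0.5) * delta               # midpoint of bin i
        alpha = 1.0 - math.exp(-density(t) * delta)
        w = transmittance * alpha                   # contribution weight of this sample
        depth += w * t
        acc += w
        transmittance *= 1.0 - alpha
    return depth, acc
```

With 16 samples (bin width 2.5), every midpoint falls outside the slab. The ray reports zero opacity, and the surface is missed. With 400 samples (bin width 0.1), the slab is hit and the expected depth is about 10.35, but this costs 25 times as many density queries. Intermediate counts can succeed or fail depending on where the bins happen to fall. This is the kind of sensitivity that splatting sidesteps by evaluating each primitive directly.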
