GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting

Qianpu Sun, Changyong Shu, Sifan Zhou, Runxi Cheng, Yongxian Wei, Zichen Yu, Dawei Yang, Sirui Han, Yuan Chun

TL;DR

GSRender is a weakly supervised 3D occupancy estimation framework that models a scene as a deduplicated set of 3D Gaussians. It combines a Gaussian-based head, a Gaussian rasterizer, and a Ray Compensation (RC) module that leverages adjacent frames, which reduces duplicate predictions and handles dynamic objects better while using only weak 2D supervision. The approach achieves state-of-the-art RayIoU among 2D weakly supervised methods on Occ3D-nuScenes and narrows the gap to 3D-supervised approaches. Ablation studies validate the necessity of the Gaussian property shifts and the RC module, and additional experiments examine sampling strategies and frame-interval effects to explain the design choices.

Abstract

Weakly-supervised 3D occupancy perception is crucial for vision-based autonomous driving in outdoor environments. Previous NeRF-based methods often struggle to balance the number of samples used: too many samples decrease efficiency, while too few compromise accuracy, causing the mean Intersection over Union (mIoU) to vary by 5-10 points. Furthermore, even with surrounding-view image inputs, only a single image is rendered from each viewpoint at any given moment. This limitation leads to duplicated predictions, which significantly reduce the practicality of the approach, yet the issue has largely been overlooked in existing research. To address this, we propose GSRender, which uses 3D Gaussian Splatting for weakly-supervised occupancy estimation, simplifying the sampling process. Additionally, we introduce the Ray Compensation module, which reduces duplicated predictions by compensating for features from adjacent frames. Finally, we redesign the dynamic loss to remove the influence of dynamic objects from adjacent frames. Extensive experiments show that our approach achieves SOTA results in RayIoU (+6.0), while also narrowing the gap with 3D-supervised methods. This work lays a solid foundation for weakly-supervised occupancy perception. The code is available at https://github.com/Jasper-sudo-Sun/GSRender.
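To make the rendering step behind 2D supervision concrete, the following is a minimal sketch of standard front-to-back alpha compositing along a single ray. The same weighting applies to NeRF samples taken along a ray and to depth-sorted Gaussians in splatting. The function name `composite_semantics` and its argument layout are illustrative assumptions and are not taken from the GSRender code.

```python
from typing import List, Sequence


def composite_semantics(
    alphas: Sequence[float],
    scores: Sequence[Sequence[float]],
) -> List[float]:
    """Blend per-sample class scores along one ray, nearest sample first.

    alphas: opacity in [0, 1] of each sample (or depth-sorted Gaussian).
    scores: per-sample class scores, one list of length num_classes each.
    (Hypothetical layout, chosen for illustration only.)
    """
    num_classes = len(scores[0])
    rendered = [0.0] * num_classes
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for alpha, sample_scores in zip(alphas, scores):
        weight = transmittance * alpha  # contribution of this sample
        for c in range(num_classes):
            rendered[c] += weight * sample_scores[c]
        transmittance *= 1.0 - alpha  # light remaining behind this sample
    return rendered


# Two samples, two classes: the first sample is half transparent.
print(composite_semantics([0.5, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
```

With a fixed sampling scheme, the number of loop iterations per ray is the sample count, which is where the efficiency versus accuracy trade-off mentioned above arises.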

Paper Structure

This paper contains 31 sections, 15 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: The limitations of RenderOcc (Pan et al., 2024). (a) NeRF-based methods face a trade-off between efficiency and precision. (b) Duplicate predictions caused by the uncertainty in depth estimation.
  • Figure 2: Overall framework of GSRender. For surround-view image input, we employ an arbitrary 2D-to-3D module to extract occupancy features. Using a simple Gaussian head, we predict the attributes of each Gaussian, followed by Gaussian rendering. The Ray Compensation (RC) module then compensates across different viewpoints of the same object, alleviating the issue of duplicate predictions.
  • Figure 3: Gaussian Properties Field. The occupancy feature from the 2D-to-3D module is fed into the Gaussian head, which outputs the shift of the Gaussian's mean $\delta_\mu$, the shift of its scale $\delta_s$, the opacity $o$, and the semantic logits $c$, representing the Gaussian's location, scale, visibility, and semantic category, respectively.
  • Figure 4: Ray Compensation. The upper part of the feature map shows adjacent-frame compensation being used to address occlusion in the current frame. The lower part shows that dynamic objects may occlude the view of the compensated frame again, so the contribution of dynamic objects in adjacent frames must be reduced.
  • Figure 5: Qualitative results of GSRender. Top view on Occ3D-nuScenes.
  • ...and 4 more figures
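
The Gaussian Properties Field described in the Figure 3 caption can be illustrated with a minimal sketch that decodes raw head outputs for one voxel into Gaussian parameters. Everything beyond the four outputs named in the caption is an assumption made for illustration: the raw channel layout, the `tanh` bound on the mean shift, the exponential scale shift, the sigmoid on opacity, and the names `Gaussian` and `decode_gaussian` are hypothetical and may differ from the actual GSRender implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Gaussian:
    mean: Tuple[float, ...]    # location
    scale: Tuple[float, ...]   # per-axis extent
    opacity: float             # visibility in (0, 1)
    logits: List[float]        # semantic logits c


def decode_gaussian(
    voxel_center: Sequence[float],
    voxel_size: float,
    raw: Sequence[float],
) -> Gaussian:
    """Decode one voxel's raw head output into Gaussian parameters.

    Assumed channel layout: 3 values for delta_mu, 3 for delta_s,
    1 raw opacity, and the remaining values as semantic logits.
    """
    delta_mu, delta_s = raw[0:3], raw[3:6]
    raw_opacity, logits = raw[6], list(raw[7:])
    # Assumption: keep the shifted mean within half a voxel of the center.
    mean = tuple(
        c + 0.5 * voxel_size * math.tanh(d)
        for c, d in zip(voxel_center, delta_mu)
    )
    # Assumption: multiplicative scale shift around the voxel size.
    scale = tuple(voxel_size * math.exp(d) for d in delta_s)
    # Assumption: sigmoid maps the raw value to an opacity in (0, 1).
    opacity = 1.0 / (1.0 + math.exp(-raw_opacity))
    return Gaussian(mean, scale, opacity, logits)


# A zero output leaves the Gaussian at the voxel center with default scale.
g = decode_gaussian((1.0, 2.0, 3.0), 0.4, [0.0] * 9)
print(g.mean, g.scale, g.opacity, g.logits)
```

Predicting shifts relative to a voxel anchor, rather than absolute positions, keeps each Gaussian tied to the occupancy grid cell it came from, which is one plausible reading of why the caption speaks of shifts.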