Direct Object-Level Reconstruction via Probabilistic Gaussian Splatting

Shuai Guo, Ao Guo, Junchao Zhao, Qi Chen, Yuxiang Qi, Zechuan Li, Dong Chen, Tianjia Shao, Mingliang Xu

Abstract

Object-level 3D reconstruction plays an important role in domains such as cultural heritage digitization, industrial manufacturing, and virtual reality. However, existing Gaussian Splatting-based approaches generally rely on full-scene reconstruction, which introduces substantial redundant background information and increases computational and storage overhead. To address this limitation, we propose an efficient single-object 3D reconstruction method based on 2D Gaussian Splatting. By directly integrating foreground-background probability cues into Gaussian primitives and dynamically pruning low-probability Gaussians during training, the proposed method focuses on the object of interest and improves memory and computational efficiency. Our pipeline leverages probability masks generated by YOLO and SAM to supervise probabilistic Gaussian attributes, replacing binary masks with continuous probability values to mitigate boundary ambiguity. Additionally, we propose a dual-stage filtering strategy applied at the start of training to suppress background Gaussians. During training, the rendered probability masks are in turn used to refine the supervision and enhance boundary consistency across views. Experiments on the MIP-360, T&T, and NVOS datasets demonstrate that our method exhibits strong self-correction capability in the presence of mask errors and achieves reconstruction quality comparable to standard 3DGS approaches while requiring only approximately 1/10 of their Gaussian count. These results validate the efficiency and robustness of our method for single-object reconstruction and highlight its potential for applications requiring both high fidelity and computational efficiency.
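
The two mechanisms named in the abstract, probability-based pruning and supervision with continuous rather than binary masks, can be illustrated with a minimal sketch. This is not the authors' implementation. The `fg_prob` attribute name, the 0.5 threshold, and the choice of binary cross-entropy as the mask loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def prune_low_probability(gaussians, threshold=0.5):
    """Return only the Gaussians whose foreground probability reaches `threshold`.

    Each Gaussian is a dict with a hypothetical "fg_prob" entry in [0, 1];
    other attributes (position, scale, color, ...) are carried along untouched.
    The threshold value is an assumption, not a value taken from the paper.
    """
    return [g for g in gaussians if g["fg_prob"] >= threshold]


def soft_mask_loss(rendered, target, eps=1e-6):
    """Mean binary cross-entropy between rendered and target probabilities.

    Unlike a binary mask, `target` may hold any value in [0, 1], so uncertain
    boundary pixels contribute a graded rather than all-or-nothing penalty.
    (Assumed loss; the abstract does not name the one actually used.)
    """
    total = 0.0
    for p, t in zip(rendered, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(rendered)


# Usage: three toy Gaussians, one of which is likely background.
gaussians = [
    {"id": 0, "fg_prob": 0.9},
    {"id": 1, "fg_prob": 0.1},
    {"id": 2, "fg_prob": 0.5},
]
kept = prune_low_probability(gaussians)
```

In a real training loop the pruning step would presumably run periodically, alongside the usual densification schedule, so that background Gaussians are removed before they accumulate.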

Paper Structure (14 sections, 9 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Overall pipeline of the proposed direct object-level reconstruction framework. The framework consists of two main components: data preprocessing on the left and Gaussian optimization on the right. In the preprocessing stage, YOLO and SAM are used to generate semantic probability masks for filtering the SfM point cloud and selecting valid views. During training, Gaussian primitives are initialized from the filtered point cloud and optimized by sampling valid views. A probability-based pruning module dynamically removes background Gaussians, enabling efficient and accurate single-object reconstruction.
  • Figure 2: The pipeline of probability mask generation.
  • Figure 3: Examples of probability masks. The first row shows erroneous probability masks caused by target detection failures, while the second row presents probability masks with blurred boundaries. The former leads to severe optimization errors and needs to be removed from the dataset, whereas the latter can be corrected through multi-view optimization and is therefore considered a tolerable error.
  • Figure 4: The growing trend in Gaussian counts during training.
  • Figure 5: Qualitative comparison of our method with baseline approaches. The first row depicts the Truck scene from the T&T dataset, the second and third rows illustrate the Garden and Kitchen scenes from the MIP-360 dataset, and the fourth row shows the Horns scene from the LLFF dataset. The first column presents the ground-truth RGB images, the second column shows the images rendered by our method, the third column displays the probability masks predicted by SAM, and the fourth column shows the probability masks rendered by our model.
  • ...and 4 more figures
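
The preprocessing step described in the Figure 1 caption, filtering the SfM point cloud with probability masks, could be sketched as follows. This is only one plausible reading: the caption does not say how points are scored, so averaging the mask probability over the views in which a point projects inside the image is an assumption, as are the camera dictionary layout, the helper names, and the 0.5 threshold.

```python
def project(point, camera):
    """Project a world-space point with a pinhole camera.

    `camera` is a hypothetical dict with "R" (3x3 rotation, list of rows),
    "t" (translation) and intrinsics "fx", "fy", "cx", "cy".
    Returns integer pixel coordinates (u, v), or None if the point lies
    behind the camera.
    """
    R, t = camera["R"], camera["t"]
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 1e-6:
        return None
    u = round(camera["fx"] * x / z + camera["cx"])
    v = round(camera["fy"] * y / z + camera["cy"])
    return u, v


def mean_foreground_probability(point, cameras, masks):
    """Average mask probability over the views in which the point is visible."""
    total, hits = 0.0, 0
    for camera, mask in zip(cameras, masks):
        pixel = project(point, camera)
        if pixel is None:
            continue
        u, v = pixel
        if 0 <= v < len(mask) and 0 <= u < len(mask[0]):
            total += mask[v][u]
            hits += 1
    return total / hits if hits else 0.0


def filter_points(points, cameras, masks, threshold=0.5):
    """Keep points whose mean foreground probability reaches `threshold` (assumed value)."""
    return [p for p in points
            if mean_foreground_probability(p, cameras, masks) >= threshold]


# Usage: one camera at the origin looking down +z, and a 5x5 mask whose
# center pixel is foreground.
camera = {"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]], "t": [0, 0, 0],
          "fx": 10, "fy": 10, "cx": 2, "cy": 2}
mask = [[0.0] * 5 for _ in range(5)]
mask[2][2] = 1.0
points = [(0, 0, 1), (0.1, 0, 1), (0, 0, -1)]
kept_points = filter_points(points, [camera], [mask])
```

Only the first point projects onto the foreground pixel. The second lands on a background pixel and the third lies behind the camera, so both are discarded.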