SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting

Shiyun Xie, Zhiru Wang, Yinghao Zhu, Xu Wang, Chengwei Pan, Xiwang Dong

TL;DR

SuperGS extends Scaffold-GS with a two-stage coarse-to-fine training framework. It models uncertainty through variational feature learning and uses that uncertainty to guide further refinement of the scene representation, and it outperforms state-of-the-art HRNVS methods on both forward-facing and 360-degree datasets.

Abstract

Recently, 3D Gaussian Splatting (3DGS) has excelled in novel view synthesis (NVS) with its real-time rendering capabilities and superior quality. However, it encounters challenges for high-resolution novel view synthesis (HRNVS) due to the coarse nature of primitives derived from low-resolution input views. To address this issue, we propose SuperGS, an expansion of Scaffold-GS designed with a two-stage coarse-to-fine training framework. In the low-resolution stage, we introduce a latent feature field to represent the low-resolution scene, which serves as both the initialization and foundational information for super-resolution optimization. In the high-resolution stage, we propose a multi-view consistent densification strategy that backprojects high-resolution depth maps based on error maps and employs a multi-view voting mechanism, mitigating ambiguities caused by multi-view inconsistencies in the pseudo labels provided by 2D prior models while avoiding Gaussian redundancy. Furthermore, we model uncertainty through variational feature learning and use it to guide further scene representation refinement and adjust the supervisory effect of pseudo-labels, ensuring consistent and detailed scene reconstruction. Extensive experiments demonstrate that SuperGS outperforms state-of-the-art HRNVS methods on both forward-facing and 360-degree datasets.
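The densification idea in the abstract (back-project high-error pixels using depth, then let several views vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary layout of a view, the pinhole camera model, the rule that each view casts at most one vote per voxel, and the placement of new anchors at voxel centers are all assumptions made for illustration.

```python
from collections import Counter
from math import floor


def backproject(u, v, depth, intrinsics, cam_to_world):
    """Lift pixel (u, v) with a depth value to a world-space 3D point.

    Assumed conventions: intrinsics = (fx, fy, cx, cy) of a pinhole camera,
    cam_to_world is a 3x4 row-major matrix [R | t].
    """
    fx, fy, cx, cy = intrinsics
    # Camera-space point at the given depth along the z axis.
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    return tuple(
        sum(row[i] * p_cam[i] for i in range(3)) + row[3]
        for row in cam_to_world
    )


def vote_for_anchors(views, voxel_size, error_threshold, vote_threshold):
    """Return centers of voxels whose vote count exceeds vote_threshold.

    Each view is a dict (hypothetical layout) with keys:
      "error": {(u, v): per-pixel loss}, "depth": {(u, v): depth},
      "intrinsics": (fx, fy, cx, cy), "cam_to_world": 3x4 matrix.
    """
    votes = Counter()
    for view in views:
        voted = set()  # assumption: one vote per voxel per view
        for (u, v), err in view["error"].items():
            if err <= error_threshold:
                continue  # only poorly reconstructed pixels propose candidates
            point = backproject(
                u, v, view["depth"][(u, v)],
                view["intrinsics"], view["cam_to_world"],
            )
            voted.add(tuple(floor(c / voxel_size) for c in point))
        votes.update(voted)
    # Keep voxels supported by enough views; place the anchor at the center.
    return sorted(
        tuple((i + 0.5) * voxel_size for i in voxel)
        for voxel, count in votes.items()
        if count > vote_threshold
    )
```

Counting one vote per view, rather than one per pixel, means a candidate position survives only if several viewpoints agree on it, which is one plausible way to suppress view-inconsistent detail coming from 2D pseudo-labels.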

Paper Structure

This paper contains 32 sections, 16 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Framework of our proposed SuperGS. (a) We propose a two-stage coarse-to-fine framework. We enhance Scaffold-GS by introducing a latent feature field to represent the low-resolution scene, which serves as both initialization and foundational information for super-resolution optimization. (b) In the high-resolution stage, we propose a multi-view consistent densification strategy that replaces the original gradient-based densification, avoiding overfitting and Gaussian redundancy. (c) We model anchor uncertainty through learning variational features, which is further used to guide scene representation refinement and reconstruction loss computation.
  • Figure 2: Illustration of Feature Field. For a specific anchor, we extract and interpolate features from hash tables using its coordinates, with the concatenation of features from $L$ resolution levels forming the field feature $f_{\text{field}}$ of this anchor.
  • Figure 3: Comparison of Our Densification Strategy and the Gradient-based Densification Strategy. Our method achieves better reconstruction quality with fewer anchor points, reducing memory requirements while preventing overfitting.
  • Figure 4: Illustration of Densification Strategy. We introduce a multi-view voting densification strategy that replaces the original anchor growing policy. First, we generate candidate Gaussian positions by selecting pixels according to the pixel-wise loss and back-projecting them using the depth map. Subsequently, new anchors are added when the accumulated vote count within a voxel exceeds a predetermined threshold.
  • Figure 5: Qualitative comparison of the HRNVS ($\times 4$) on real-world datasets. We highlight the difference with colored patches.
  • ...and 1 more figure
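
The feature lookup described in the Figure 2 caption (interpolate features from hash tables at an anchor's coordinates, then concatenate across $L$ resolution levels to form $f_{\text{field}}$) can be sketched as follows. This is an illustrative sketch only: the XOR-with-primes spatial hash and trilinear interpolation are borrowed from Instant-NGP-style hash encodings as an assumption, and the names `hash_index`, `level_feature`, and `field_feature`, along with the table layout, are hypothetical.

```python
from itertools import product
from math import floor

# Assumed spatial-hash constants (Instant-NGP-style); not taken from the paper.
PRIMES = (1, 2654435761, 805459861)


def hash_index(corner, table_size):
    """Map an integer grid corner to a slot in a hash table."""
    h = 0
    for c, p in zip(corner, PRIMES):
        h ^= c * p
    return h % table_size


def level_feature(xyz, table, resolution):
    """Trilinearly interpolate one level's features at a point in [0, 1]^3."""
    scaled = [c * resolution for c in xyz]
    base = [floor(s) for s in scaled]
    frac = [s - b for s, b in zip(scaled, base)]
    dim = len(table[0])
    out = [0.0] * dim
    # Blend the features stored at the 8 corners of the enclosing grid cell.
    for offset in product((0, 1), repeat=3):
        weight = 1.0
        for o, f in zip(offset, frac):
            weight *= f if o else 1.0 - f
        corner = [b + o for b, o in zip(base, offset)]
        feat = table[hash_index(corner, len(table))]
        for k in range(dim):
            out[k] += weight * feat[k]
    return out


def field_feature(xyz, tables, resolutions):
    """Concatenate per-level features into one vector (f_field in Figure 2)."""
    f_field = []
    for table, resolution in zip(tables, resolutions):
        f_field.extend(level_feature(xyz, table, resolution))
    return f_field
```

With $L$ levels of $F$-dimensional entries, the returned vector has length $L \cdot F$; in a trained model the table entries would be learnable parameters rather than plain Python lists.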