Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training

Jiatong Xia, Lingqiao Liu

TL;DR

This work tackles the persistent challenge of close-up view synthesis in 3D Gaussian Splatting (3DGS) by introducing Close-up-GS, a progressive self-training framework. It leverages See3D as a guidance prior to enrich details in near views, uses a staged trust-region expansion to gradually incorporate intermediate views, and applies a careful fine-tuning regime with reliable-pixel constraints and Gaussian-primitive densification. The approach demonstrates clear gains over strong baselines on extracted-LERF and LLFF datasets, including extreme $27\times$ close-ups, across standard and new close-up evaluation metrics. The contributions enable more reliable, high-fidelity close-up rendering with practical impact for applications requiring accurate near-view synthesis in real-world scenes.

Abstract

3D Gaussian Splatting (3DGS) has demonstrated impressive performance in synthesizing novel views after training on a given set of viewpoints. However, its rendering quality deteriorates when the synthesized view deviates significantly from the training views. This decline occurs due to (1) the model's difficulty in generalizing to out-of-distribution scenarios and (2) challenges in interpolating fine details caused by substantial resolution changes and occlusions. A notable case of this limitation is close-up view generation, that is, producing views that are significantly closer to the object than those in the training set. To tackle this issue, we propose a novel approach for close-up view generation based on progressively training the 3DGS model with self-generated data. Our solution is based on three key ideas. First, we leverage the See3D model, a recently introduced 3D-aware generative model, to enhance the details of rendered views. Second, we propose a strategy to progressively expand the "trust regions" of the 3DGS model and update a set of reference views for See3D. Finally, we introduce a fine-tuning strategy to carefully update the 3DGS model with training data generated from the above schemes. We further define metrics for close-up view evaluation to facilitate better research on this problem. By conducting evaluations on specifically selected scenarios for close-up views, our proposed approach demonstrates a clear advantage over competitive solutions.

Paper Structure

This paper contains 39 sections, 14 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Progressive update. (a) Set the frontier views (blue camera) and select the anchor views (orange camera) from the known views (gray camera) using our view selection algorithm. The anchor views also serve as the reference views for See3D. (b) Select the to-be-updated views (yellow camera) from a variety of random views between the frontier views and the anchor views, again using the selection algorithm. (c) Render and refine the to-be-updated views and frontier views, then apply our fine-tuning procedure to update 3DGS, thereby extending its "trust region" (gray region). The to-be-updated views and frontier views are then added to the known views. (d) Begin the next round by setting new frontier views.
  • Figure 2: Qualitative comparisons with representative methods on the extracted LERF dataset.
  • Figure 3: Comparisons of progressive update on the LLFF dataset at 3 close-up scales, each scale showing two rows of results. The far-distance reference images are displayed on the right, with the white box outlining the approximate close-up region.
  • Figure 4: Ablation studies of view selection. (a) shows the PSNR of a single-round updated Gaussian Splatting model on the LERF dataset, using the same number of anchor views and different numbers of to-be-updated views. (b) shows the corresponding See3D inference time.
  • Figure 5: Distance factor in observability. Two different positions of view $v_{1}$ are shown. The sum of the area $A_{1}$ (the observability of view $v_{0}$ under view $v_{1}$) and the area $A_{2}$ (the observability of view $v_{1}$ under view $v_{2}$) reaches its minimum when $v_{1}$ is positioned at the midpoint of $v_{0}$ and $v_{2}$, and varies with $d_{c}$, the distance between $v_{1}$ and the midpoint.
  • ...and 2 more figures
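
The progressive update described in the Figure 1 caption can be sketched roughly as the loop below. This is only an illustrative outline under several assumptions, not the authors' implementation: every name (`progressive_update`, `nearest`, `lerp`, `render`, `refine`, `finetune`) is hypothetical, a view is reduced to a camera center, the paper's view selection algorithm is replaced by a plain nearest-distance stand-in, and the random intermediate views of step (b) are replaced by fixed interpolation steps so the sketch is deterministic. The See3D refinement and the 3DGS fine-tuning are passed in as opaque callables.

```python
from typing import Callable, List, Sequence, Tuple

View = Tuple[float, ...]  # assumption: a view is just a camera center


def lerp(a: View, b: View, t: float) -> View:
    """Linear interpolation between two camera centers."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def nearest(views: Sequence[View], query: View, k: int) -> List[View]:
    """Stand-in for the view selection algorithm: the k views closest to query."""
    def sqdist(v: View) -> float:
        return sum((p - q) ** 2 for p, q in zip(v, query))
    return sorted(views, key=sqdist)[:k]


def progressive_update(
    known: Sequence[View],
    target: View,
    rounds: int,
    render: Callable,    # view -> rendered image (hypothetical 3DGS render)
    refine: Callable,    # (image, reference views) -> refined image (See3D-like)
    finetune: Callable,  # (views, refined images) -> None (3DGS update)
    n_anchor: int = 2,
    n_mid: int = 2,
) -> List[View]:
    known = list(known)
    for r in range(1, rounds + 1):
        # (a) Set a frontier view one step closer to the close-up target and
        #     pick anchor (reference) views from the known views.
        start = nearest(known, target, 1)[0]
        frontier = lerp(start, target, 1.0 / (rounds - r + 1))
        anchors = nearest(known, frontier, n_anchor)

        # (b) Pick to-be-updated views between the anchors and the frontier
        #     (fixed steps here; the caption describes random candidates).
        candidates = [lerp(a, frontier, t) for a in anchors for t in (0.25, 0.5, 0.75)]
        to_update = nearest(candidates, frontier, n_mid)

        # (c) Render and refine the new views, fine-tune on them, and add
        #     them to the known views, which grows the "trust region".
        new_views = to_update + [frontier]
        refined = [refine(render(v), anchors) for v in new_views]
        finetune(new_views, refined)
        known.extend(new_views)
        # (d) The next iteration sets a new frontier from the enlarged set.
    return known
```

In this sketch the frontier reaches the target camera in the final round, and each round adds `n_mid + 1` views to the known set; how far the frontier should advance per round in practice is a design choice the caption does not specify.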