Enhancing Close-up Novel View Synthesis via Pseudo-labeling

Jiatong Xia, Libo Sun, Lingqiao Liu

TL;DR

We address the difficulty of generating high-fidelity close-up views with radiance-field methods trained on distant viewpoints. The core idea is to use pseudo-labeling to create reliable supervision for diverse close-up perspectives, plus an efficient training strategy and optional test-time fine-tuning. A new dataset for close-up view synthesis is introduced to benchmark methods in this regime. Experiments show substantial gains over strong NeRF and Gaussian Splatting baselines, validating the effectiveness and practicality of pseudo-label-based close-up learning.

Abstract

Recent methods, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated remarkable capabilities in novel view synthesis. However, although they produce high-quality images for viewpoints similar to those seen during training, they struggle to render detailed images from viewpoints that deviate significantly from the training set, particularly close-up views. The primary challenge is the lack of training data for close-up views, which leaves current methods unable to render such views accurately. To address this issue, we introduce a novel pseudo-label-based learning strategy that derives pseudo-labels from the existing training data to provide targeted supervision across a wide range of close-up viewpoints. Recognizing the absence of benchmarks for this specific challenge, we also present a new dataset designed to assess both current and future methods in this setting. Extensive experiments demonstrate the efficacy of our approach.
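The abstract refers to supervision across a wide range of close-up viewpoints but does not say how those viewpoints are chosen. As a purely illustrative sketch, assuming close-up poses are created by pushing a training camera forward along its viewing direction toward an assumed-known surface distance, candidate poses could be sampled as follows. The Camera class, sample_close_up, and its surface_dist, min_frac, and max_frac parameters are hypothetical and do not come from the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Camera:
    """Hypothetical minimal pose: camera centre and viewing direction."""
    position: Vec3
    forward: Vec3


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0:
        raise ValueError("viewing direction must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def sample_close_up(cam: Camera, surface_dist: float,
                    min_frac: float = 0.3, max_frac: float = 0.7,
                    rng: Optional[random.Random] = None) -> Camera:
    """Hypothetical sketch: move a training camera part of the way to the surface.

    surface_dist is the (assumed known) distance from the camera to the scene
    surface along its viewing direction. Keeping max_frac < 1 stops the new
    camera in front of the surface, so the result is a close-up of the same region.
    """
    if not 0.0 <= min_frac <= max_frac < 1.0:
        raise ValueError("need 0 <= min_frac <= max_frac < 1")
    rng = rng if rng is not None else random.Random()
    step = rng.uniform(min_frac, max_frac) * surface_dist
    f = _normalize(cam.forward)
    pos = (cam.position[0] + step * f[0],
           cam.position[1] + step * f[1],
           cam.position[2] + step * f[2])
    return Camera(position=pos, forward=f)
```

Calling sample_close_up repeatedly with different seeds, for example rng=random.Random(s), yields several close-up poses per training camera. A practical system would also need an estimate of surface_dist (for instance, depth rendered from the trained radiance field) and possibly some rotation jitter, both of which are outside this sketch.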

Paper Structure

This paper contains 24 sections, 15 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The generated pseudo-labels. We show the intermediate outputs of the pseudo-label generation process: (a) the initial pseudo-label extracted from the training images (i.e., $\mathbf{I}'_n$), (b) the pseudo-label mask, as described in the mask equation (eq_mask), and (c) the final pseudo-label obtained after applying the mask.
  • Figure 2: Typical existing view synthesis benchmarks. The test images are positioned at a similar distance to the scene as the training images and share highly similar viewing directions.
  • Figure 3: Our dataset, with each row showing an example scene: training images on the left and test images on the right. In each scene, the training images follow the same capture setting as existing benchmarks, either moving forward-facing or moving around the objects at a similar distance, while the test images are much closer to the objects and diverge significantly from the training views.
  • Figure 4: Qualitative comparisons with other methods. We visualize images synthesized by our method and compare them with those of Mip-NeRF, Mip-Splatting, and the baseline methods.
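Figure 1 outlines three stages: an initial pseudo-label $\mathbf{I}'_n$ extracted from a training image, a pseudo-label mask, and the final masked pseudo-label. The Python sketch below illustrates one way such stages could be realized; it is not the paper's procedure. It assumes a grayscale training image with per-pixel depth (for example, rendered from the trained radiance field), shared pinhole intrinsics (fx, fy, cx, cy), and a known rigid transform (R, t) from the training camera into the close-up camera. It uses a simple coverage mask as a stand-in for the paper's mask equation. The helper names warp_to_view and masked_l1 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def warp_to_view(image: Image, depth: Image,
                 fx: float, fy: float, cx: float, cy: float,
                 R: Sequence[Sequence[float]], t: Sequence[float]
                 ) -> Tuple[Image, List[List[int]]]:
    """Hypothetical sketch: forward-warp a training view into a close-up view.

    Returns (warped, mask), where mask[v][u] == 1 if some source pixel landed
    on target pixel (u, v) and 0 for holes. This coverage mask is a stand-in
    for the paper's pseudo-label mask, not a reproduction of it.
    """
    h, w = len(image), len(image[0])
    warped = [[0.0] * w for _ in range(h)]
    mask = [[0] * w for _ in range(h)]
    zbuf = [[math.inf] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if z <= 0:
                continue  # no valid depth for this source pixel
            # Unproject pixel (u, v) into the source camera frame.
            xs = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Rigid transform into the target camera frame: X_t = R X_s + t.
            xt = [sum(R[i][j] * xs[j] for j in range(3)) + t[i] for i in range(3)]
            if xt[2] <= 0:
                continue  # point falls behind the close-up camera
            # Project with the same intrinsics; round half up to a pixel index.
            ut = math.floor(fx * xt[0] / xt[2] + cx + 0.5)
            vt = math.floor(fy * xt[1] / xt[2] + cy + 0.5)
            if 0 <= ut < w and 0 <= vt < h and xt[2] < zbuf[vt][ut]:
                zbuf[vt][ut] = xt[2]  # z-buffer: keep the nearest surface
                warped[vt][ut] = image[v][u]
                mask[vt][ut] = 1
    return warped, mask


def masked_l1(render: Image, pseudo: Image, mask: List[List[int]]) -> float:
    """Mean absolute error restricted to pixels where the pseudo-label is valid."""
    total = sum(abs(r - p) * m
                for rr, pr, mr in zip(render, pseudo, mask)
                for r, p, m in zip(rr, pr, mr))
    count = sum(sum(row) for row in mask)
    return total / count if count else 0.0
```

Because warped is zero wherever mask is zero, the warped image already plays the role of the masked pseudo-label in this sketch, and masked_l1 ignores the holes when comparing it against a rendered close-up. Forward warping into a closer view magnifies the image, so many target pixels receive no source pixel; a practical implementation would more likely use splatting or backward warping, with a mask following the paper's own criterion.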