DWTGS: Rethinking Frequency Regularization for Sparse-view 3D Gaussian Splatting

Hung Nguyen, Runfa Li, An Le, Truong Nguyen

TL;DR

Sparse-view 3D Gaussian Splatting suffers from high-frequency (HF) overfitting because only a few training views are available. DWTGS replaces Fourier-based frequency regularization with wavelet-space losses that supervise low-frequency (LF) information in multi-level $LL$ subbands and enforce sparsity in the high-frequency $HH$ subband, improving generalization and reducing HF hallucinations. Across the LLFF, Mip-NeRF 360, and Blender NeRF benchmarks, DWTGS consistently outperforms Fourier-based counterparts, achieving PSNR gains of roughly $0.3$ to $0.4$ dB and better perceptual metrics. This LF-centric, wavelet-based framework offers a more interpretable and tunable approach to frequency regularization in sparse-view neural rendering, with practical impact for robust novel-view synthesis.

Abstract

Sparse-view 3D Gaussian Splatting (3DGS) presents significant challenges in reconstructing high-quality novel views, as it often overfits to the widely-varying high-frequency (HF) details of the sparse training views. While frequency regularization can be a promising approach, its typical reliance on Fourier transforms causes difficult parameter tuning and biases towards detrimental HF learning. We propose DWTGS, a framework that rethinks frequency regularization by leveraging wavelet-space losses that provide additional spatial supervision. Specifically, we supervise only the low-frequency (LF) LL subbands at multiple DWT levels, while enforcing sparsity on the HF HH subband in a self-supervised manner. Experiments across benchmarks show that DWTGS consistently outperforms Fourier-based counterparts, as this LF-centric strategy improves generalization and reduces HF hallucinations.
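
The two wavelet-space losses described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Haar wavelet, the L1 distance, the unweighted sum over levels, and the function names `haar_dwt2`, `lf_loss`, and `hf_sparsity_loss` are all assumptions chosen for illustration. Images are plain single-channel lists of rows so the example has no dependencies; a real training loop would need a differentiable tensor version.

```python
def haar_dwt2(img):
    """One-level 2D orthonormal Haar DWT of a list-of-rows image.

    Assumes even height and width. Returns four half-resolution
    subbands (LL, LH, HL, HH). LH/HL naming varies between libraries.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top row of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom row of the block
            r_ll.append((a + b + c + d) / 2)  # local average (low frequency)
            r_lh.append((a + b - c - d) / 2)  # top minus bottom
            r_hl.append((a - b + c - d) / 2)  # left minus right
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def mean_abs(x):
    """Mean absolute value of a list-of-rows array."""
    return sum(abs(v) for row in x for v in row) / (len(x) * len(x[0]))


def lf_loss(render, gt, levels=2):
    """Assumed LF loss: mean L1 distance between LL subbands, summed over levels."""
    total = 0.0
    for _ in range(levels):
        render = haar_dwt2(render)[0]  # keep only LL and recurse on it
        gt = haar_dwt2(gt)[0]
        diff = [[p - q for p, q in zip(rr, rg)] for rr, rg in zip(render, gt)]
        total += mean_abs(diff)
    return total


def hf_sparsity_loss(render):
    """Assumed HF loss: mean L1 magnitude of the HH subband.

    Needs no ground truth, so it can be applied to rendered novel views.
    """
    return mean_abs(haar_dwt2(render)[3])
```

In this sketch, a checkerboard perturbation added to a render changes only the $HH$ subband: `lf_loss` ignores it, while `hf_sparsity_loss` penalizes it. That mirrors the LF-centric design in which HF content is not matched against ground truth but only encouraged to be sparse.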

Paper Structure

This paper contains 13 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: 1-level (b) and 2-level (c) DWT subbands of an image region (a) from LLFF. Each subband provides directional frequency information and gets coarser with increasing DWT level.
  • Figure 2: Architectures of our DWTGS framework (top) and FreGS (bottom). In contrast to the latter's Fourier-space loss, DWTGS proposes wavelet-space losses for sparse-view 3DGS, which consist of an LF and an HF subloss. The LF loss supervises multi-level GT and rendered LL subbands, while the HF loss enforces sparsity of the HH subband in novel views.
  • Figure 3: Visual ablations on a novel-view region (a) from LLFF. The LF-centric wavelet-space losses in (c), (d), and (e) enforce better structural consistency than the Fourier-space loss in (b). However, fully supervising HF in (d) degrades image quality. Instead, in (e), the HF is self-supervised and enforced to be sparse, which yields the best quality.