From Chaos to Clarity: 3DGS in the Dark

Zhihao Li, Yufei Wang, Alex Kot, Bihan Wen

TL;DR

This work tackles the problem of noise in RAW inputs degrading HDR 3D Gaussian Splatting (3DGS), especially when only a few views are available. It introduces a self-supervised framework that jointly denoises and reconstructs HDR 3DGS by integrating a physics-informed noise extractor and a noise-robust reconstruction loss, anchored by a heteroscedastic Gaussian noise model. The method, evaluated on RawNeRF data, outperforms LDR/HDR 3DGS and several pre-trained baselines in reconstruction quality and rendering speed across varying view counts, demonstrating practical viability for real-time HDR 3D capture from noisy RAW images. The contribution includes lens distortion handling, a principled noise divergence term, and a publicly available codebase, signaling a step forward for robust 3D scene reconstruction in challenging lighting conditions.

Abstract

Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shapes that overfit the noise, thereby significantly degrading reconstruction quality and reducing inference speed, especially in scenarios with limited views. To address these issues, we introduce a novel self-supervised learning framework designed to reconstruct HDR 3DGS from a limited number of noisy raw images. This framework enhances 3DGS by integrating a noise extractor and employing a noise-robust reconstruction loss that leverages a noise distribution prior. Experimental results show that our method outperforms LDR/HDR 3DGS and previous state-of-the-art (SOTA) self-supervised and supervised pre-trained models in both reconstruction quality and inference speed on the RawNeRF dataset across a broad range of training views. Code can be found at https://lizhihao6.github.io/Raw3DGS.
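
As a rough illustration of why limited views make the noise problem worse, the sketch below simulates a heteroscedastic Gaussian noise model and measures how the variance of a per-pixel average over views shrinks as the view count grows. The linear variance form `shot_gain * x + read_std**2`, the parameter names, and both helper functions are assumptions made for this example; they are not taken from the paper or its codebase.

```python
import numpy as np


def add_heteroscedastic_noise(clean, shot_gain, read_std, rng):
    """Sample x_tilde = x + n with n ~ N(0, shot_gain * x + read_std**2).

    The signal-dependent variance is an assumed, commonly used form;
    the paper's calibrated noise model may differ.
    """
    variance = shot_gain * clean + read_std ** 2
    return clean + rng.normal(size=clean.shape) * np.sqrt(variance)


def mean_target_variance(clean_value, n_views, shot_gain, read_std,
                         trials=20000, seed=0):
    """Empirical variance of the average of n_views noisy observations
    of one pixel whose clean intensity is clean_value."""
    rng = np.random.default_rng(seed)
    clean = np.full((trials, n_views), clean_value, dtype=np.float64)
    noisy = add_heteroscedastic_noise(clean, shot_gain, read_std, rng)
    # Each row is one simulated capture session; average over its views.
    return noisy.mean(axis=1).var()


if __name__ == "__main__":
    for n_views in (2, 4, 8, 16):
        v = mean_target_variance(0.1, n_views, shot_gain=0.01, read_std=0.02)
        print(f"N={n_views:2d}  variance of averaged pixel: {v:.6f}")
```

Under these assumptions the variance of the averaged pixel falls roughly as one over the number of views, so a model that fits the average of only a few noisy observations is left with a much noisier target than one trained on many views.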

Paper Structure (19 sections, 18 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Comparative analysis of 3DGS trained with clean raw images, denoted $\mathcal{X}$, versus noisy raw images, denoted $\tilde{\mathcal{X}}$, across various training view counts $N$. The clean raw images, captured in daylight, are selected from the RawNeRF dataset \cite{rawnerf}. The noisy raw images are generated from these clean images using the noise model from PMN \cite{feng2022learnability} with calibrated camera noise parameters. (a) Training with noisy raw images results in decreased PSNR in the test views, with a widening performance gap as the number of training views is reduced. (b) The rendering speed (FPS) shows a similar trend to PSNR. (c) Test view visualizations show that training with noisy images causes 3DGS to produce numerous thin, flat Gaussian shapes, leading to visual artifacts and reduced FPS, especially with fewer training views.
  • Figure 2: An illustration of how prevalent noise in raw images impacts the 3DGS optimization. (a) The imaging process inherently introduces additive noise at various stages due to physical principles and hardware limitations, represented as $\tilde{\mathbf{x}} = \mathbf{x} + \mathbf{n}$, where $\tilde{\mathbf{x}}$ and $\mathbf{x}$ denote the noisy and clean images, respectively. (b) For a real-world point $\mathbf{r}$, a collection of raw images $\tilde{\mathcal{X}} = \{\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2\}$ records its intensity at pixel coordinates $\{\mathbf{p}_1, \mathbf{p}_2\}$, influenced by noise. The optimal target of 3DGS for this point, denoted as $\hat{\mathbf{x}}(\mathbf{p}) = \mathbb{E}_{\tilde{\mathbf{x}}(\mathbf{p}) \sim \tilde{\mathcal{X}}}\left[\tilde{\mathbf{x}}(\mathbf{p})\right]$, has a discrepancy from the clean pixel intensity $\mathbf{x}(\mathbf{p})$. The variance of this discrepancy is detailed in Eq. \ref{eq:var_optimal_target}.
  • Figure 3: Visualization of the 3DGS test view changes across optimization iterations. Initially, the 3DGS model fits the clean signal (at 1,000 and 3,000 iterations). However, as the iterations progress (from 10,000 to 30,000), the model starts to overfit the noise.
  • Figure 4: Illustration of the noise-robust reconstruction loss, $\mathcal{L}_{\text{nrr}}$, which comprises three components: the reconstruction loss $\mathcal{L}_{\text{RawNeRF}}$, the negative log-likelihood (NLL) loss, and the covariance loss $\mathcal{L}_{\text{cov}}$. A noisy raw image, $\tilde{\mathbf{x}}$, is first input to the noise extractor $F_n(\cdot;\Omega)$ to estimate the noise, $\hat{\mathbf{n}}$. The estimated noise $\hat{\mathbf{n}}$ is then used to calculate the NLL loss relative to the noise distribution. After that, the normalized noise, $\hat{\mathbf{z}}$, undergoes a covariance loss, $\mathcal{L}_{\text{cov}}$, to minimize spatial dependencies among noise components. Finally, the reconstruction loss, $\mathcal{L}_{\text{RawNeRF}}$, is computed between the rendered distorted image $\mathcal{D}(\hat{\mathbf{x}})$ and the pseudo clean image $\tilde{\mathbf{x}}-\hat{\mathbf{n}}$.
  • Figure 5: Comparative evaluation of various baselines and our method on rendering quality and speed in limited-view training settings. The two-stage denoiser + 3DGS methods are represented by dotted lines, while methods trained on RGB images are indicated by square markers. All metrics are evaluated on test views within the RGB domain.
  • ...and 3 more figures
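
The three loss components described in the Figure 4 caption can be sketched roughly as follows. This is a value-only NumPy illustration under several assumptions: the noise estimate `noise_hat` is passed in directly rather than produced by a trained noise extractor, the `rendered` image is assumed to already include the distortion step, the noise variance uses an assumed linear form evaluated at the rendered image, the reconstruction term uses a RawNeRF-style relative squared error without gradient handling, and the covariance term only penalizes correlation between adjacent pixels. The function name, parameter names, and the exact form of each term are hypothetical and may differ from the paper's definitions.

```python
import numpy as np


def noise_robust_loss(noisy, rendered, noise_hat, shot_gain, read_std, eps=1e-3):
    """Return (reconstruction, nll, covariance) terms for 2D images.

    noisy      -- noisy raw image x_tilde
    rendered   -- rendered image (assumed already distorted)
    noise_hat  -- estimated noise n_hat (stand-in for a noise extractor output)
    """
    rendered_pos = np.clip(rendered, 0.0, None)

    # Reconstruction: compare the render with the pseudo clean image
    # x_tilde - n_hat, using a relative squared error.
    pseudo_clean = noisy - noise_hat
    recon = np.mean(((rendered - pseudo_clean) / (rendered_pos + eps)) ** 2)

    # Gaussian negative log-likelihood of the estimated noise under an
    # assumed signal-dependent variance.
    var = shot_gain * rendered_pos + read_std ** 2
    nll = np.mean(0.5 * (np.log(2.0 * np.pi * var) + noise_hat ** 2 / var))

    # Covariance: normalized noise should be spatially uncorrelated, so
    # penalize the mean product of horizontally and vertically adjacent pixels.
    z = noise_hat / np.sqrt(var)
    cov_h = np.mean(z[:, :-1] * z[:, 1:])
    cov_v = np.mean(z[:-1, :] * z[1:, :])
    cov = cov_h ** 2 + cov_v ** 2

    return recon, nll, cov
```

In this sketch, a noise estimate that matches the true noise drives the reconstruction term toward zero when the render equals the clean image, while a spatially smooth (and therefore correlated) estimate is penalized by the covariance term. How the three terms are weighted against each other is not stated in the text above and is left out here.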