HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images

Shreyas Singh, Aryan Garg, Kaushik Mitra

TL;DR

HDRSplat extends 3D Gaussian Splatting to operate directly in 14-bit linear HDR space by supervising on denoised raw images, enabling real-time HDR 3D reconstruction in challenging nighttime and low-light scenes. The approach combines Bayer-space denoising (PMRID), a stop-gradient scaled $\mathcal{L}_1$ loss with DSSIM, and rasterization tuning that compensates for sparse point cloud initialization. It trains in $\le 15$ minutes per scene and renders at $\ge 120$ fps. Quantitatively, it outperforms RawNeRF and Raw3DGS on 14-bit HDR renders in LPIPS, SSIM, and PSNR, while reducing the memory footprint to about $0.35$ million points per scene. The work enables downstream applications such as synthetic defocus, dense depth extraction, and post-capture control of exposure and tone-mapping, positioning HDR rendering as a practical pathway to real-time HDR 3D scene understanding and editing.

Abstract

The recent advent of 3D Gaussian Splatting (3DGS) has revolutionized the 3D scene reconstruction space, enabling high-fidelity novel view synthesis in real time. However, with the exception of RawNeRF, all prior 3DGS and NeRF-based methods rely on 8-bit tone-mapped Low Dynamic Range (LDR) images for scene reconstruction. Such methods struggle to achieve accurate reconstructions in scenes that require a higher dynamic range. Examples include scenes captured at nighttime or in poorly lit indoor spaces with a low signal-to-noise ratio, as well as daylight scenes with shadow regions exhibiting extreme contrast. Our proposed method, HDRSplat, tailors 3DGS to train directly on 14-bit linear raw images captured in near darkness, which preserves the scenes' full dynamic range and content. Our key contributions are two-fold. Firstly, we propose a loss suited to linear HDR space that effectively extracts scene information from noisy dark regions and nearly saturated bright regions simultaneously, while also handling view-dependent colors without increasing the degree of spherical harmonics. Secondly, through careful rasterization tuning, we implicitly overcome the heavy reliance and sensitivity of 3DGS on point cloud initialization. This is critical for accurate reconstruction in regions of low texture, high depth of field, and low illumination. HDRSplat is the fastest method to date for 14-bit (HDR) 3D scene reconstruction, training in $\le 15$ minutes/scene ($\sim 30\times$ faster than the prior state-of-the-art RawNeRF). It also has the fastest inference speed at $\ge 120$ fps. We further demonstrate the applicability of our HDR scene reconstruction by showcasing applications such as synthetic defocus, dense depth map extraction, and post-capture control of exposure, tone-mapping, and viewpoint.
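
The summary above names a stop-gradient scaled $\mathcal{L}_1$ loss but does not give its formula. The following is a minimal sketch of one plausible form, assuming a RawNeRF-style weighting in which each residual is divided by the stop-gradient of the prediction plus a small constant. The function name `scaled_l1`, the value of `eps`, and the exact weighting are assumptions for illustration, not the authors' implementation; the DSSIM term is omitted. Because plain Python has no autodiff, the gradient is written out by hand with the denominator held constant, which is what the stop-gradient achieves.

```python
def scaled_l1(pred, target, eps=1e-3):
    """Return (loss, grad) for a stop-gradient scaled L1 term.

    pred, target: flat lists of non-negative linear (HDR) values.
    Each residual |pred - target| is weighted by 1 / (sg(pred) + eps),
    where sg() is stop-gradient: the weight is treated as a constant
    when differentiating. eps is a hypothetical choice.
    """
    n = len(pred)
    loss = 0.0
    grad = []
    for p, t in zip(pred, target):
        weight = 1.0 / (p + eps)        # constant w.r.t. the gradient
        residual = p - t
        loss += weight * abs(residual) / n
        sign = (residual > 0) - (residual < 0)
        grad.append(weight * sign / n)  # d(loss)/d(pred) with weight frozen
    return loss, grad


# A dark pixel and a bright pixel, each under-predicted by 10x less / more
# in absolute terms: the dark pixel still receives the larger gradient.
loss, grad = scaled_l1([0.01, 0.5], [0.02, 0.6])
print(loss, grad)
```

The weighting makes the term behave like a relative error, so small absolute errors in dark regions are not drowned out by errors in bright regions when training in linear space.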

Paper Structure

This paper contains 11 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Qualitative comparison. HDRSplat enables the highest-fidelity 3D scene reconstruction at an inference speed of over 120 fps. This is in contrast with our baselines: Raw3DGS (trained on demosaiced raw images) and LDR-3DGS (trained on 8-bit LDR images), which give poor-quality renders in nighttime scenes, and RawNeRF, which, although similar in fidelity to ours, takes 8 hours/scene to train.
  • Figure 2: Rendering pipeline for generating novel HDR views from noisy raw inputs. The three key components highlighted are: (1) Bayer-space denoising, (2) differentiable 3DGS rasterization, and (3) a flexible ISP that converts 14-bit linear raw to tone-mapped 8-bit sRGB.
  • Figure 3: Importance of Bayer-space denoising using PMRID: denoised raw images successfully retrieve scene information from noisy low-illumination regions and saturated bright regions in the raw image space, owing to a lowered photon shot noise level.
  • Figure 4: Point cloud densification: (a-b) highlight the non-uniformity and sparsity in regions of poor SfM initialization, which leave holes in the final point cloud. (c) shows improved reconstruction via our rasterization method, which addresses the under-reconstruction problem.
  • Figure 5: Application: accurate depth from novel views using Depth Anything, and synthetic defocus.
  • ...and 1 more figure
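
The last stage in Figure 2, converting 14-bit linear raw to tone-mapped 8-bit sRGB with post-capture exposure control, can be sketched for a single channel value as below. This is an illustrative assumption, not the paper's ISP: the function name `linear_to_srgb8`, the default black and white levels, and the use of the standard sRGB transfer curve as the only tone-mapping step are all hypothetical choices.

```python
def linear_to_srgb8(value, exposure=1.0, black_level=0.0, white_level=16383.0):
    """Map one 14-bit linear value to an 8-bit sRGB code value.

    exposure is a post-capture gain applied in linear space.
    black_level and white_level are hypothetical sensor constants.
    """
    # Normalize to [0, 1], apply exposure gain, then clip highlights.
    x = (value - black_level) / (white_level - black_level)
    x = min(max(x * exposure, 0.0), 1.0)
    # Standard sRGB opto-electronic transfer function.
    if x <= 0.0031308:
        y = 12.92 * x
    else:
        y = 1.055 * x ** (1.0 / 2.4) - 0.055
    return int(round(255.0 * y))


# The same dark linear value rendered at two exposure settings.
print(linear_to_srgb8(100), linear_to_srgb8(100, exposure=8.0))
```

Because the reconstruction stays in linear space, the exposure gain can be changed after training without re-optimizing the scene; only this final mapping is re-run.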