
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

Xin Jin, Pengyi Jiao, Zheng-Peng Duan, Xingchao Yang, Chun-Le Guo, Bo Ren, Chongyi Li

TL;DR

LE3D tackles HDR view synthesis from noisy RAW multi-view images by marrying 3D Gaussian Splatting with three key innovations: Cone Scatter Initialization to improve distant view geometry, a tiny Color MLP to represent RAW linear color instead of spherical harmonics, and depth distortion plus near-far regularizations to reinforce accurate scene structure. The method achieves real-time rendering (~100 FPS) with training times around 1.5 GPU hours, dramatically faster than prior volumetric approaches like RawNeRF, while maintaining competitive quality and enabling downstream tasks such as HDR rendering, refocusing, and exposure variation. Quantitative and qualitative results on RAW NeRF datasets show LE3D attains comparable HDR-quality metrics to state-of-the-art while delivering 3000–6000× rendering speedups and 99% reduction in training time, owing to differentiable rasterization and an improved geometric-color representation. This work significantly lowers the barrier to practical HDR view synthesis in 3D scenes, enabling real-time computational photography pipelines that operate directly in RAW space and support post-processing like tone-mapping in real time.

Abstract

Volumetric rendering-based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes. However, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. Nevertheless, implementing RAW image-based view synthesis directly using 3DGS is challenging due to its inherent drawbacks: 1) in nighttime scenes, extremely low SNR leads to poor structure-from-motion (SfM) estimation in distant views; 2) the limited representation capacity of spherical harmonics (SH) functions is unsuitable for the RAW linear color space; and 3) inaccurate scene structure hampers downstream tasks such as refocusing. To address these issues, we propose LE3D (Lighting Every darkness with 3DGS). Our method uses Cone Scatter Initialization to enrich the estimation of SfM, and replaces SH with a Color MLP to represent the RAW linear color space. Additionally, we introduce depth distortion and near-far regularizations to improve the accuracy of scene structure for downstream tasks. These designs enable LE3D to perform real-time novel view synthesis, HDR rendering, refocusing, and tone-mapping changes. Compared to previous volumetric rendering-based methods, LE3D reduces training time to 1% and improves rendering speed by up to 4,000 times for 2K resolution images in terms of FPS. Code and viewer can be found at https://github.com/Srameo/LE3D.
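
As a rough sanity check on the headline ratios, the arithmetic below combines the abstract's relative figures (training time reduced to 1%, up to 4,000 times faster rendering) with the approximate absolute figures quoted in the TL;DR (about 1.5 GPU hours and about 100 FPS). It assumes both sets of numbers describe the same setting, which the text does not state, so the implied baseline values are illustrative only. The symbols $T_{\text{base}}$ and $F_{\text{base}}$ are introduced here for the baseline training time and frame rate.

```latex
\begin{align*}
T_{\text{LE3D}} \approx 0.01\,T_{\text{base}}
  &\;\Rightarrow\; T_{\text{base}} \approx \frac{1.5\ \text{GPU h}}{0.01} = 150\ \text{GPU h}, \\
F_{\text{LE3D}} \approx 4000\,F_{\text{base}}
  &\;\Rightarrow\; F_{\text{base}} \approx \frac{100\ \text{FPS}}{4000} = 0.025\ \text{FPS}
  \;\approx\; 40\ \text{s per frame}.
\end{align*}
```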

Paper Structure

This paper contains 41 sections, 11 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: LE3D reconstructs a 3DGS representation of a scene from a set of multi-view noisy RAW images. As shown on the left, LE3D features fast training and real-time rendering capabilities compared to RawNeRF (Mildenhall et al., 2022). Moreover, compared to RawGS (a 3DGS model [Kerbl et al., 2023] that we trained with RawNeRF's strategy), LE3D demonstrates superior noise resistance and the ability to represent HDR linear colors. The right side highlights the variety of real-time downstream tasks LE3D can perform, including (a) exposure variation, (b, d) changing White Balance (WB), (b) HDR rendering, and (c, d) refocus.
  • Figure 2: Pipeline of our proposed LE3D. 1) We use COLMAP to obtain the initial point cloud and camera poses. 2) We employ Cone Scatter Initialization to enrich the point cloud of distant scenes. 3) We run the standard 3DGS training, replacing the original SH with our tiny Color MLP to represent the RAW linear color space. 4) We use RawNeRF's weighted L2 loss $\mathcal{L}$ (Eqn. (\ref{eq:rawnerf_pri})) as image-level supervision, and our proposed $\mathcal{R}_{dist}$ (Eqn. (\ref{eq:depth_dist})) as well as $\mathcal{R}_{nf}$ (Eqn. (\ref{eq:near_far})) as scene structure regularizations. In this context, $f_i$, $b_i$, and $c_i$ respectively represent the color feature, bias, and final rendered color of each Gaussian $i$. Similarly, $o_i$, $r_i$, $s_i$, and $p_i$ denote its opacity, rotation, scale, and position.
  • Figure 3: Visual comparison between LE3D and other reconstruction methods (zoom in for best view). The training view contains two parts: the post-processed RAW image with linear brightness enhancement (top) and the image directly output by the device (bottom). Compared with the 3DGS-based method, LE3D recovers sharper details in the distant scene and is more resistant to noise. Additionally, compared with NeRF-based methods, LE3D achieves comparable results with a $3000\times$-$6000\times$ improvement in rendering speed.
  • Figure 4: Ablation studies on our proposed methods (zoom in for best view). CSI in (b) and Regs in (d) denote Cone Scatter Initialization and Regularizations, respectively. (e) shows the rendering results of LE3D with and without the Color MLP in the early stages of training.
  • Figure 5: LE3D supports various applications. RawGS$\star$ in (d) denotes using LE3D's rendered image and RawGS's structure information as input for refocusing. (c, e) are the weighted depth maps rendered by LE3D and RawGS, respectively. (f) shows the same scene rendered by LE3D with different exposure settings. In (g), the two kinds of "$\rightarrow$" arrows denote global tone-mapping and local tone-mapping, respectively.
  • ...and 7 more figures
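
The Figure 2 caption names RawNeRF's weighted L2 loss as the image-level supervision. A minimal sketch of that loss is given below, assuming the form published for RawNeRF: the squared error is divided by the predicted value plus a small constant, so dark pixels are not drowned out by bright ones. The function name `rawnerf_weighted_l2`, the default `eps=1e-3`, and the use of a mean rather than a sum are choices made for this illustration, not taken from the LE3D code. In an autodiff framework the denominator would be wrapped in a stop-gradient; plain Python has no gradients, so that step appears only as a comment.

```python
from typing import Sequence


def rawnerf_weighted_l2(pred: Sequence[float],
                        target: Sequence[float],
                        eps: float = 1e-3) -> float:
    """Mean of ((pred - target) / (pred + eps))^2 over all values.

    `pred` holds rendered linear RAW intensities, `target` the noisy
    RAW observations. Both are flat sequences of non-negative floats.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for p, t in zip(pred, target):
        # In a training framework this weight would be computed from a
        # detached (stop-gradient) copy of the prediction.
        weight = 1.0 / (p + eps)
        total += ((p - t) * weight) ** 2
    return total / len(pred)


if __name__ == "__main__":
    # The same absolute error of 0.01 is penalized far more at a dark
    # pixel than at a bright one.
    print(rawnerf_weighted_l2([0.01], [0.02]))
    print(rawnerf_weighted_l2([1.0], [1.01]))
```

The relative weighting approximates an error measured after a log-like tone curve, which is why it suits linear HDR data where a plain L2 loss would be dominated by highlights.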