InstantHDR: Single-forward Gaussian Splatting for High Dynamic Range 3D Reconstruction

Dingqiang Ye, Jiacong Xu, Jianglu Ping, Yuxiang Guo, Chao Fan, Vishal M. Patel

Abstract

High dynamic range (HDR) novel view synthesis (NVS) aims to reconstruct HDR scenes from multi-exposure low dynamic range (LDR) images. Existing HDR pipelines rely heavily on known camera poses, well-initialized dense point clouds, and time-consuming per-scene optimization. Current feed-forward alternatives overlook the HDR problem by assuming exposure-invariant appearance. To bridge this gap, we propose InstantHDR, a feed-forward network that reconstructs 3D HDR scenes from uncalibrated multi-exposure LDR collections in a single forward pass. Specifically, we design geometry-guided appearance modeling for multi-exposure fusion and a meta-network for generalizable scene-specific tone mapping. Because HDR scene data is scarce, we build a pre-training dataset, HDR-Pretrain, for generalizable feed-forward HDR models, featuring 168 Blender-rendered scenes, diverse lighting types, and multiple camera response functions. Comprehensive experiments show that InstantHDR delivers synthesis performance comparable to state-of-the-art optimization-based HDR methods while improving reconstruction speed by $\sim700\times$ and $\sim20\times$ in our single-forward and post-optimization settings, respectively. All code, models, and datasets will be released after the review process.
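
To make the multi-exposure setting concrete, the following is a minimal sketch of how a single linear HDR radiance map can yield different LDR images as the exposure time changes. It assumes a fixed gamma curve as the camera response function; the function name `render_ldr` and the `gamma` parameter are illustrative assumptions, not the learned tone mapper described in the abstract.

```python
import numpy as np

def render_ldr(hdr_radiance, exposure_time, gamma=2.2):
    """Map linear HDR radiance to LDR values in [0, 1] at a given exposure time.

    Assumption: the camera response function is a plain gamma curve.
    This is a textbook stand-in, not the paper's meta-network tone mapping.
    """
    exposed = np.asarray(hdr_radiance, dtype=np.float64) * exposure_time
    clipped = np.clip(exposed, 0.0, 1.0)  # sensor saturation above 1, no negatives
    return clipped ** (1.0 / gamma)       # gamma-encode the clipped exposure

# Four radiance samples spanning three orders of magnitude.
hdr = np.array([0.01, 0.1, 1.0, 10.0])
short_exposure = render_ldr(hdr, exposure_time=0.125)  # keeps highlights, crushes shadows
long_exposure = render_ldr(hdr, exposure_time=8.0)     # lifts shadows, saturates highlights
```

With the long exposure, the two brightest samples both saturate and become indistinguishable, while the short exposure keeps them apart at the cost of very dark shadows. This is why appearance cannot be treated as exposure-invariant when fusing such inputs.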

Paper Structure (26 sections, 14 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Comparison of reconstruction time (yellow boxes), reconstructed scenes (left), and rendered views (right) for GaussianHDR liu2025gausshdr (top), the original AnySplat jiang2025anysplat (middle), and our InstantHDR (bottom). (i) GaussianHDR liu2025gausshdr takes 25 minutes and produces tearing artifacts because its initial point clouds collapse under sparse-view inputs. (ii) AnySplat jiang2025anysplat naively fuses multi-exposure inputs, causing ghosting artifacts and offering no exposure control. (iii) Our InstantHDR reconstructs 3D-consistent HDR scenes in a few seconds and renders clean LDR images with controllable exposure time.
  • Figure 2: Overview of InstantHDR. Given multi-exposure LDR images, the frozen geometry branch estimates depth and camera poses, while the appearance branch normalizes exposures ($F_E$), fuses cross-view irradiance via geometry-guided attention ($F_A$), and recovers pixel-level details via DoG upsampling. The Gaussian head $F_G$ combines both branches to produce HDR 3D Gaussians. The Meta Net $F_M$ predicts tone-mapping parameters for rendering LDR images at controllable exposures.
  • Figure 3: Geo-guided Cross-view Attention. (a) The module reuses Q and K from the 14th frozen geometry encoder layer to guide appearance fusion. (b) Visualization of the attention maps shows that the module accurately matches query patches (red box) across views under large viewpoint changes and extreme exposure variations ($\Delta t$: 0.5--32s).
  • Figure 4: Examples from our HDR-Pretrain dataset. Each scene includes multi-view, multi-exposure LDR images at varying $\Delta t$, 32-bit HDR ground truth, depth and normal maps, rendered under diverse tone-mapping operators (Standard, AgX, Filmic).
  • Figure 5: LDR visual comparisons. Feed-forward methods jiang2025anysplat fail on multi-exposure inputs, while optimization-based methods liu2025gausshdr require $\sim$2K seconds per scene. Our InstantHDR achieves competitive quality in under 40s. Yellow/blue tags denote reconstruction time/PSNR.
  • ...and 2 more figures
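
The geometry-guided attention described in the Figure 3 caption can be sketched as follows: attention weights are computed from geometry-branch queries and keys, then applied to appearance-branch values. This is a minimal single-head NumPy illustration under assumptions. The function name `geo_guided_attention`, the tensor shapes, and the absence of learned projections are hypothetical and not taken from the paper.

```python
import numpy as np

def geo_guided_attention(q_geo, k_geo, v_app):
    """Fuse appearance features using attention weights derived from geometry features.

    q_geo: (Nq, d) queries from the geometry branch (assumed already projected)
    k_geo: (Nk, d) keys from the geometry branch
    v_app: (Nk, c) appearance features to be fused across views
    Returns the fused features (Nq, c) and the attention weights (Nq, Nk).
    """
    d = q_geo.shape[-1]
    logits = q_geo @ k_geo.T / np.sqrt(d)          # scaled dot-product similarity
    logits -= logits.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ v_app, weights

# Toy example: three patches whose geometry features match one-to-one,
# so each query should pull the appearance value of its matching key.
q = 10.0 * np.eye(3)
k = 10.0 * np.eye(3)
v = np.array([[1.0], [2.0], [3.0]])
fused, attn = geo_guided_attention(q, k, v)
```

Because the weights depend only on geometry features, the matching between patches is unaffected by how bright or dark each input view is. Only the values being averaged carry exposure-dependent appearance.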