GMODiff: One-Step Gain Map Refinement with Diffusion Priors for HDR Reconstruction

Tao Hu, Weiyu Zhou, Yanjie Tu, Peng Wu, Wei Dong, Qingsen Yan, Yanning Zhang

TL;DR

GMODiff reframes multi-exposure HDR reconstruction as conditional gain-map refinement, leveraging a degradation-aware regressor to initialize a one-step diffusion refinement guided by priors from LDR inputs. The two-stage approach uses DaReg to produce an initial gain map and reliability cues, then fine-tunes a latent diffusion model with a degradation-aware decoder to suppress artifacts while preserving structure. This yields superior perceptual and no-reference quality with significantly reduced inference time compared to prior diffusion-based HDR methods. The method demonstrates strong performance on real-world datasets and offers practical, fast HDR reconstruction suitable for real-time applications.

Abstract

Pre-trained Latent Diffusion Models (LDMs) have recently shown strong perceptual priors for low-level vision tasks, making them a promising direction for multi-exposure High Dynamic Range (HDR) reconstruction. However, directly applying LDMs to HDR remains challenging due to: (1) limited dynamic-range representation caused by 8-bit latent compression, (2) high inference cost from multi-step denoising, and (3) content hallucination inherent to their generative nature. To address these challenges, we introduce GMODiff, a gain map-driven one-step diffusion framework for multi-exposure HDR reconstruction. Instead of reconstructing full HDR content, we reformulate HDR reconstruction as a conditionally guided Gain Map (GM) estimation task, where the GM encodes the extended dynamic range while retaining the same bit depth as LDR images. We initialize the denoising process from an informative regression-based estimate rather than pure noise, enabling the model to generate high-quality GMs in a single denoising step. Furthermore, recognizing that regression-based models excel in content fidelity while LDMs favor perceptual quality, we leverage regression priors to guide both the denoising process and the latent decoding of the LDM, suppressing hallucinations while preserving structural accuracy. Extensive experiments demonstrate that GMODiff performs favorably against several state-of-the-art methods and is 100× faster than previous LDM-based methods.

Paper Structure

This paper contains 14 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Compared to DNN-based methods, LDM-based methods generate perceptually compelling HDR-like images, demonstrating their potential for HDR reconstruction. However, they suffer from high computational overhead and are prone to hallucination artifacts that compromise physical fidelity.
  • Figure 2: Overview of the proposed GMODiff framework. (a) We train DaReg via a dual-learning strategy to produce two regression-based priors: an initial gain map $\hat{G}$ and spatial embeddings $c_L$. The embeddings $c_L$ implicitly encode the regions where $\hat{G}$ is unreliable, serving as a degradation-aware guidance prior. (b) One-Step Diffusion Refinement initializes the denoising process from $\hat{G}$ and performs single-step denoising conditioned on $c_L$ to generate a high-fidelity GM latent code $Z_H$. (c) The DA Decoder leverages encoder features from the initial gain map $\hat{G}$, guided by $c_L$, to recover fine image details while avoiding the artifacts inherent in $\hat{G}$.
  • Figure 3: (a) Limitations of vanilla LDMs in generating HDR content. Encoding and decoding an HDR image with the LDM VAE, visualized at multiple exposure levels, reveals a significant fidelity loss, especially in shadow regions. (b) The GM encodes pixel-wise dynamic range adjustments for the LDR image and can be used to reconstruct the HDR image via element-wise multiplication.
  • Figure 4: Visual comparisons are conducted on testing data, focusing on zoomed-in local areas of the HDR images estimated by our method and the compared techniques. Our model demonstrates the ability to generate HDR images of superior quality.
  • Figure 5: Visual comparisons are conducted on testing data, focusing on zoomed-in local areas of the HDR images estimated by our method and the compared techniques. Our model demonstrates the ability to generate HDR images of superior quality.
  • ...and 1 more figure
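
The Figure 3(b) caption says a gain map reconstructs the HDR image from an LDR image via element-wise multiplication. The sketch below illustrates that single step in Python with NumPy. It is not the authors' implementation: the function name `apply_gain_map`, the `max_gain` parameter, and the log2 encoding of the 8-bit gain map are all assumptions made for illustration, since this summary does not specify how gain values are quantized.

```python
import math

import numpy as np


def apply_gain_map(ldr, gain_map, max_gain=8.0):
    """Reconstruct an HDR image as ldr * gain (element-wise).

    Assumptions (hypothetical, not taken from the paper):
      - ldr holds linear intensities in [0, 1], shape (H, W, C).
      - gain_map is 8-bit (0..255) and stores log2(gain) linearly,
        so code 0 means gain 1 and code 255 means gain max_gain.
    """
    ldr = np.asarray(ldr, dtype=np.float32)
    # Normalize the 8-bit codes to [0, 1].
    codes = np.asarray(gain_map, dtype=np.float32) / 255.0
    # Decode the assumed log2 encoding into a per-pixel multiplicative gain.
    gain = np.exp2(codes * math.log2(max_gain))
    # Element-wise multiplication extends the dynamic range of the LDR image.
    return ldr * gain


# Usage: a flat mid-gray LDR image with a gain map that brightens one pixel.
ldr = np.full((2, 2, 3), 0.5, dtype=np.float32)
gain_map = np.zeros((2, 2, 3), dtype=np.uint8)
gain_map[0, 0] = 255  # maximum gain at the top-left pixel
hdr = apply_gain_map(ldr, gain_map)
```

Under these assumptions the gain map stays at the same bit depth as the LDR input, while the decoded product can exceed the [0, 1] range. A different encoding (for example, a linear gain) would change only the decoding line.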