Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

TL;DR

The paper tackles the challenge of recovering high dynamic range content from a single 8-bit LDR image. It introduces a two-stage LS-Sagiri pipeline: in Stage 1, Latent-SwinIR$_{c}$ performs color restoration and brightness adjustment; in Stage 2, Sagiri uses a diffusion prior conditioned on the restored image to generate plausible details in dynamic-range extremes. Training uses specialized losses for color ($L_{color}$) and content (a multi-term $L_{content}$), plus a plug-in training strategy that makes Sagiri compatible with existing LDR enhancement methods; sampling uses adaptive regional prompts. Empirical results on the HDR-Real, HDR-Eye, and NTIRE datasets show consistent improvements in no-reference quality metrics and demonstrate Sagiri’s ability to generate realistic details in saturated and dark regions, with efficient inference in 30 DDPM steps.

Abstract

Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io

Paper Structure (22 sections, 7 equations, 123 figures, 4 tables)

This paper contains 22 sections, 7 equations, 123 figures, 4 tables.

Figures (123)

  • Figure 1: Common real-world scenes have broad dynamic ranges. A typical 8-bit camera captures a limited dynamic range, where the exposure value determines which part of the scene's dynamic range is captured, often resulting in either oversaturated bright regions or quantized dark areas overwhelmed by noise. Traditionally, multiple exposures are merged into an HDR image (32-bit or 64-bit) to accurately represent the scene, which is subsequently tone-mapped to an 8-bit image for LDR displays. In our method, we directly learn to generate the final output from a single LDR image with a generative diffusion prior, which includes (1) color mapping, (2) generating reasonable content for saturated/black regions, (3) enhancing details in low bit-depth regions, and (4) denoising dark regions.
  • Figure 2: Overview of Latent-SwinIR$_{c}$ (LS) and the color reconstruction loss. This design allows the model to capture the color distribution with higher fidelity.
  • Figure 3: Unknown region mask. Pixels with values of 0 or 255 are detected as unknown regions. The mask is downsampled and broadcast to match the shape of the latent feature maps.
  • Figure 4: Overview of Sagiri. Our model takes the output of the previous stage as input, with an optional text prompt generated by a large language model. It uses a pretrained VAE encoder to map the previous stage's result into the latent space. The resulting latent feature is concatenated with the time-step noise to serve as the condition. An unknown region mask (pixels with values of 0 or 255) is used to combine the input latent feature with the denoised feature map.
  • Figure 5: The first row shows results obtained with our degradation strategy, and the second row shows the reference images. We aim to simulate the degradation that other models introduce in dynamic range extremes during LDR enhancement, and we train Sagiri to handle these situations effectively.
  • ...and 118 more figures
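
The unknown region mask described in the captions for Figures 3 and 4 can be illustrated with a small NumPy sketch. This is not the authors' implementation; it only mirrors the steps the captions name (detect pixels equal to 0 or 255, downsample the mask, broadcast it over latent channels, and use it to combine two latents). Several details are assumptions made for illustration: the function names are hypothetical, a pixel is treated as unknown if any channel is clipped, the mask is downsampled by block-wise "any" pooling with a factor of 8, and the blend keeps the input latent in known regions while taking the denoised latent in unknown regions.

```python
import numpy as np


def unknown_region_mask(img_u8: np.ndarray) -> np.ndarray:
    """Return an (H, W) boolean mask, True where a pixel is clipped.

    Assumption: a pixel counts as unknown if ANY channel equals 0 or 255.
    """
    clipped = (img_u8 == 0) | (img_u8 == 255)
    return clipped.any(axis=-1)


def downsample_mask(mask: np.ndarray, factor: int = 8) -> np.ndarray:
    """Downsample an (H, W) mask to (H/factor, W/factor).

    Assumption: a latent cell is unknown if any pixel in its block is unknown;
    factor=8 is a typical latent-diffusion VAE stride, not a value from the paper.
    """
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.any(axis=(1, 3))


def blend_latents(z_input: np.ndarray, z_denoised: np.ndarray,
                  mask_latent: np.ndarray) -> np.ndarray:
    """Combine (C, h, w) latents using an (h, w) mask broadcast over channels.

    Assumption: unknown cells take the denoised latent, known cells keep the input.
    """
    m = mask_latent[None, :, :].astype(z_input.dtype)
    return m * z_denoised + (1.0 - m) * z_input


# Usage: a 16x16 mid-gray image whose top-left 8x8 block is saturated.
img = np.full((16, 16, 3), 128, dtype=np.uint8)
img[:8, :8] = 255
mask_latent = downsample_mask(unknown_region_mask(img), factor=8)
z_in = np.zeros((4, 2, 2), dtype=np.float32)   # stand-in for the encoded input
z_dn = np.ones((4, 2, 2), dtype=np.float32)    # stand-in for the denoised latent
z_out = blend_latents(z_in, z_dn, mask_latent)
```

In this toy case only the top-left latent cell is marked unknown, so `z_out` takes its values from `z_dn` there and from `z_in` everywhere else.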