
Bracket Diffusion: HDR Image Generation by Consistent LDR Denoising

Mojtaba Bemana, Thomas Leimkühler, Karol Myszkowski, Hans-Peter Seidel, Tobias Ritschel

TL;DR

This work presents Bracket Diffusion, a training-free approach to HDR image generation by running diffusion on multiple LDR exposure brackets produced by pre-trained black-box diffusion models. A bracket-consistency posterior couples these brackets across exposures, enabling coherent HDR fusion without HDR data or retraining. The method supports unconditional and conditional (text/histogram) generation and achieves state-of-the-art results on LDR2HDR and HDR generation tasks, especially in saturated regions, while incurring higher inference costs due to multi-bracket diffusion. It demonstrates practical HDR synthesis capabilities and flexibility for conditioning, with potential extensions to HDR video and perception-driven HDR content creation.

Abstract

We demonstrate generating HDR images using the concerted action of multiple black-box, pre-trained LDR image diffusion models. Relying on pre-trained LDR generative diffusion models is vital because, first, no sufficiently large HDR image dataset is available to re-train them and, second, even if one were, re-training such models is beyond most compute budgets. Instead, we take inspiration from the HDR image capture literature, which traditionally fuses sets of LDR images, called "exposure brackets", into a single HDR image. We run multiple denoising processes to generate multiple LDR brackets that together form a valid HDR result. The key to making this work is a consistency term in the diffusion process that couples the brackets so that they agree across the exposure range they share, while accounting for possible differences due to quantization error. We demonstrate state-of-the-art results for unconditional, conditional, and restoration-type (LDR2HDR) generative modeling, all in HDR.
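The classic capture-side pipeline the abstract draws on (simulate LDR brackets, then fuse them back into HDR) can be sketched as follows. This is a textbook-style illustration, not the paper's fusion code. It assumes linear LDR values, a scale factor of 2^EV per bracket, 8-bit quantization, and binary weights as in Figure 1 with hypothetical "too dark" and "saturated" thresholds; the helper names `make_bracket` and `merge_brackets` are also hypothetical.

```python
import numpy as np

def make_bracket(hdr, ev, bits=8):
    """Hypothetical helper: simulate one LDR bracket by scaling linear radiance
    by 2**ev, clipping to [0, 1], and quantizing to `bits` bits."""
    levels = 2 ** bits - 1
    ldr = np.clip(hdr * 2.0 ** ev, 0.0, 1.0)
    return np.round(ldr * levels) / levels

def merge_brackets(brackets, evs, low=0.05, high=0.95):
    """Hypothetical helper: fuse brackets with binary weights. A pixel contributes
    only if it is neither too dark (quantization-dominated) nor saturated;
    `low` and `high` are assumed thresholds."""
    num = np.zeros_like(brackets[0])
    den = np.zeros_like(brackets[0])
    for ldr, ev in zip(brackets, evs):
        w = ((ldr > low) & (ldr < high)).astype(ldr.dtype)
        num += w * ldr / 2.0 ** ev  # undo the exposure scaling
        den += w
    # Pixels no bracket covers fall back to the shortest exposure.
    k = int(np.argmin(evs))
    fallback = brackets[k] / 2.0 ** evs[k]
    return np.where(den > 0, num / np.maximum(den, 1.0), fallback)

hdr = np.array([0.01, 0.1, 1.0, 4.0])  # linear radiance spanning ~9 stops
evs = [-3, 0, 3]
brackets = [make_bracket(hdr, ev) for ev in evs]
print(merge_brackets(brackets, evs))   # close to hdr, up to quantization error
```

Here the brackets are simulated from known radiance. In Bracket Diffusion the brackets instead come from the coupled denoising processes, and a common fusion step of this kind is applied only once diffusion has finished.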

Paper Structure

This paper contains 13 sections, 12 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: Recalling HDR merging: LDR brackets are shown on the left; on the right are the weights for each bracket, shown as binary for simplicity. White means the pixel contributes to the final HDR image.
  • Figure 2: Overview of our approach. Diffusion proceeds from left to right and across multiple exposure levels (brackets), shown vertically; the example uses three brackets. The process starts from three independent noise images. At each diffusion step (one is shown), denoising is guided by a bracket consistency term (middle block). In this term, a denoised estimate of the current noisy images is first computed (Eq. \ref{eq:current_estimate}), then the brackets are made consistent when re-exposed ($\sim$ symbol) using Eq. \ref{eq:costDown} and Eq. \ref{eq:costUp}. When diffusion has finished, the brackets form an HDR image under a common HDR fusion technique.
  • Figure 3: Posterior based on the bracket consistency cost for optimizing a lower exposure (top row) and a higher exposure (bottom row). The horizontal axis in each cost plot represents pixel values in the current solution $\hat{\mathbf x}^i$, and dots mark the corresponding values in the reference $\hat{\mathbf x}^r$. The vertical axis shows the cost, with horizontal lines marking zero cost. Depending on the exposure direction, different choices of $\hat{\mathbf x}^i$ incur different costs. When going down in exposure (top row), in saturated regions we allow $\hat{\mathbf x}^i$ to take any value within a feasible range such that, when re-exposed to the exposure of $\hat{\mathbf x}^r$, it is clamped to 1. When going up in exposure (bottom row), the consistency term is relaxed in dark areas (indicated by a lower steepness of the penalty) compared to other regions.
  • Figure 4: Text-based HDR generation. Text prompts are on the left, alongside low (EV-4), medium (EV+0), and high exposures (EV+4).
  • Figure 5: Histogram-based HDR generation. The first column shows the input image and its histogram. The other columns show our generated brackets. Note that the method never sees the input image (left), only its histogram.
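The asymmetric behaviour described for Figure 3 can be mimicked with a simple squared penalty. This is a minimal sketch under stated assumptions, not the paper's cost equations: it assumes linear brackets related by a factor of 2^EV, a hypothetical dark threshold and down-weighting factor for the relaxed region, and hypothetical helpers `reexpose` and `consistency_cost`.

```python
import numpy as np

def reexpose(x, ev_from, ev_to):
    """Hypothetical helper: move linear LDR values from one exposure to another
    by scaling with 2**(ev_to - ev_from) and clipping to [0, 1]."""
    return np.clip(x * 2.0 ** (ev_to - ev_from), 0.0, 1.0)

def consistency_cost(x_i, x_r, ev_i, ev_r, dark_threshold=0.05, dark_weight=0.1):
    """Illustrative cost of the current bracket x_i against a reference x_r.
    dark_threshold and dark_weight are assumed values, not taken from the paper."""
    if ev_i < ev_r:
        # Going down in exposure: brighten x_i to the reference exposure. Clipping
        # at 1 makes every x_i >= 2**(ev_i - ev_r) cost-free where x_r is saturated.
        diff = reexpose(x_i, ev_i, ev_r) - x_r
        return float(np.mean(diff ** 2))
    # Going up in exposure: predict x_i from the darker reference and down-weight
    # dark reference pixels, where quantization leaves the least information.
    target = reexpose(x_r, ev_r, ev_i)
    weight = np.where(x_r < dark_threshold, dark_weight, 1.0)
    return float(np.mean(weight * (x_i - target) ** 2))

# Saturated reference: any sufficiently bright lower-exposure value is free.
print(consistency_cost(np.array([0.25, 0.5, 1.0]), np.ones(3), ev_i=-2, ev_r=0))  # 0.0
```

Clipping alone reproduces the "any value within a feasible range" behaviour of the top row: once the re-exposed value reaches 1, making $\hat{\mathbf x}^i$ brighter costs nothing. The bottom row's shallower penalty in dark areas appears here as the smaller `dark_weight`.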