
LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers

Shreshth Saini, Hakan Gedik, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract

The rapid adoption of HDR-capable devices has created a pressing need to convert 8-bit Standard Dynamic Range (SDR) content into perceptually and physically accurate 10-bit High Dynamic Range (HDR) content. Existing inverse tone-mapping (ITM) methods often rely on fixed tone-mapping operators that struggle to generalize to real-world degradations, stylistic variations, and camera pipelines, frequently producing clipped highlights, desaturated colors, or unstable tone reproduction. We introduce LumaFlux, the first physically and perceptually guided diffusion transformer (DiT) for SDR-to-HDR reconstruction, built by adapting a large pretrained DiT. LumaFlux introduces (1) a Physically-Guided Adaptation (PGA) module that injects luminance, spatial descriptors, and frequency cues into attention through low-rank residuals; (2) a Perceptual Cross-Modulation (PCM) layer that stabilizes chroma and texture via FiLM conditioning on vision-encoder features; and (3) an HDR Residual Coupler that fuses the physical and perceptual signals under a timestep- and layer-adaptive modulation schedule. Finally, a lightweight Rational-Quadratic Spline (RQS) decoder reconstructs smooth, interpretable tone fields for highlight and exposure expansion, refining the VAE decoder output into the final HDR result. To enable robust HDR learning, we curate the first large-scale SDR-HDR training corpus. For fair and reproducible comparison, we further establish a new evaluation benchmark comprising HDR references and corresponding expert-graded SDR versions. Across benchmarks, LumaFlux outperforms state-of-the-art baselines, achieving superior luminance reconstruction and perceptual color fidelity with minimal additional parameters.
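The abstract names two adaptation mechanisms: low-rank residual updates injected into attention (PGA) and FiLM conditioning of normalized features (PCM). The paper's exact formulations are not given here, so the following is a minimal numpy sketch of the two generic building blocks under common definitions; the function names, shapes, and `alpha` scaling are illustrative, not LumaFlux's actual code.

```python
import numpy as np

def film(x, gamma, beta, eps=1e-5):
    """FiLM: feature-wise affine modulation of per-row-normalized features.
    x: (n, d) features; gamma, beta: (d,) scale/shift produced by a conditioner."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return gamma * (x - mu) / (sigma + eps) + beta

def low_rank_residual(W, A, B, alpha=1.0):
    """LoRA-style low-rank residual update of a weight matrix:
    W_eff = W + alpha * B @ A, with A: (r, d_in), B: (d_out, r), r << d."""
    return W + alpha * (B @ A)
```

With `gamma = 1` and `beta = 0`, `film` reduces to plain per-row normalization; the low-rank residual adds at most rank-`r` structure on top of a frozen `W`, which is why such adapters stay cheap in trainable parameters.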


Paper Structure

This paper contains 31 sections, 22 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: We present LumaFlux, a universal Inverse Tone Mapping (ITM) framework that expands the dynamic range and color gamut of real-world low-quality SDR videos from 8-bit BT.709 SDR to 10-bit BT.2020 HDR with PQ encoding (i.e., SDRTV-to-HDRTV). The figure shows examples from the Luma-Eval benchmark. Each pair illustrates the SDR$\rightarrow$HDR conversion result for one representative frame. LumaFlux restores a substantially broader dynamic range and corrects the color saturation lost during tone-mapping compression, recovering both diffuse texture and specular highlight detail.
  • Figure 2: Architectural Paradigms. (Left) Baseline Diffusion Transformer (DiT) architecture, where the backbone can be directly fine-tuned via LoRA or full-weight updates. Such adaptation often overfits small HDR datasets and leads to texture hallucination or unstable luminance restoration. (Right) The proposed LumaFlux introduces lightweight, physically interpretable modules (PGA, PCM, and the HDR Residual Coupler) inserted into the frozen MM-DiT backbone. These modules enable prompt-free, physically guided adaptation by modulating attention and MLP activations with luminance, frequency, and perceptual cues. This preserves the pretrained generative prior while allowing accurate ITM with only a few trainable parameters. (Best viewed zoomed in)
  • Figure 3: LumaFlux Overview. The SDR input is split into physical ($T_{\text{phys}}$) and perceptual ($T_{\text{perc}}$) streams. With timestep/layer conditioning $\Psi(t,\ell)$, in each Luma-MMDiT block (see Fig. 2), PGA injects luminance- and spectrum-aware LoRA updates into attention, PCM applies FiLM to modulate normalized features, and an HDR Residual Coupler fuses both cues. A frozen VAE decoder with an RQS tone-field head reconstructs HDR in PQ/BT.2020.
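The captions specify output in PQ/BT.2020, i.e. the SMPTE ST 2084 perceptual quantizer. For concreteness, the standard PQ inverse EOTF that maps absolute luminance in cd/m² to a normalized code value can be sketched as below; the constants are those defined by ST 2084, while the function name is ours.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384        # 0.1593017578125
M2 = 2523 / 4096 * 128   # 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875

def pq_encode(nits):
    """Inverse EOTF: absolute luminance (cd/m^2, peak 10,000) -> PQ signal in [0, 1]."""
    y = np.clip(np.asarray(nits, dtype=np.float64) / 10000.0, 0.0, 1.0)
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2
```

By construction, 10,000 cd/m² maps exactly to code value 1.0, and SDR reference white around 100 cd/m² lands near mid-range (~0.51), which is why PQ spends most of its code space on the luminance levels HDR reconstruction must recover.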