Pyramid Diffusion Models For Low-light Image Enhancement

Dewei Zhou, Zongxin Yang, Yi Yang

TL;DR

This work introduces PyDiff, a diffusion-model-based approach for low-light image enhancement that overcomes two core limitations of standard diffusion: slow, fixed-resolution sampling and global degradation like RGB shifts. It achieves this through pyramid diffusion, which performs sampling at progressively higher resolutions with a carefully designed downsampling schedule, and a lightweight global corrector that mitigates global color distortions without heavy cost. Empirical results on LOL and LOLV2 benchmarks demonstrate superior quality and significant speedups over prior state-of-the-art methods, with strong generalization to unseen noise and illumination distributions. PyDiff thus establishes diffusion models as a competitive, practical baseline for low-light enhancement and potentially other low-level vision tasks.

Abstract

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement. Recent diffusion models show realistic and detailed image generation through a sequence of denoising refinements and motivate us to introduce them to low-light image enhancement for recovering realistic details. However, we found two problems when doing this, i.e., 1) diffusion models keep constant resolution in one reverse process, which limits the speed; 2) diffusion models sometimes result in global degradation (e.g., RGB shift). To address the above problems, this paper proposes a Pyramid Diffusion model (PyDiff) for low-light image enhancement. PyDiff uses a novel pyramid diffusion method to perform sampling in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process). Pyramid diffusion makes PyDiff much faster than vanilla diffusion models and introduces no performance degradation. Furthermore, PyDiff uses a global corrector to alleviate the global degradation that may occur in the reverse process, significantly improving the performance and making the training of diffusion models easier with little additional computational consumption. Extensive experiments on popular benchmarks show that PyDiff achieves superior performance and efficiency. Moreover, PyDiff can generalize well to unseen noise and illumination distributions.
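
The speed argument in problem 1) can be illustrated with a little arithmetic. The sketch below is only an illustration, not the paper's actual schedule: it assumes that each denoising step costs in proportion to the number of pixels it processes, and the 4-step list of scale factors and the name `relative_cost` are hypothetical.

```python
def relative_cost(scales):
    """Cost of one reverse process relative to full-resolution sampling.

    Assumption: a step at downsampling factor r touches 1 / r**2 of the
    pixels and therefore costs 1 / r**2 of a full-resolution step.
    """
    return sum(1.0 / (r * r) for r in scales) / len(scales)


# Hypothetical schedules, ordered from t = T down to t = 1.
# The pyramid one starts at 1/4 resolution and ends at full resolution.
pyramid = [4, 2, 1, 1]
constant = [1, 1, 1, 1]

print("constant resolution:", relative_cost(constant))
print("pyramid resolution: ", relative_cost(pyramid))
```

Under these assumptions, the more steps that run at a coarse resolution, the lower the total cost, while the final steps still run at full resolution.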

Paper Structure

This paper contains 15 sections, 12 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: (a) Compared with other SOTA methods, our PyDiff generates more realistic details and restores correct colors. For better viewing, we brighten the Input. (b) Vanilla diffusion models sample at a constant resolution, and they produce global degradation similar to the RGB shift we analyze in Fig. \ref{fig:whyHue}. (c) Our PyDiff samples in a pyramid resolution style (i.e., progressively increasing resolution in one reverse process), which is faster because sampling at a lower resolution is cheaper. With the help of a global corrector, PyDiff shows stunning results without global degradation. Please zoom in for the best view.
  • Figure 2: Overview of the proposed PyDiff. $\mathbf{y}_{\theta}(\mathbf{x}_{t}, \mathbf{x}_{low})$ is the approximation of $\mathbf{x}_{0}$ computed from the output of the denoising network, as discussed in Eq. \ref{houyan_theta_cond}. For better viewing, we brighten $\mathbf{x}_{low}$. Please zoom in for the best view.
  • Figure 3: We impose various degradations (e.g., downsampling or RGB shift) on normal-light images and obtain a noisy $\mathbf{x}_{T/2}$ according to Eq. \ref{eq:ddpm_forward}. We then start the reverse process of diffusion from $t=T/2$, conditioned on the low-light images, to see how these degradations affect the second half of the reverse process. (a) Downsampling does not affect the details of the final result. (b) RGB shift is not corrected. Please zoom in for the best view.
  • Figure 4: (a) $\delta(r, t):=\vert(\sqrt{\bar{\alpha}_{t}}\mathbf{x}_{0}+\sqrt{1-\bar{\alpha}_{t}}\boldsymbol{\epsilon})-(\sqrt{\bar{\alpha}_{t}}(\mathbf{x}_{0}\downarrow_{r}\uparrow_{r})+\sqrt{1-\bar{\alpha}_{t}}\boldsymbol{\epsilon})\vert$ for different $(r,t)$, in which $\downarrow_{r}$ ($\uparrow_{r}$) means downsampling (upsampling) with a scale factor of $r$. (b) Amplification factor $\frac{\sqrt{1-\bar{\alpha}_{t}}}{\sqrt{\bar{\alpha}_{t}}}$ for different $t$. Please zoom in for the best view.
  • Figure 5: $\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_{t})$ is the predicted noise derived from the denoising network, and $\mathbf{y}_{\theta}(\mathbf{x}_{t})$ is an approximation of $\mathbf{x}_{0}$ calculated based on $\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_{t})$. (a) Diffusion models result in significant global degradation, which appears in $\mathbf{y}_{\theta}(\mathbf{x}_{T})$ for the first time and affects subsequent sampling. (b) The original error $\boldsymbol{\delta}_{T}$ is nearly 0, but the amplification factor $\frac{\sqrt{1-\bar{\alpha}_{T}}}{\sqrt{\bar{\alpha}_{T}}}$ enlarges the error, which leads to an obvious RGB shift. (c) With the help of the global corrector, diffusion models give promising results. $\mathbf{y}_{c}(\mathbf{x})$ means using the global corrector to alleviate the global degradation in $\mathbf{x}$.
  • ...and 2 more figures
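
The two quantities in the captions of Figures 4 and 5 can be reproduced numerically. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes a linear $\beta$ schedule with $T=1000$ (the common DDPM default, which this page does not confirm), uses a 1-D signal in place of an image, and implements $\downarrow_{r}\uparrow_{r}$ as average pooling followed by repetition. All function names are illustrative.

```python
import math
import random


def alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule."""
    out, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def down_up(x, r):
    """Average-pool a 1-D signal by factor r, then upsample by repetition."""
    out = []
    for i in range(0, len(x), r):
        block = x[i:i + r]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def delta(x0, eps, r, abar):
    """Mean of delta(r, t) from Fig. 4(a): the noise term cancels, leaving
    sqrt(abar) * |x0 - down_up(x0)|, which shrinks as t grows."""
    a, s = math.sqrt(abar), math.sqrt(1.0 - abar)
    blurred = down_up(x0, r)
    diffs = [abs((a * p + s * e) - (a * q + s * e))
             for p, q, e in zip(x0, blurred, eps)]
    return sum(diffs) / len(diffs)


def amplification(abar):
    """Amplification factor from Fig. 4(b): sqrt(1 - abar) / sqrt(abar)."""
    return math.sqrt(1.0 - abar) / math.sqrt(abar)


def x0_estimate(x_t, eps_pred, abar):
    """Approximate x0 from x_t and a predicted noise value."""
    return (x_t - math.sqrt(1.0 - abar) * eps_pred) / math.sqrt(abar)


abars = alpha_bars()
rng = random.Random(0)
x0 = [float(i % 2) for i in range(16)]      # alternating 0/1 "texture"
eps = [rng.gauss(0.0, 1.0) for _ in x0]

# Fig. 4(a): downsampling matters less and less at large t.
d_early = delta(x0, eps, r=2, abar=abars[0])
d_late = delta(x0, eps, r=2, abar=abars[-1])
print("delta at t=1:", d_early, " delta at t=T:", d_late)

# Fig. 5(b): a small noise-prediction error at t=T is amplified in the x0 estimate.
abar_T = abars[-1]
x_T = math.sqrt(abar_T) * 0.5 + math.sqrt(1.0 - abar_T) * 0.3
err = 1e-3
shift = abs(x0_estimate(x_T, 0.3 + err, abar_T) - x0_estimate(x_T, 0.3, abar_T))
print("amplification at t=T:", amplification(abar_T), " resulting shift:", shift)
```

With this schedule, the downsampling error is scaled by $\sqrt{\bar{\alpha}_{t}}$ and so is small at large $t$, which is consistent with sampling the early reverse steps at low resolution. The same small $\bar{\alpha}_{T}$ makes the amplification factor large, so even a tiny error in the predicted noise at $t=T$ shifts the estimate of $\mathbf{x}_{0}$ noticeably.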