Low-Light Image Enhancement via Generative Perceptual Priors

Han Zhou, Wei Dong, Xiaohong Liu, Yulun Zhang, Guangtao Zhai, Jun Chen

TL;DR

This work addresses two challenges in low-light image enhancement (LLIE), namely diverse illumination conditions and visual realism, by introducing GPP-LLIE, a framework that derives global and local perceptual priors from vision-language models (VLMs) to guide a transformer-based diffusion backbone. The priors are obtained by prompting a pre-trained VLM (LLaVA) to assess contrast, visibility, and sharpness; a sigmoid-based strategy then quantifies the responses into a global score and a local quality map that steer the diffusion process. The backbone incorporates these priors through GPP-LN and LPP-Attn, enabling adaptive enhancement that preserves natural color and texture across diverse real-world lighting. Experimental results show state-of-the-art performance on paired and real-world LLIE datasets, with strong generalization and competitive visual realism, and the approach also improves other LLIE models.

Abstract

Although significant progress has been made in enhancing visibility, retrieving texture details, and mitigating noise in Low-Light (LL) images, the challenge persists in applying current Low-Light Image Enhancement (LLIE) methods to real-world scenarios, primarily due to the diverse illumination conditions encountered. Furthermore, the quest for generating enhancements that are visually realistic and attractive remains an underexplored realm. In response to these challenges, we introduce a novel LLIE framework with the guidance of Generative Perceptual Priors (GPP-LLIE) derived from vision-language models (VLMs). Specifically, we first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors. Subsequently, to incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (GPP-LN) and an attention mechanism (LPP-Attn) guided by global and local perceptual priors. Extensive experiments demonstrate that our model outperforms current SOTA methods on paired LL datasets and exhibits superior generalization on real-world data. The code is released at https://github.com/LowLevelAI/GPP-LLIE.

Paper Structure (13 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Visual comparisons on real-world datasets without Ground Truth. The images are sourced from MEF and NPE, respectively. Our method provides a balanced enhancement: unlike other approaches, it raises luminance enough to reveal finer details while preserving natural color tones. Notably, the cloud details in the sky, the structure of the Eiffel Tower, and the texture of flowers and branches are rendered clearly, without the over-exposure artifacts or unnatural coloration seen in the other results.
  • Figure 2: Our proposed pipeline for extracting fine-grained generative perceptual priors from pre-trained VLMs. (a) The input image is split into several non-overlapping patches. (b) Conversation with VLMs. We develop evaluation commands that guide VLMs to assess the image globally and locally with respect to the selected attribute. (c) Based on the output of VLMs, we design a sigmoid-based quantification strategy. (d) Our extracted global and local generative priors.
  • Figure 3: The overall framework of our proposed GPP-LLIE method. We first employ the encoder $\mathcal{E}$ to convert the normal-light (NL) image $\mathbf{I}_{nl}$ and the LL image $\mathbf{I}_{ll}$ into latent space, denoted as $\mathbf{z}^{0}_{nl}$ and $\mathbf{z}_{ll}$. Then, the forward diffusion process is applied to $\mathbf{z}^{0}_{nl}$. To leverage prior information in guiding the reverse diffusion process, $\mathbf{I}_{ll}$ is sent to our proposed pipeline for perceptual prior extraction. With the guidance of the global perceptual score $S$, the local quality map $M$, and the LL feature $\mathbf{z}_{ll}$, we develop a transformer-based network $\epsilon_{\theta}$ that gradually transforms the randomly sampled Gaussian noise $\mathbf{\hat{z}}^{T}_{nl}$ into a clear NL latent feature $\mathbf{\hat{z}}^{0}_{nl}$. Finally, the restored feature $\mathbf{\hat{z}}^{0}_{nl}$ is fed into the decoder $\mathcal{D}$ to generate the final enhancement $\mathbf{\hat{I}}_{out}$.
  • Figure 4: Visual comparisons on paired datasets. The images are sourced from LOL (row $1$), LOL-v2-real (row $2$), and LOL-v2-syn (row $3$), respectively. Previous methods often produce overly smoothed images that obscure important textural details. In contrast, our method yields sharper images while retaining delicate details. For instance, it maintains the structural complexity of the foliage and branches of the potted plant (first row). In the second row, the grain of the wooden floor and its edge contours are well preserved. Similarly, in the natural landscape (third row), our method enhances the clarity of twig edges and maintains color consistency.
  • Figure 5: Visual comparisons on a real-world dataset (DICM). Our method handles the diverse and uneven illumination levels present in the original image. It enhances brightness and contrast while avoiding overexposure of regions that are already bright and maintaining natural coloration, generating visually appealing results.
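
The prior-extraction steps described in Figure 2 (patchify, query the VLM, quantify with a sigmoid) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary above does not give the exact prompts or formula, so the sketch assumes the VLM yields one logit for a positive answer and one for a negative answer per attribute, maps their difference through a sigmoid, and averages over contrast, visibility, and sharpness. The names `VlmQuery`, `quantify`, `extract_priors`, and `toy_query` are hypothetical, and `toy_query` is a brightness-based stand-in rather than a real VLM call.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of values in [0, 1]
# Hypothetical VLM interface (an assumption, not a real API): given a region
# and an attribute name, return (positive_logit, negative_logit).
VlmQuery = Callable[[Image, str], Tuple[float, float]]

ATTRIBUTES = ("contrast", "visibility", "sharpness")


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def patchify(image: Image, patch: int) -> List[List[Image]]:
    """Split an image into a grid of non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return [
        [[row[c:c + patch] for row in image[r:r + patch]]
         for c in range(0, width, patch)]
        for r in range(0, height, patch)
    ]


def quantify(region: Image, query: VlmQuery) -> float:
    """Assumed quantification: sigmoid of the logit gap, averaged over attributes."""
    scores = []
    for attribute in ATTRIBUTES:
        positive, negative = query(region, attribute)
        scores.append(sigmoid(positive - negative))
    return sum(scores) / len(scores)


def extract_priors(image: Image, patch: int,
                   query: VlmQuery) -> Tuple[float, List[List[float]]]:
    """Return (global score, local quality map with one value per tile)."""
    global_score = quantify(image, query)
    local_map = [[quantify(tile, query) for tile in tile_row]
                 for tile_row in patchify(image, patch)]
    return global_score, local_map


def toy_query(region: Image, attribute: str) -> Tuple[float, float]:
    """Toy stand-in for a VLM: brighter regions get a larger positive logit."""
    mean = sum(map(sum, region)) / (len(region) * len(region[0]))
    return 4.0 * mean, 4.0 * (1.0 - mean)
```

Calling `extract_priors(image, 2, toy_query)` on a 4×4 image returns a scalar score and a 2×2 map. In the framework of Figure 3 these would play the roles of $S$ and the local quality map, with a real VLM replacing `toy_query`.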