Zero-Shot Low Light Image Enhancement with Diffusion Prior

Joshua Cho, Sara Aghajanzadeh, Zhen Zhu, D. A. Forsyth

TL;DR

This work addresses low-light image enhancement (LLIE) and auto white balance (AWB) with a zero-shot approach that relies on a pre-trained diffusion prior and the model's internal self-attention signals, avoiding any training or test-time optimization. It introduces a four-step inference pipeline (preprocessing, DDIM inversion with self-attention feature extraction, AdaIN-based normalization of the latent to a standard Gaussian, and self-attention-guided denoising with a QuadPrior decoder) to achieve faithful recovery with color constancy. Across paired (LOL, LSRW) and unpaired datasets, the method delivers state-of-the-art performance among zero-shot and unsupervised methods and remains competitive with supervised baselines, while AWB evaluations show zero-shot color correction without task-specific training. The approach demonstrates that diffusion priors combined with self-attention guidance can robustly correct illumination and color distortions, offering a practical, training-free solution for LLIE and AWB.

Abstract

In this paper, we present a simple yet highly effective "free lunch" solution for low-light image enhancement (LLIE), which aims to restore low-light images as if acquired in well-illuminated environments. Our method necessitates no optimization, training, fine-tuning, text conditioning, or hyperparameter adjustments, yet it consistently reconstructs low-light images with superior fidelity. Specifically, we leverage a pre-trained text-to-image diffusion prior, learned from training on a large collection of natural images, and the features present in the model itself to guide the inference, in contrast to existing methods that depend on customized constraints. Comprehensive quantitative evaluations demonstrate that our approach outperforms SOTA methods on established datasets, while qualitative analyses indicate enhanced color accuracy and the rectification of subtle chromatic deviations. Furthermore, additional experiments reveal that our method, without any modifications, achieves SOTA-comparable performance in the auto white balance (AWB) task.

Paper Structure

This paper contains 8 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: In low-light image enhancement, an ideal method should achieve color constancy [color_constancy_algorithm, color_constancy_algorithm2] by accurately recovering the intrinsic color (reflectance) of a scene, ensuring consistency across images taken under varying illumination conditions. However, suboptimal illumination introduces noise and hampers the accurate capture of all wavelengths, leading to color distortions. For instance, recent zero-shot diffusion-based methods such as GDP [gendiffprior] suffer from hallucinations, introducing non-existent elements (row 1), while FourierDiff [FourierDiff] is compromised by the inherent noise sensitivity of frequency-domain representations (rows 1 and 2). In contrast, our approach demonstrates superior color constancy and fidelity across images of the same scene, effectively mitigating the challenges posed by lighting variations.
  • Figure 2: Hallucination. While a diffusion prior is effective for image restoration, improper application can lead to unintended hallucinations, where the model generates nonexistent structures or alters scene semantics. For example, GDP [gendiffprior], a robust and versatile image restoration method, often hallucinates in the presence of substantial noise and darkness in input images. As shown in row 1, a blue-colored cabinet is inaccurately reconstructed as a sky, a pink cabinet as a building, and the entire scene resembles a battle. For less noisy inputs, our method produces clean and sharp outputs and effectively attenuates noise even in challenging cases involving severe darkness and pronounced noise levels.
  • Figure 3: LLIE Method Taxonomy. (category a) Image regression methods often produce results that are heavily dependent on the dataset, because real-world datasets are small in size. (category b) Zero-shot methods w/o a pre-trained model dynamically adjust model weights per image based on a predefined loss function. However, they require per-image tuning and may suffer from convergence instability. (category c) Zero-shot methods w/ a pre-trained model add an auxiliary trainable network or parameters that learn on a per-image basis. While this enhances adaptability, it still requires per-image tuning and remains susceptible to convergence instability. (category d) In contrast, our method leverages the self-attention features of pre-trained diffusion models from the input to guide inference from any data source without any assumption about degradation and without test time tuning.
  • Figure 4: Overall pipeline of our method. Our method offers a simple yet highly effective "free lunch" solution for both LLIE and AWB and consists of four main steps: (1) preprocessing; (2) inverting the input image; (3) adjusting the resulting noised latent with Adaptive Instance Normalization (AdaIN) to match standard normal distributions $\mathcal{N}(0, I)$; and (4) denoising the inverted representation with self-attention features extracted during the inversion process, without relying on prior assumptions or external constraints.
  • Figure 5: Qualitative evaluation of our method against existing unsupervised and zero-shot approaches on the paired LOL dataset. Please zoom in without night-light mode to accurately compare colors and observe noise reduction in each method. Our method demonstrates consistency with the ground truth as well as across different images of the same scene (see rows 1 and 2), highlighting the reliability and robustness of our approach. Moreover, our method demonstrates reduced susceptibility to incorrect color shifts compared to existing methods and accurately preserves color fidelity.
  • ...and 5 more figures
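
Step (3) of the pipeline in Figure 4 adjusts the inverted latent with AdaIN so that its statistics match $\mathcal{N}(0, I)$. In general, AdaIN rescales a feature map $x$ to the mean and standard deviation of a target $y$: $\sigma(y)\,(x - \mu(x))/\sigma(x) + \mu(y)$. With a standard normal target, $\mu(y) = 0$ and $\sigma(y) = 1$, so the operation reduces to standardizing the latent. The sketch below illustrates that reduction only. The function name `adain_to_standard_normal`, the per-channel statistics, the `(C, H, W)` latent shape, and the `eps` term are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def adain_to_standard_normal(latent, eps=1e-5):
    """Standardize each channel of a (C, H, W) latent to mean 0 and std 1.

    This is AdaIN with a fixed N(0, I) target: target mean 0, target std 1.
    Per-channel statistics are an assumption of this sketch.
    """
    mean = latent.mean(axis=(1, 2), keepdims=True)  # per-channel mean
    std = latent.std(axis=(1, 2), keepdims=True)    # per-channel std
    return (latent - mean) / (std + eps)            # eps avoids division by zero


# Hypothetical "inverted" latent whose statistics have drifted from N(0, I),
# standing in for the output of DDIM inversion on a dark input.
rng = np.random.default_rng(0)
inverted = 0.4 * rng.standard_normal((4, 64, 64)) - 0.7

normalized = adain_to_standard_normal(inverted)
print(normalized.mean(axis=(1, 2)))  # per-channel means, close to 0
print(normalized.std(axis=(1, 2)))   # per-channel stds, close to 1
```

The shift and scale applied here are the only changes to the latent; its spatial structure is untouched, which is why such a normalization can correct global statistics before denoising without altering scene content.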