Consist-Retinex: One-Step Noise-Emphasized Consistency Training Accelerates High-Quality Retinex Enhancement

Jian Xu, Wei Chen, Shigui Li, Delu Zeng, John Paisley, Qibin Zhao

TL;DR

This work addresses the prohibitive computation of diffusion-based Retinex low-light enhancement by reframing conditional restoration as a one-step task. It introduces Consist-Retinex, which combines a dual-objective consistency loss with noise-emphasized adaptive sampling and a Retinex-aware architecture that separately handles reflectance and illumination. The method achieves state-of-the-art single-step performance on VE-LOL-L, with substantial training efficiency (1/8 budget) and real-time inference, while maintaining strong results on paired and unpaired datasets. The proposed framework lays a foundation for rapid, high-quality conditional image restoration and can extend to other Retinex-based or conditional enhancement tasks.

Abstract

Diffusion models have achieved remarkable success in low-light image enhancement through Retinex-based decomposition, yet their requirement for hundreds of iterative sampling steps severely limits practical deployment. While recent consistency models offer promising one-step generation for unconditional synthesis, their application to conditional enhancement remains unexplored. We present Consist-Retinex, the first framework adapting consistency modeling to Retinex-based low-light enhancement. Our key insight is that conditional enhancement requires fundamentally different training dynamics from unconditional generation: standard consistency training focuses on low-noise regions near the data manifold, while conditional mapping critically depends on large-noise regimes that bridge degraded inputs to enhanced outputs. We introduce two core innovations: (1) a dual-objective consistency loss combining temporal consistency with ground-truth alignment under randomized time sampling, providing full-spectrum supervision for stable convergence; and (2) an adaptive noise-emphasized sampling strategy that prioritizes training on large-noise regions essential for one-step conditional generation. On VE-LOL-L, Consist-Retinex achieves state-of-the-art performance with single-step sampling (PSNR: 25.51 vs. 23.41, FID: 44.73 vs. 49.59 compared to Diff-Retinex++), while requiring only 1/8 of the training budget relative to the 1000-step Diff-Retinex baseline.
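
The dual-objective loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the perturbation $x_t = x_0 + t\,\epsilon$, the squared-error distance, the weight `lam`, and the names `dual_objective_loss`, `f_student`, and `f_target` are all hypothetical. The sketch only shows how a temporal-consistency term (outputs at two noise levels of the same trajectory should agree) might be combined with a ground-truth alignment term (the output should match the clean target).

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_objective_loss(f_student, f_target, x0, cond, t_lo, t_hi, noise, lam=1.0):
    """Toy dual-objective loss on flat lists of pixel values.

    f_student, f_target: callables (x_t, t, cond) -> estimate of x0 (hypothetical).
    x0: clean target; cond: low-light conditioning input.
    t_lo < t_hi: two noise levels, drawn by the caller (randomized time sampling).
    """
    # Assumed perturbation: x_t = x0 + t * noise, using the SAME noise draw at
    # both levels so that the two points lie on one trajectory.
    x_hi = [x + t_hi * n for x, n in zip(x0, noise)]
    x_lo = [x + t_lo * n for x, n in zip(x0, noise)]

    out_hi = f_student(x_hi, t_hi, cond)
    out_lo = f_target(x_lo, t_lo, cond)

    temporal = mse(out_hi, out_lo)   # temporal consistency term
    alignment = mse(out_hi, x0)      # ground-truth alignment term
    return temporal + lam * alignment, temporal, alignment


def identity_model(x_t, t, cond):
    """Placeholder network that returns its noisy input unchanged."""
    return list(x_t)


x0 = [0.5, 0.2]
low_light = [0.05, 0.02]
total, temporal, alignment = dual_objective_loss(
    identity_model, identity_model, x0, low_light,
    t_lo=1.0, t_hi=3.0, noise=[1.0, -1.0], lam=0.5,
)
print(total, temporal, alignment)
```

With the placeholder identity model both terms are non-zero; a model that recovers `x0` exactly from any noise level would drive both to zero, which is the behaviour the two objectives jointly reward.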

Paper Structure

This paper contains 44 sections, 9 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Illustration of consistency mapping. Green ODE trajectories evolve data samples into noise, while red arrows $f_\theta(x_t,t)$ map intermediate states back to their origin $(x_\epsilon,\epsilon)$, ensuring self-consistency along each trajectory.
  • Figure 2: Overview of the Consist-Retinex framework. Given a low-light image $I_l$, we first perform Retinex decomposition to obtain $(R_l, L_l)$. Two conditional consistency models $f_\theta^R$ and $f_\theta^L$ directly map noisy components to enhanced outputs in one step, guided by the low-light input. The final result is reconstructed via element-wise multiplication.
  • Figure 3: Task-driven sampling strategy comparison. Top: The task formulations differ fundamentally. Unconditional generation learns the full data distribution from pure noise, while our conditional enhancement performs a one-step mapping from the concatenation of pure noise ($\sigma_{\max}\epsilon$) and the degraded image ($I_l$). Middle: Standard log-uniform sampling concentrates on low-noise levels near the data manifold, which suits unconditional synthesis, where the final denoising occurs. Bottom: Our noise-emphasized sampling concentrates on high-noise regions where one-step conditional inference operates.
  • Figure 4: Qualitative comparison with the state-of-the-art low-light image enhancement methods on the LOL dataset.
  • Figure 5: Qualitative comparison with the state-of-the-art low-light image enhancement methods on the VE-LOL-L dataset.
  • ...and 1 more figure
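
The contrast drawn in the Figure 3 caption, between log-uniform sampling and noise-emphasized sampling of the noise level, can be sketched numerically. This excerpt does not specify the paper's adaptive schedule, so the power-law sampler below (`sample_noise_emphasized` with exponent `k`) is a hypothetical stand-in chosen only because it skews mass toward $\sigma_{\max}$. The range `SIGMA_MIN = 0.002`, `SIGMA_MAX = 80.0` is likewise an assumed value, not one taken from the source.

```python
import math
import random

# Assumed noise range; not taken from the source.
SIGMA_MIN, SIGMA_MAX = 0.002, 80.0


def sample_log_uniform(rng):
    """Draw sigma uniformly in log-space, so most draws are small."""
    return math.exp(rng.uniform(math.log(SIGMA_MIN), math.log(SIGMA_MAX)))


def sample_noise_emphasized(rng, k=4.0):
    """Hypothetical noise-emphasized draw: u**(1/k) with k > 1 pushes
    samples toward SIGMA_MAX. A stand-in, not the paper's schedule."""
    u = rng.random()
    return SIGMA_MIN + (SIGMA_MAX - SIGMA_MIN) * u ** (1.0 / k)


def fraction_above(samples, threshold):
    """Fraction of samples strictly greater than the threshold."""
    return sum(s > threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed for reproducibility
n = 10000
log_uniform = [sample_log_uniform(rng) for _ in range(n)]
emphasized = [sample_noise_emphasized(rng) for _ in range(n)]

half = SIGMA_MAX / 2
print("log-uniform share above sigma_max/2:    ", fraction_above(log_uniform, half))
print("noise-emphasized share above sigma_max/2:", fraction_above(emphasized, half))
```

Under these assumptions, only a small share of log-uniform draws lands in the upper half of the noise range, while the emphasized sampler places most of its draws there, which is the regime where one-step inference starting from $\sigma_{\max}\epsilon$ operates.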