Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion
Minglong Xue, Jinhong He, Wenhai Wang, Mingliang Zhou
TL;DR
This work tackles unstable and visually unsatisfactory low-light image enhancement by proposing CFWD, a diffusion-based method guided by multimodal CLIP semantics in a frequency-domain wavelet space. It combines a Wavelet Diffusion Model with a Multiscale Visual-Language Guidance Network and a Hybrid High Frequency Perception Module (HFPM) to constrain content diversity and preserve fine details, using a composite loss that includes diffusion, spectral, and content terms. The approach reports state-of-the-art quantitative results and superior perceptual quality across diverse real-world benchmarks, including high-resolution backlit scenes, while generalizing to unseen conditions. The framework demonstrates the practical value of fusing multimodal semantics with spectral-domain diffusion for robust, perceptually faithful low-light enhancement.
Abstract
Low-light image enhancement techniques have progressed significantly, but unstable image quality recovery and unsatisfactory visual perception remain major challenges. To address these problems, we propose CLIP-Fourier Guided Wavelet Diffusion (CFWD), a novel and robust low-light image enhancement method. Specifically, CFWD leverages multimodal visual-language information in the frequency-domain space created by multiple wavelet transforms to guide the enhancement process. Multi-scale supervision across different modalities facilitates the alignment of image features with semantic features during the wavelet diffusion process, effectively bridging the gap between the degraded and normal domains. Moreover, to further promote the recovery of image details, we combine the Fourier transform with the wavelet transform to construct a Hybrid High Frequency Perception Module (HFPM) that is highly sensitive to detailed features. By guiding the recovery of fine-grained structure in the enhancement results, this module avoids the diversity confusion of the wavelet diffusion process and achieves enhancement that is favourable both in quantitative metrics and in perceptual quality. Extensive quantitative and qualitative experiments on publicly available real-world benchmarks show that our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression. The project code is available at https://github.com/hejh8/CFWD.
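To make the wavelet-plus-Fourier idea concrete, the following is a minimal sketch, not the CFWD implementation. The abstract does not specify the wavelet family or how HFPM combines the two transforms, so the Haar wavelet, the function names (haar_dwt2, dft2_amplitude, energy), and the choice to take the Fourier amplitude of a detail band are all assumptions made for illustration. It only shows that a wavelet transform splits an image into a low-frequency band and high-frequency detail bands, and that a Fourier transform can then be applied to a detail band.

```python
import cmath
import math


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (assumed wavelet choice).

    `img` is a list of rows of floats with even height and width. Returns the
    low-frequency band and three high-frequency detail bands, each half size.
    """
    low, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, len(img), 2):
        r_low, r_rows, r_cols, r_diag = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2.0)   # local average
            r_rows.append((a + b - c - d) / 2.0)  # change between the two rows
            r_cols.append((a - b + c - d) / 2.0)  # change between the two columns
            r_diag.append((a - b - c + d) / 2.0)  # diagonal change
        low.append(r_low)
        d_rows.append(r_rows)
        d_cols.append(r_cols)
        d_diag.append(r_diag)
    return low, (d_rows, d_cols, d_diag)


def dft2_amplitude(band):
    """Amplitude of the naive 2D discrete Fourier transform of a small band."""
    h, w = len(band), len(band[0])
    amp = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += band[y][x] * cmath.exp(1j * angle)
            amp[u][v] = abs(acc)
    return amp


def energy(band):
    """Sum of squared values, used to check that the transform is orthonormal."""
    return sum(v * v for row in band for v in row)


# Toy input: alternating bright and dark columns, i.e. pure column-wise detail.
stripes = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]
low, (d_rows, d_cols, d_diag) = haar_dwt2(stripes)
print("column detail band:", d_cols)
print("its Fourier amplitude:", dft2_amplitude(d_cols))
```

For the striped toy image, all of the detail lands in the column-difference band and the other two detail bands are zero, which is the sense in which the wavelet transform isolates high-frequency structure. The naive DFT here is quartic in the band size and is only suitable for tiny examples; a practical pipeline would use an FFT routine.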
