
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process

Yang Luo, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Zhineng Chen, Yu-Gang Jiang, Tao Mei

TL;DR

FreeEnhance tackles image enhancement with diffusion models by reframing it as a tuning-free two-stage process in the latent space. It introduces a frequency-aware noising stage (a two-stream scheme with gradient-guided sampling) and a denoising stage regulated by three gradient-based terms (acutance, distribution, and adversarial degradation), along with distribution calibration to preserve diffusion priors. The method uses a pre-trained latent diffusion model (SDXL) and DDIM inversion to achieve content-consistent detail enrichment, and is validated on HPDv2, where it outperforms baselines and Magnific AI in both NR-IQA metrics and human preference. The approach generalizes to different diffusion models and extends to text-to-image generation and natural image enhancement, which makes it usable as post-processing for real-world images. Overall, FreeEnhance delivers a tuning-free enhancement pipeline that preserves content while enriching details.

Abstract

The emergence of text-to-image generation models has led to the recognition that image enhancement, performed as post-processing, would significantly improve the visual quality of the generated images. Exploring diffusion models to enhance the generated images is nevertheless not trivial: it requires delicately enriching plentiful details while preserving the visual appearance of key content in the original image. In this paper, we propose a novel framework, namely FreeEnhance, for content-consistent image enhancement using off-the-shelf image diffusion models. Technically, FreeEnhance is a two-stage process that first adds random noise to the input image and then capitalizes on a pre-trained image diffusion model (i.e., Latent Diffusion Models) to denoise and enhance the image details. In the noising stage, FreeEnhance adds lighter noise to regions with higher frequency to preserve the high-frequency patterns (e.g., edges, corners) in the original image. In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality. Extensive experiments conducted on the HPDv2 dataset demonstrate that FreeEnhance outperforms state-of-the-art image enhancement models in terms of quantitative metrics and human preference. More remarkably, FreeEnhance also shows higher human preference compared to the commercial image enhancement solution of Magnific AI.
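
The frequency-aware noising idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it works in pixel space on a single-channel array rather than in the latent space of a diffusion model, it replaces the two-stream scheme (gradient-guided sampling and DDIM inversion) with plain Gaussian noise, and the function names, the box-blur frequency filter, and the noise levels `sigma_light` and `sigma_heavy` are all assumptions made for illustration.

```python
import numpy as np


def box_blur(img, k=5):
    """Separable box blur of a 2-D array with edge padding (k must be odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    # Blur along rows, then along columns; "valid" restores the input shape.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def high_freq_mask(img, k=5):
    """Hypothetical high-frequency map in [0, 1]: |image - low-pass(image)|."""
    residual = np.abs(img - box_blur(img, k))
    peak = residual.max()
    return residual / peak if peak > 0 else np.zeros_like(residual)


def frequency_aware_noising(img, sigma_light=0.05, sigma_heavy=0.5, k=5, seed=0):
    """Add lighter noise where the image is high-frequency, heavier elsewhere."""
    rng = np.random.default_rng(seed)
    m_h = high_freq_mask(img, k)   # close to 1 near edges and corners
    m_l = 1.0 - m_h                # close to 1 in smooth areas
    sigma = m_h * sigma_light + m_l * sigma_heavy
    noisy = img + sigma * rng.standard_normal(img.shape)
    return noisy, m_h, sigma


# Usage: a vertical step edge gets light noise, the flat areas get heavy noise.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
noisy, m_h, sigma = frequency_aware_noising(image)
print("noise std at the edge:", sigma[0, 15])
print("noise std in a flat area:", sigma[0, 0])
```

The per-pixel noise level interpolates between the two extremes, so edges keep most of their original structure while smooth regions are perturbed enough for a denoiser to synthesize new detail there.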

Paper Structure

This paper contains 17 sections, 10 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Conventional image enhancement via (a) the noising-and-denoising pipeline suffers from a tradeoff between creativity and content consistency. We introduce (b) FreeEnhance, a tuning-free framework that adds lighter noise in high-frequency regions to preserve content structures and heavier noise in low-frequency regions to enrich details in smooth areas. Moreover, three regularizers are employed to further improve visual quality during denoising.
  • Figure 2: An overview of our Tuning-Free Image Enhancement (FreeEnhance) framework. FreeEnhance begins with an input image $x$, which undergoes a two-stream noising scheme that adaptively adds noise to $x$. The creative stream adds strong noise, which is then partially removed by a diffusion model with gradient-guided sampling (GGS), resulting in $x_{t_0}^c$. In the stable stream, light noise is added to the input image using the DDIM inversion strategy, yielding $x_{t_0}^s$. Then $x_{t_0}^c$ and $x_{t_0}^s$ are adaptively blended according to the high/low-frequency maps $M_h$/$M_l$ produced by frequency filtering of $x$, resulting in the noisy image $x_{t_0}$. Finally, in the denoising stage, $x_{t_0}$ is fed into a diffusion model constrained by three regularizers, devised from the perspectives of image acutance, noise distribution, and adversarial degradation, to produce the enhanced version of the input image.
  • Figure 3: Comparison between images generated from the composited noisy image with and without distribution calibration. Color shift/fading can be observed in the output without calibration.
  • Figure 4: The statistics of the noise $\epsilon_\theta(x_t;t,y)$ predicted by a diffusion model. Given the noisy image from the noising stage, the red points are estimated during the denoising process using SDXL, and the gray ones represent the ideal values across different timesteps.
  • Figure 5: Qualitative comparisons of images enhanced by different approaches on the HPDv2 benchmark. The regions in red boxes are shown in zoomed-in views to ease the comparison.
  • ...and 4 more figures