InstaRevive: One-Step Image Enhancement via Dynamic Score Matching

Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu

TL;DR

Real-world image enhancement is ill-posed and demands robust, efficient methods. InstaRevive is a one-step enhancement framework built on diffusion distillation with dynamic score matching, two score estimators, and caption-guided conditioning, all aimed at leveraging pre-trained diffusion priors. The approach introduces dynamic noise control via a controllable $T_{\rm max}$ and a KL-based score-matching objective, enabling accurate learning of denoising trajectories and distribution alignment. It demonstrates competitive results on blind face restoration and blind image super-resolution, with substantial speedups over iterative diffusion methods, and extends to tasks such as face cartoonization. Overall, the method provides a practical, scalable pathway to high-quality, controllable image restoration using diffusion priors in real-world scenarios.

Abstract

Image enhancement finds wide-ranging applications in real-world scenarios due to complex environments and the inherent limitations of imaging devices. Recent diffusion-based methods yield promising outcomes but necessitate prolonged and computationally intensive iterative sampling. In response, we propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-based diffusion distillation to harness potent generative capability and minimize the sampling steps. To fully exploit the potential of the pre-trained diffusion model, we devise a practical and effective diffusion distillation pipeline using dynamic control to address inaccuracies in updating direction during score matching. Our control strategy enables a dynamic diffusing scope, facilitating precise learning of denoising trajectories within the diffusion model and ensuring accurate distribution matching gradients during training. Additionally, to enrich guidance for the generative power, we incorporate textual prompts via image captioning as auxiliary conditions, fostering further exploration of the diffusion model. Extensive experiments substantiate the efficacy of our framework across a diverse array of challenging tasks and datasets, unveiling the compelling efficacy and efficiency of InstaRevive in delivering high-quality and visually appealing results. Code is available at https://github.com/EternalEvan/InstaRevive.
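
The "dynamic diffusing scope" described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-style schedule, the 1000-step horizon, and the helper names `alpha_bar`, `noise_scale`, `sample_timestep`, and `perturb` are all assumptions introduced here for illustration. The sketch only shows the general idea of capping the sampled timestep at some `t_max` so that the perturbation applied to a generated image stays bounded.

```python
import math
import random

NUM_STEPS = 1000  # assumed diffusion horizon, not taken from the paper


def alpha_bar(t, num_steps=NUM_STEPS):
    """Cumulative signal level at step t (assumed cosine-style schedule)."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def noise_scale(t, num_steps=NUM_STEPS):
    """Standard deviation of the noise mixed in at step t."""
    return math.sqrt(1.0 - alpha_bar(t, num_steps))


def sample_timestep(t_max, rng):
    """Draw t uniformly from 1..t_max instead of the full 1..NUM_STEPS range."""
    return rng.randint(1, t_max)


def perturb(x0, t, rng, num_steps=NUM_STEPS):
    """Forward-diffuse a flat list of pixel values x0 to step t."""
    a = alpha_bar(t, num_steps)
    signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.5, -0.25, 0.1]            # stand-in for a generated image
    t = sample_timestep(200, rng)     # hypothetical cap t_max = 200
    print(t, noise_scale(t), perturb(x0, t, rng))
```

Under these assumptions, lowering `t_max` lowers the largest possible noise scale, which is the property the caption of Figure 2 attributes to adjusting $\sigma_{T_{\rm max}}$. How `t_max` is chosen or scheduled during training is not specified in this section, so the sketch leaves it as a plain argument.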

Paper Structure

This paper contains 28 sections, 15 equations, 20 figures, 4 tables.

Figures (20)

  • Figure 1: Our InstaRevive showcases remarkable image enhancement capabilities across diverse tasks. Leveraging highly effective dynamic score matching with textual prompts, our framework adeptly harnesses the rich knowledge within the pre-trained diffusion model for (a) blind image super-resolution and (b) blind face restoration in a single step. Furthermore, we demonstrate that InstaRevive seamlessly generalizes to additional related tasks such as (c) face cartoonization.
  • Figure 2: Comparison with existing score-based matching. (a) Existing score-based distillation uses a broad range of perturbations, causing large noise to shift the generated result $\bm x$ far from the GT. This results in inaccurate score estimations (depicted by low-quality pseudo-GT $\hat{\bm x}_0$) and impedes the distillation. (b) Our dynamic score matching adjusts $\sigma_{T_{\rm max}}$ to control the perturbation scale, ensuring more accurate score estimations and aligning distributions more closely.
  • Figure 3: The overall framework of InstaRevive. InstaRevive utilizes a score-based diffusion distillation framework for image enhancement. During training, we employ two score estimators to calculate the gradients of the KL divergence. To improve estimation accuracy, we devise a dynamic control strategy to regulate the diffusing scope and adjust loss function weights. During inference, our generator can yield high-quality and visually appealing results in a single step.
  • Figure 4: Qualitative comparisons on real-world faces. Our method demonstrates impressive enhancement capabilities on real-world faces, producing high-fidelity and visually appealing results. Compared to other methods, InstaRevive exhibits robustness when handling challenging cases.
  • Figure 5: Qualitative comparisons on real-world datasets. Our InstaRevive delivers exceptional details with just one-step inference. The numbers following each method indicate the corresponding inference steps. More results and comparisons can be found in Figure \ref{fig:supp_qualitative} and Figure \ref{fig:bsrsupp}.
  • ...and 15 more figures