InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu
TL;DR
Real-world image enhancement is ill-posed and demands robust, efficient methods. InstaRevive is a one-step enhancement framework built on diffusion distillation. It combines dynamic score matching, two score estimators, and caption-guided conditioning to leverage pre-trained diffusion priors. The approach introduces dynamic noise control through a controllable $T_{max}$ and a KL-based score-matching objective, which together enable accurate learning of denoising trajectories and distribution alignment. It achieves competitive results on blind face restoration and blind image super-resolution, runs substantially faster than iterative diffusion methods, and extends to tasks such as face cartoonization. Overall, the method offers a practical, scalable route to high-quality, controllable image restoration with diffusion priors in real-world scenarios.
Abstract
Image enhancement finds wide-ranging applications in real-world scenarios due to complex environments and the inherent limitations of imaging devices. Recent diffusion-based methods yield promising results but require prolonged, computationally intensive iterative sampling. In response, we propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-based diffusion distillation to harness potent generative capability while minimizing the number of sampling steps. To fully exploit the potential of the pre-trained diffusion model, we devise a practical and effective diffusion distillation pipeline that uses dynamic control to address inaccuracies in the update direction during score matching. Our control strategy enables a dynamic diffusing scope, which facilitates precise learning of denoising trajectories within the diffusion model and ensures accurate distribution-matching gradients during training. Additionally, to enrich the guidance available to the generative model, we incorporate textual prompts obtained via image captioning as auxiliary conditions, further exploiting the capabilities of the diffusion model. Extensive experiments across a diverse array of challenging tasks and datasets substantiate the efficacy and efficiency of InstaRevive in delivering high-quality and visually appealing results. Code is available at https://github.com/EternalEvan/InstaRevive.
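To make the idea of score-based distillation with a bounded diffusing scope more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes one-dimensional unit-variance Gaussians so that both score estimators have closed forms, a cosine noise schedule, and a linear decay of $T_{max}$; all function names and the schedule are hypothetical, and the source does not specify how $T_{max}$ is adjusted. The update direction is the difference between the "fake" score (generator distribution) and the "real" score (target distribution), which is the form the gradient of a KL objective typically takes in distribution-matching distillation.

```python
import numpy as np

NUM_TRAIN_STEPS = 1000  # assumed length of the diffusion schedule


def alpha_of(t):
    """Toy variance-preserving schedule (assumption): alpha_t = cos(pi/2 * t/T)."""
    return float(np.cos(0.5 * np.pi * t / NUM_TRAIN_STEPS))


def gaussian_score(x_t, mean, alpha_t):
    """Analytic score of a diffused unit-variance Gaussian, N(alpha_t * mean, 1)."""
    return -(x_t - alpha_t * mean)


def t_max_schedule(step, total_steps, start=800, end=200):
    """Hypothetical placeholder: linearly move the upper noise bound from start to end."""
    frac = step / max(total_steps - 1, 1)
    return int(round(start + frac * (end - start)))


def sample_timestep(rng, t_max):
    """Draw a timestep uniformly from [1, t_max]; t_max limits the diffusing scope."""
    t_max = int(np.clip(t_max, 1, NUM_TRAIN_STEPS))
    return int(rng.integers(1, t_max + 1))


def distill_gradient(x, real_mean, fake_mean, t, rng):
    """Update direction for generated samples x: fake score minus real score at x_t."""
    alpha_t = alpha_of(t)
    sigma_t = np.sqrt(1.0 - alpha_t ** 2)
    x_t = alpha_t * x + sigma_t * rng.standard_normal(x.shape)
    s_real = gaussian_score(x_t, real_mean, alpha_t)  # stands in for the frozen prior
    s_fake = gaussian_score(x_t, fake_mean, alpha_t)  # stands in for the learned estimator
    return s_fake - s_real


def train(real_mean=3.0, theta=0.0, steps=200, lr=0.5, batch=64, seed=0):
    """One-parameter 'generator' x = theta + z, trained with the gradient above."""
    rng = np.random.default_rng(seed)
    for step in range(steps):
        t = sample_timestep(rng, t_max_schedule(step, steps))
        x = theta + rng.standard_normal(batch)
        # The fake score is assumed to track the generator exactly (mean = theta).
        grad_x = distill_gradient(x, real_mean, theta, t, rng)
        theta -= lr * float(grad_x.mean())  # dx/dtheta = 1 for this generator
    return theta


if __name__ == "__main__":
    print("learned mean:", train())
```

In this toy setting the gradient reduces to $\alpha_t(\mu_{fake} - \mu_{real})$, so large timesteps (small $\alpha_t$) give weak update signals. This is one simple way to see why bounding the sampled timestep can matter, although the effect in a real image model is more involved.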
