Restoring Real-World Images with an Internal Detail Enhancement Diffusion Model
Peng Xiao, Hongbo Zhao, Yijun Wang, Jianxin Lin
TL;DR
The paper addresses the restoration of real-world images with unknown degradations while enabling object-level color control. It introduces Internal Image Detail Enhancement (IIDE), a fine-tuning technique that applies degradation-aware self-regularization within the diffusion process, constraining the denoising steps of a frozen diffusion model so that structure and texture are preserved. With a ControlNet-based setup, a mix-up training strategy, and DDIM sampling, the method supports text-guided restoration for old photo restoration and image super-resolution, as well as text-guided colorization. Experiments show improvements in perceptual metrics such as CLIPIQA, MUSIQ, and FID over state-of-the-art baselines, yielding high-fidelity restorations with editable colors that are practical for archival photo restoration and editing workflows.
Abstract
Restoring real-world degraded images, such as old photographs or low-resolution images, is challenging because they exhibit complex, mixed degradations, including scratches, color fading, and noise. Recent data-driven approaches have struggled on two fronts: achieving high-fidelity restoration and providing object-level control over colorization. While diffusion models have shown promise in generating high-quality images with specific controls, they often fail to fully preserve image details during restoration. In this work, we propose an internal detail-preserving diffusion model for high-fidelity restoration of real-world degraded images. Our method uses a pre-trained Stable Diffusion model as a generative prior, eliminating the need to train a model from scratch. Central to our approach is the Internal Image Detail Enhancement (IIDE) technique, which directs the diffusion model to preserve essential structural and textural information while mitigating degradation effects. The process starts by mapping the input image into a latent space, where we inject degradation operations into the diffusion denoising process to simulate the effects of various degradation factors. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art models in both qualitative assessments and perceptual quantitative evaluations. Additionally, our approach supports text-guided restoration, enabling object-level colorization control that mimics the expertise of professional photo editing.
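
For readers unfamiliar with the DDIM sampler named in the TL;DR, the following is a minimal sketch of one deterministic DDIM update (eta = 0) on a latent vector. It illustrates the generic sampler only, not the paper's IIDE procedure. The function name `ddim_step`, the flat-list latent representation, and the toy schedule values are assumptions made for illustration; in a real pipeline `eps_pred` would come from the denoising network.

```python
import math

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a flat list of latent values.

    x_t            : current noisy latent (list of floats)
    eps_pred       : predicted noise for x_t (hypothetical stand-in for a network output)
    alpha_bar_t    : cumulative signal coefficient at the current timestep
    alpha_bar_prev : cumulative signal coefficient at the previous (less noisy) timestep
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_1m_a_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_a_prev = math.sqrt(alpha_bar_prev)
    sqrt_1m_a_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean latent implied by the current noise prediction.
        x0_hat = (x - sqrt_1m_a_t * e) / sqrt_a_t
        # Re-noise that estimate to the previous timestep along the same noise direction.
        x_prev.append(sqrt_a_prev * x0_hat + sqrt_1m_a_prev * e)
    return x_prev
```

If the noise prediction were exact, this update would move the latent to the same clean signal at the lower noise level, which is why a good noise predictor lets DDIM use far fewer steps than ancestral sampling.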
