Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong

TL;DR

SUPIR addresses photo-realistic image restoration (IR) in the wild by scaling up diffusion-based priors and pairing a large-scale adaptor with a degradation-robust encoder. It combines SDXL as a strong generative prior, ZeroSFT for fine-grained control, a training set of 20M high-quality images with text annotations, and multi-modal language guidance, so that restoration can be driven by textual prompts, including negative-quality cues. Restoration-guided sampling is added to preserve fidelity. The approach delivers superior perceptual quality on real-world degraded images and supports controllable restoration via prompts, though full-reference metrics may lag behind in some cases, which prompts discussion of evaluation standards. Overall, SUPIR demonstrates that careful model and data scaling, a novel adaptor design, and prompt-based control can push IR beyond traditional losses toward high-fidelity, semantically guided restoration with broad applicability.

Abstract

We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up. Leveraging multi-modal techniques and advanced generative prior, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issue encountered in generative-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.

Paper Structure

This paper contains 34 sections, 1 equation, 23 figures, 2 tables, 1 algorithm.

Figures (23)

  • Figure 1: Our SUPIR model demonstrates remarkable restoration effects on real-world low-quality images, as illustrated in (a). Additionally, SUPIR features targeted restoration capability driven by textual prompts. For instance, it can specify the restoration of blurry objects in the distance (case 1), define the material texture of objects (case 2), and adjust restoration based on high-level semantics (case 3).
  • Figure 2: This figure briefly shows the workflow of the proposed SUPIR model.
  • Figure 3: This figure illustrates (a) the overall architecture of SDXL and the proposed adaptor, (b) a trimmed trainable copy of the SDXL encoder with reduced ViT blocks for efficiency, and (c) a novel ZeroSFT connector for enhanced control in IR, where $X_{f}$ and $X_{s}$ denote the input feature maps from the Decoder and the Encoder shortcut, respectively, $X_{c}$ is the input from the adaptor, and $X_{fo}$ is the output. The model is designed to effectively use the large-scale SDXL as a generative prior.
  • Figure 4: CFG introduces artifacts without negative training samples, hindering visual quality improvement. Adding negative samples allows further quality enhancement through CFG.
  • Figure 5: (a) We show the relative size of our data compared to other well-known datasets. Compared with SA-1B (Kirillov et al., 2023), our dataset has higher quality and more image diversity. (b) We demonstrate our restoration-guided sampling mechanism.
  • ...and 18 more figures
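
The caption of Figure 4 refers to classifier-free guidance (CFG) used together with negative-quality prompts. As a rough illustration only, the sketch below shows the standard CFG combination rule, $\hat{\epsilon} = \epsilon_{neg} + s\,(\epsilon_{pos} - \epsilon_{neg})$, applied to toy arrays. The function name `cfg_combine`, the variable names, and the numbers are assumptions made for this example; the page does not give SUPIR's exact formulation, so this should not be read as the authors' implementation.

```python
import numpy as np


def cfg_combine(eps_pos, eps_neg, scale):
    """Standard classifier-free guidance (hypothetical helper, not SUPIR code).

    eps_pos: noise prediction conditioned on the positive (quality) prompt
    eps_neg: noise prediction conditioned on the negative-quality prompt
    scale:   guidance scale s; s = 0 returns eps_neg, s = 1 returns eps_pos,
             s > 1 extrapolates past eps_pos, away from eps_neg
    """
    return eps_neg + scale * (eps_pos - eps_neg)


# Toy stand-ins for the two denoiser outputs (made-up values).
eps_pos = np.array([0.2, -0.1, 0.4])
eps_neg = np.array([0.1, 0.1, 0.3])

# With s = 2 the result moves one extra step away from the negative branch.
guided = cfg_combine(eps_pos, eps_neg, scale=2.0)
print(guided)
```

With a guidance scale above 1, the combined prediction is pushed away from whatever the negative branch predicts. This is consistent with the caption's point: the negative branch only steers usefully if the model has seen negative-quality samples during training.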