Training-Free Large Model Priors for Multiple-in-One Image Restoration

Xuanhua He, Lang Li, Yingying Wang, Hui Zheng, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

TL;DR

This paper tackles the challenge of restoring high-quality images under diverse, dynamically changing degradations without training a separate model for each degradation type. It introduces LMDIR, a framework that draws training-free priors from pretrained large multimodal language models and diffusion models. It combines global degradation and content knowledge with local reference-image priors through a four-component architecture. Key contributions include a query-based prompt encoder that refines degradation cues; degradation-aware, content-aware, and reference-based transformer blocks that fuse these priors; and a single-stage training pipeline that outperforms state-of-the-art multi-task restoration methods on multiple benchmarks. The approach demonstrates generalization to unseen degradations and enables user-guided restoration. This makes it a scalable solution for real-world deployment, where degradations vary and are difficult to anticipate.
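The query-based prompt encoder can be illustrated with a minimal sketch. The sketch assumes the encoder works like a small cross-attention module. A fixed set of learnable query vectors attends over the variable-length text embedding of the degradation description, so the restoration network always receives the same number of prompt tokens. The function names, the identity query/key/value projections, and the residual connection are hypothetical simplifications, not the authors' implementation.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def query_prompt_encoder(learned_queries, text_tokens):
    """Hypothetical prompt encoder: K learnable queries read from N text tokens
    (N may vary) and return K refined prompt tokens, plus a residual link.
    Q/K/V projections are omitted (identity) to keep the sketch short."""
    attended = cross_attention(learned_queries, text_tokens, text_tokens)
    return [[qi + ai for qi, ai in zip(q, a)]
            for q, a in zip(learned_queries, attended)]


# Two learnable queries of width 4 and a three-token degradation description.
queries = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1]]
text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
prompts = query_prompt_encoder(queries, text)
print(len(prompts), len(prompts[0]))  # 2 4
```

The output length always equals the number of queries, so a description of any length maps to a fixed-size prompt. In a trained model, the queries and projections would be learned parameters.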

Abstract

Image restoration aims to reconstruct latent clear images from their degraded versions. Despite notable progress, existing methods predominantly focus on specific degradation types and thus require specialized models. This impedes real-world applications with dynamic degradations. To address this issue, we propose the Large Model Driven Image Restoration framework (LMDIR), a novel multiple-in-one image restoration paradigm that leverages generic priors from large multi-modal language models (MMLMs) and pretrained diffusion models. Specifically, LMDIR integrates three key priors: 1) global degradation knowledge from MMLMs, 2) scene-aware contextual descriptions generated by MMLMs, and 3) fine-grained high-quality reference images synthesized by diffusion models guided by the MMLM descriptions. Building on these priors, our architecture comprises four components: a query-based prompt encoder, a degradation-aware transformer block that injects global degradation knowledge, a content-aware transformer block that incorporates the scene description, and a reference-based transformer block that incorporates fine-grained image priors. This design enables a single-stage training paradigm that addresses various degradations while supporting both automatic and user-guided restoration. Extensive experiments demonstrate that our method outperforms state-of-the-art competitors on multiple evaluation benchmarks.
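The abstract does not specify how the degradation-aware block injects global degradation knowledge. One common way to condition features on a global vector is FiLM-style feature-wise modulation, used here purely as an illustrative stand-in. A degradation embedding is mapped to a per-channel scale and shift. All names, shapes, and the choice of FiLM itself are assumptions, not the paper's design.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear(x, weight, bias):
    """y = W x + b for one vector x; weight is a list of rows."""
    return [dot(row, x) + b for row, b in zip(weight, bias)]


def degradation_modulate(features, deg_embedding, w_scale, b_scale, w_shift, b_shift):
    """FiLM-style conditioning (an illustrative stand-in, not the paper's block).

    features:      C channels, each a flat list of H*W activations
    deg_embedding: global degradation vector, e.g. an embedded MMLM description
    Returns features * (1 + gamma_c) + beta_c for every channel c.
    """
    gamma = linear(deg_embedding, w_scale, b_scale)  # one scale per channel
    beta = linear(deg_embedding, w_shift, b_shift)   # one shift per channel
    return [[v * (1.0 + g) + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]  # 2 channels, 3 pixels each
deg = [1.0, 0.0]                               # toy 2-D degradation embedding
out = degradation_modulate(features, deg,
                           w_scale=[[0.5, 0.0], [0.0, 0.0]], b_scale=[0.0, 0.0],
                           w_shift=[[0.0, 0.0], [1.0, 0.0]], b_shift=[0.0, 0.0])
print(out)  # [[1.5, 3.0, 4.5], [1.5, 1.5, 1.5]]
```

With zero-initialized scale and shift weights, the modulation reduces to the identity. This means the conditioning path starts as a no-op and only changes the features once it has learned to.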

Paper Structure (32 sections, 9 equations, 12 figures, 3 tables)

Figures (12)

  • Figure 1: The overall pipeline of our proposed method, which achieves high-quality multiple-in-one image restoration with large-model priors.
  • Figure 2: The framework of our network. We use pretrained MLLM, CLIP, and diffusion models to generate the prior information that guides the restoration process.
  • Figure 3: Our proposed content-aware and degradation-aware transformer blocks, which inject the content and degradation priors, respectively.
  • Figure 4: Our proposed reference-based transformer block, which incorporates details from the reference image through local and global reference attention.
  • Figure 5: Visual comparison of multiple-in-one methods on image denoising, low light enhancement, and deraining.
  • ...and 7 more figures
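Figure 4's caption names local and global reference attention but not their exact form. The following is a minimal sketch that assumes both are standard scaled dot-product attention from degraded-image tokens to reference-image tokens. Global attention reads every reference token. Local attention reads only the reference tokens within a window around the same position. The two results are added to the input as a residual. The 1-D token layout, the window radius, and the additive fusion are assumptions; a real implementation would likely use 2-D windows and learned projections.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over a set of keys/values."""
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


def global_reference_attention(feat, ref):
    # Every degraded-image token can read from every reference token.
    return [attend(q, ref, ref) for q in feat]


def local_reference_attention(feat, ref, radius):
    # Assumes feat and ref are spatially aligned and equally long:
    # token i only reads reference tokens at positions i-radius .. i+radius.
    out = []
    for i, q in enumerate(feat):
        lo, hi = max(0, i - radius), min(len(ref), i + radius + 1)
        window = ref[lo:hi]
        out.append(attend(q, window, window))
    return out


def reference_block(feat, ref, radius=1):
    """Hypothetical fusion: input + global attention + local attention."""
    g = global_reference_attention(feat, ref)
    loc = local_reference_attention(feat, ref, radius)
    return [[f + a + b for f, a, b in zip(fv, gv, lv)]
            for fv, gv, lv in zip(feat, g, loc)]


feat = [[0.2, 0.1], [0.0, 0.4], [0.3, 0.3]]  # 3 degraded-image tokens
ref = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 aligned reference tokens
fused = reference_block(feat, ref, radius=1)
print(len(fused), len(fused[0]))  # 3 2
```

The two paths do different jobs. The local window pulls fine detail from the spatially corresponding region of the reference, while global attention lets a token borrow texture from anywhere in the reference when the two images are not well aligned.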