Prompt-based Ingredient-Oriented All-in-One Image Restoration

Hu Gao, Depeng Dang

TL;DR

The paper tackles real-world image restoration, where several kinds of degradation can occur, and proposes CAPTNet, a data ingredient-oriented all-in-one framework that uses prompt-based conditioning to adaptively restore degraded images. CAPTNet combines CNN blocks with a Simplified Prompt-based Transformer (SPT). The SPT includes multi-head rearranged attention with prompts (MRAP) for efficient, channel-focused attention and a simple-gate feed-forward network (SGFN) for selective information retention, and a Feature Fusion Module adds multi-scale features. The authors report extensive experiments in which CAPTNet performs competitively with or better than state-of-the-art task-specific and all-in-one methods on deraining, dehazing, denoising, and deblurring, and generalizes well to unknown degradations. The approach offers scalable restoration at reduced computational cost and shows practical potential for real-world imaging systems where the degradation type varies dynamically.

Abstract

Image restoration aims to recover high-quality images from their degraded observations. Because most existing methods are dedicated to removing a single degradation, they may not yield optimal results on other degradation types, which falls short of the needs of real-world applications. In this paper, we propose a novel data ingredient-oriented approach that leverages prompt-based learning to enable a single model to efficiently tackle multiple image degradation tasks. Specifically, we use an encoder to capture features and introduce prompts carrying degradation-specific information to guide the decoder in adaptively recovering images affected by various degradations. To model both the local invariant properties and the non-local information needed for high-quality image restoration, we combine CNN operations and Transformers. We also make several key designs in the Transformer blocks (multi-head rearranged attention with prompts and a simple-gate feed-forward network) to reduce computational requirements and to selectively determine what information should be preserved, facilitating efficient recovery of potentially sharp images. Furthermore, we incorporate a feature fusion mechanism that further explores multi-scale information to improve the aggregated features. We name the resulting tightly interlinked hierarchical architecture CAPTNet; extensive experiments demonstrate that it performs competitively with the state of the art.
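
The abstract names two mechanisms, prompt-guided decoding and a simple-gate feed-forward network, without giving formulas. The NumPy sketch below is one plausible reading of each and is not CAPTNet's actual implementation. It assumes the simple gate splits the channels in half and multiplies the halves elementwise, and that a prompt is formed as a softmax-weighted mix of learnable vectors added to the decoder features. The names `simple_gate`, `inject_prompt`, `prompt_bank`, and `proj` are hypothetical.

```python
import numpy as np


def simple_gate(x):
    """Assumed gating: split channels in half and multiply the halves.

    x has shape (C, H, W) with even C; the result has shape (C // 2, H, W).
    """
    if x.shape[0] % 2 != 0:
        raise ValueError("channel count must be even")
    a, b = np.split(x, 2, axis=0)
    return a * b


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def inject_prompt(features, prompt_bank, proj):
    """Assumed prompt conditioning (hypothetical, for illustration only).

    features:    (C, H, W) decoder feature map
    prompt_bank: (N, C) learnable prompt vectors
    proj:        (N, C) weights mapping pooled features to N prompt scores
    """
    pooled = features.mean(axis=(1, 2))      # global average pool -> (C,)
    weights = softmax(proj @ pooled)         # implicit degradation estimate -> (N,)
    prompt = weights @ prompt_bank           # weighted prompt mix -> (C,)
    return features + prompt[:, None, None], weights


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
bank = rng.standard_normal((4, 8))   # 4 hypothetical prompts
proj = rng.standard_normal((4, 8))

conditioned, w = inject_prompt(feats, bank, proj)
gated = simple_gate(conditioned)
print(conditioned.shape, gated.shape)  # (8, 4, 4) (4, 4, 4)
```

In this sketch the prompt weights depend only on the input features, which mirrors the idea of predicting the degradation condition implicitly instead of receiving it as a label.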

Paper Structure (25 sections, 11 equations, 15 figures, 8 tables)

This paper contains 25 sections, 11 equations, 15 figures, 8 tables.

Figures (15)

  • Figure 1: Illustration of our basic idea. Our proposed method, CAPTNet, has a single encoder and decoder and injects a learnable prompt at multiple decoding stages to implicitly predict degradation conditions; the prompt guides the decoder to recover various degraded images adaptively.
  • Figure 2: Visualization of the differences between clean images and their degraded counterparts in the individual image degradation datasets. (a) The GoPro dataset for image deblurring, (b) the Rain100H dataset for image deraining, (c) the BSD300 dataset for image denoising, and (d) the SOTS dataset for image dehazing.
  • Figure 3: t-SNE visualization of the data distribution across the datasets of the individual image recovery tasks. Distinct colors denote different degradation types. (a) The corrupted image data (the input data in the datasets), and (b) the corresponding high-quality image data (the target data in the datasets).
  • Figure 4: t-SNE visualization of the encoder's output features for each degradation type.
  • Figure 5: Architecture of CAPTNet for all-in-one image restoration. CAPTNet combines CNN-based blocks (Conv Block) and Transformer-based blocks (SPT Block) to capture non-local information and local invariance. The details of the Conv blocks and SPT blocks are shown in Fig. 6.
  • ...and 10 more figures