Intra and Inter Parser-Prompted Transformers for Effective Image Restoration

Cong Wang, Jinshan Pan, Liyan Wang, Wei Wang

TL;DR

The paper tackles image restoration under unknown degradation by leveraging parser content from a large visual foundation model. It introduces PPTformer, a two-branch framework with IRNet for restoration and PPFGNet to generate parser-guided features, integrated via IN2PPT blocks comprising IntraPPA, InterPPA, and a Parser-Prompted Feed-forward Network, plus a Bidirectional Parser-Prompted Fusion (BiPPF). The core idea is to implicitly and explicitly utilize parser content during long-range attention and pixel-wise modulation to guide restoration, yielding improvements across four tasks: deraining, defocus deblurring, desnowing, and low-light enhancement. Experiments across these tasks show state-of-the-art performance, validating the effectiveness of fusing SAM-derived hierarchical structures into restoration, while offline parser generation can be memory-intensive. The approach provides a general path to infuse foundation-model-derived structure cues into low-level vision tasks, with potential for broader applicability and further integration of parser features during training.

Abstract

We propose Intra and Inter Parser-Prompted Transformers (PPTformer) that explore useful features from visual foundation models for image restoration. Specifically, PPTformer contains two parts: an Image Restoration Network (IRNet) for restoring images from degraded observations and a Parser-Prompted Feature Generation Network (PPFGNet) for providing IRNet with reliable parser information to boost restoration. To enhance the integration of the parser within IRNet, we propose Intra Parser-Prompted Attention (IntraPPA) and Inter Parser-Prompted Attention (InterPPA) to implicitly and explicitly learn useful parser features to facilitate restoration. The IntraPPA re-considers cross attention between parser and restoration features, enabling implicit perception of the parser from a long-range and intra-layer perspective. Conversely, the InterPPA initially fuses restoration features with those of the parser, followed by formulating these fused features within an attention mechanism to explicitly perceive parser information. Further, we propose a parser-prompted feed-forward network to guide restoration within pixel-wise gating modulation. Experimental results show that PPTformer achieves state-of-the-art performance on image deraining, defocus deblurring, desnowing, and low-light enhancement.
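
The contrast between the two attention variants described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that IntraPPA takes queries from restoration features and keys/values from parser features, and that the fusion step inside InterPPA can be stood in for by a linear map over concatenated channels. The function names, weight names (`wq`, `wk`, `wv`, `w_fuse`), and tensor sizes are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over N tokens with C channels each.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def intra_ppa(restore, parser, wq, wk, wv):
    # Implicit use of the parser via cross-attention.
    # Assumption: queries come from restoration features,
    # keys and values come from parser features.
    return attention(restore @ wq, parser @ wk, parser @ wv)

def inter_ppa(restore, parser, w_fuse, wq, wk, wv):
    # Explicit use of the parser: fuse first, then attend over the fused features.
    # Assumption: a linear map over concatenated channels stands in for the
    # paper's fusion module.
    fused = np.concatenate([restore, parser], axis=-1) @ w_fuse
    return attention(fused @ wq, fused @ wk, fused @ wv)

rng = np.random.default_rng(0)
N, C = 16, 8  # hypothetical sizes: 16 tokens (flattened pixels), 8 channels
restore = rng.standard_normal((N, C))
parser = rng.standard_normal((N, C))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
w_fuse = rng.standard_normal((2 * C, C)) * 0.1

out_intra = intra_ppa(restore, parser, wq, wk, wv)
out_inter = inter_ppa(restore, parser, w_fuse, wq, wk, wv)
print(out_intra.shape, out_inter.shape)
```

In this sketch the cross-attention variant never mixes parser channels into the queries, so the parser influences the output only through the attention weights and values; the fused variant injects parser content into queries, keys, and values alike.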

Paper Structure

This paper contains 11 sections, 4 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Illustration of our main idea. Our method is motivated by the observation that SAM [kirillov2023segany] can parse even severely degraded inputs (a) into useful hierarchical structures (b). While extremely degraded inputs may offer little valuable information for restoration, the parser, which benefits from the powerful ability of SAM, can still describe reliable structures well enough to facilitate restoration. To better integrate the parser into the restoration process, we develop Intra Parser-Prompted Attention and Inter Parser-Prompted Attention to implicitly and explicitly learn valuable parser content to boost image recovery.
  • Figure 2: Overall framework of PPTformer. Our PPTformer consists of two parts: (a) Image Restoration Network (IRNet); (b) Parser-Prompted Feature Generation Network (PPFGNet). The IRNet restores images, while PPFGNet generates parser features that provide IRNet with useful information to facilitate restoration. To better utilize the parser features to guide IRNet, we propose the Intra and Inter Parser-Prompted Attention, which implicitly and explicitly explore the useful parser features in the restoration process. Further, we propose the Parser-Prompted Feed-forward Network to integrate parser features into the feed-forward encoding process, which allows parser features to effectively guide the restoration through pixel-wise gating modulation. Moreover, we introduce the Controllable Parser Feature Propagation scheme to control parser feature propagation in both attention and networks, so that useful information is passed on to better guide image restoration.
  • Figure 3: (a) Intra Parser-Prompted Attention (IntraPPA), (b) Inter Parser-Prompted Attention (InterPPA), and (c) Parser-Prompted Feed-forward Network (PPFN). Our IntraPPA exploits the cross-attention between parser features and restoration features to implicitly explore useful parser features. The InterPPA performs explicit aggregation: it first uses BiPPF (see Fig. 4) to fuse parser features with restoration features and then applies attention to explicitly learn beneficial parser features. The PPFN integrates parser features into one of its parallel paths via BiPPF, which allows parser features to effectively guide the feed-forward restoration process through a pixel-wise gating modulation mechanism.
  • Figure 4: Bidirectional Parser-Prompted Fusion (BiPPF). Our BiPPF uses a bidirectional flow scheme for feature fusion. It transforms parser features into restoration features and vice versa, allowing interactive integration of parser features into the restoration process. Notably, all convolutions in BiPPF are 1$\times$1 for efficiency.
  • Figure 5: Image deraining example on Rain100H [derain_jorder_yang]. Our PPTformer recovers results with finer structures.
  • ...and 7 more figures
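
The captions for Figures 3 and 4 describe bidirectional fusion built from 1×1 convolutions and a feed-forward network with pixel-wise gating. The NumPy sketch below shows one way such a design could be wired, under stated assumptions. It is not the paper's exact formulation. It assumes that each stream receives a 1×1-projected residual from the other, that the two updated streams are concatenated and projected, and that the fused features drive a sigmoid gate on a parallel value path. All function and weight names are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)
    return x @ w

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bippf(restore, parser, w_p2r, w_r2p, w_out):
    # Assumption: each stream is updated with a projected residual from the other.
    r = restore + conv1x1(parser, w_p2r)   # parser -> restoration direction
    p = parser + conv1x1(restore, w_r2p)   # restoration -> parser direction
    # Assumption: the two updated streams are concatenated and projected back.
    return conv1x1(np.concatenate([r, p], axis=-1), w_out)

def ppfn(restore, parser, w_gate, w_value, fusion_weights):
    # Two parallel paths; assumption: the parser enters only the gate path via fusion.
    gate = sigmoid(conv1x1(bippf(restore, parser, *fusion_weights), w_gate))
    value = conv1x1(restore, w_value)
    return gate * value  # pixel-wise gating modulation

rng = np.random.default_rng(0)
H, W, C = 4, 4, 6  # hypothetical feature-map size
restore = rng.standard_normal((H, W, C))
parser = rng.standard_normal((H, W, C))
fusion_weights = (
    rng.standard_normal((C, C)) * 0.1,      # w_p2r
    rng.standard_normal((C, C)) * 0.1,      # w_r2p
    rng.standard_normal((2 * C, C)) * 0.1,  # w_out
)
w_gate = rng.standard_normal((C, C)) * 0.1
w_value = rng.standard_normal((C, C)) * 0.1

out = ppfn(restore, parser, w_gate, w_value, fusion_weights)
print(out.shape)
```

Because the gate lies in (0, 1) at every pixel and channel, the parser can only attenuate the value path in this sketch; a real design might use a different activation or add a residual connection.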