Empowering Image Recovery: A Multi-Attention Approach

Juan Wen; Yawei Li; Chao Zhang; Weiyan Hou; Radu Timofte; Luc Van Gool

TL;DR

The paper addresses high-quality image restoration across diverse tasks by letting a single model integrate information from long sequences, local and global contexts, and multiple feature and positional dimensions. It introduces Diverse Restormer (DART), a multi-attention transformer built on a SwinIR-like backbone that combines LongIR attention for long-range dependencies with Feature Dimension Attention and Position Dimension Attention, which refine information across channels and spatial positions. The approach reports state-of-the-art performance across five restoration tasks while keeping the model compact (e.g., DART-B with ~4.5M parameters, fewer than the SwinIR, GRL-B, and Restormer baselines compared in Figure 1). Ablation studies confirm the contributions of LongIR and the dimension-wise attentions, and experiments on real and synthetic data support DART’s robustness for scalable, high-fidelity image recovery.
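To make the model-size comparison concrete, the short sketch below computes how much larger each baseline is than DART-B, using only the parameter counts quoted in the Figure 1 caption. The dictionary and the function name `size_ratio` are illustrative, and the DART-B figure is the approximate 4.5M value given in the text, so the ratios are approximate as well.

```python
# Parameter counts in millions, as quoted in the Figure 1 caption.
# The DART-B value is approximate ("~4.5M" in the TL;DR).
PARAMS_M = {
    "DART-B": 4.5,
    "SwinIR": 11.75,
    "GRL-B": 19.81,
    "Restormer": 26.13,
}


def size_ratio(model: str, baseline: str = "DART-B") -> float:
    """Return how many times more parameters `model` has than `baseline`."""
    return PARAMS_M[model] / PARAMS_M[baseline]


if __name__ == "__main__":
    for name in ("SwinIR", "GRL-B", "Restormer"):
        print(f"{name}: {size_ratio(name):.1f}x the parameters of DART-B")
```

With these numbers, SwinIR, GRL-B, and Restormer come out at roughly 2.6x, 4.4x, and 5.8x the size of DART-B, respectively. Parameter count is only one proxy for efficiency; it says nothing about runtime or memory use.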

Abstract

We propose Diverse Restormer (DART), a novel image restoration method that effectively integrates information from various sources (long sequences, local and global regions, feature dimensions, and positional dimensions) to address restoration challenges. While Transformer models have demonstrated excellent performance in image restoration due to their self-attention mechanism, they face limitations in complex scenarios. Leveraging recent advancements in Transformers and various attention mechanisms, our method utilizes customized attention mechanisms to enhance overall performance. DART, our novel network architecture, employs windowed attention to mimic the selective focusing mechanism of human eyes. By dynamically adjusting receptive fields, it optimally captures the fundamental features crucial for image resolution reconstruction. Efficiency and performance balance are achieved through the LongIR attention mechanism for long sequence image restoration. Integration of attention mechanisms across feature and positional dimensions further enhances the recovery of fine details. Evaluation across five restoration tasks consistently positions DART at the forefront. Upon acceptance, we commit to providing publicly accessible code and models to ensure reproducibility and facilitate further research.
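The Figure 3 caption names three attention patterns inside LongIR: Sliding Window Attention, Expanded Sliding Window Attention, and Global Attention. This page does not give their exact formulation, so the sketch below is only an assumption about how such patterns are commonly combined: a Longformer-style boolean mask over a 1D token sequence, where "expanded" is interpreted as a dilated window. The function name `longir_style_mask` and all of its parameters are hypothetical and are not taken from the paper.

```python
def longir_style_mask(seq_len, window=1, dilation=2, global_tokens=(0,)):
    """Build a boolean attention mask; mask[i][j] is True if token i may attend to token j.

    Assumed (not from the paper) union of three patterns:
      - sliding window: |i - j| <= window
      - dilated ("expanded") window: |i - j| is a multiple of `dilation`
        and at most window * dilation
      - global: any pair that involves one of `global_tokens`
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            offset = abs(i - j)
            local = offset <= window
            dilated = offset % dilation == 0 and offset <= window * dilation
            is_global = i in global_tokens or j in global_tokens
            mask[i][j] = local or dilated or is_global
    return mask


if __name__ == "__main__":
    # Print the pattern for a short sequence: '#' = attend, '.' = masked out.
    for row in longir_style_mask(8):
        print("".join("#" if allowed else "." for allowed in row))
```

The point of such a mask is that each token attends to a bounded neighbourhood plus a few global tokens, so the number of attended pairs grows roughly linearly with sequence length instead of quadratically. A real implementation would operate on 2D image windows and would typically avoid materializing the full mask.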

Paper Structure (11 sections, 2 equations, 7 figures, 8 tables)

Introduction
Related Works
  Image Restoration
  Vision Transformer
Method
  Motivation
  Network Architecture
Experiments
  Experimental Setup
  Experimental Results
Conclusion

Figures (7)

Figure 1: Color image denoising on the CBSD68 dataset (noise level 50). Our DART-B network reaches state-of-the-art performance on this task with just 4.5M parameters. By comparison, GRL-B [li2023efficient] uses 19.81M parameters, Restormer [zamir2022restormer] uses 26.13M, and SwinIR [liang2021swinir] uses 11.75M.
Figure 2: Image SR $\times 2$ on the Urban100 dataset.
Figure 3: Network architecture. (a) The learning module, composed of stages of transformer layers. (b) The Transformer module combines LongIR (Sliding Window Attention, Expanded Sliding Window Attention, and Global Attention) with Position Dimension Attention and Feature Dimension Attention to extract information from long sequences, local and global regions, specific feature dimensions, and different positional dimensions for image restoration. (c) The working mechanism of LongIR Attention.
Figure 4: (d), (e), and (f) illustrate the working mechanisms of Position Dimension Attention and Feature Dimension Attention.
Figure 5: Visual comparison of the DART-B network on $\times 3$ SR. Red bounding boxes mark the patches used for comparison, to make performance differences easier to see.
...and 2 more figures
