HAT: Hybrid Attention Transformer for Image Restoration

Xiangyu Chen; Xintao Wang; Wenlong Zhang; Xiangtao Kong; Yu Qiao; Jiantao Zhou; Chao Dong

TL;DR

HAT addresses the limited spatial range of input that Transformer-based image restoration networks actually use. It combines window-based self-attention with channel attention and adds an overlapping cross-attention module that couples neighboring windows. A same-task pre-training strategy on large-scale data further unlocks the model’s potential. The resulting architecture, built from Hybrid Attention Blocks and Overlapping Cross-Attention Blocks, delivers state-of-the-art performance on SR, real-world SR, denoising, and compression artifacts reduction, with further gains from the scaled-up HAT-L. These results indicate that activating more input pixels and strengthening cross-window information flow improves results across diverse image restoration tasks.

Abstract

Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising. However, through attribution analysis, we find that these networks can only utilize a limited spatial range of input information. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better restoration, we propose a new Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to further exploit the potential of the model. Extensive experiments have demonstrated the effectiveness of the proposed modules. We further scale up the model to show that the performance of the SR task can be greatly improved. Besides, we extend HAT to more image restoration applications, including real-world image super-resolution, Gaussian image denoising and image compression artifacts reduction. Experiments on benchmark and real-world datasets demonstrate that our HAT achieves state-of-the-art performance both quantitatively and qualitatively. Code and models are publicly available at https://github.com/XPixelGroup/HAT.
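
The idea behind the overlapping cross-attention module can be illustrated with plain index arithmetic. The following is a minimal 1-D sketch, not the authors' implementation: queries come from non-overlapping windows, while keys and values come from wider windows that reach into neighboring ones. The function names, the sizes, and the border clipping are all assumptions made for illustration.

```python
def query_windows(size, window):
    """Non-overlapping 1-D ranges [start, stop) that tile `size` pixels."""
    if size % window != 0:
        raise ValueError("size must be a multiple of window")
    return [(s, s + window) for s in range(0, size, window)]


def key_value_windows(size, window, pad):
    """Same windows, each widened by `pad` pixels on both sides.

    Ranges are clipped at the border to keep the sketch short
    (an assumption; padding the input is another option).
    """
    return [(max(0, s - pad), min(size, e + pad))
            for s, e in query_windows(size, window)]


def shared_pixels(ranges):
    """Count pixels covered by more than one range."""
    counts = {}
    for start, stop in ranges:
        for i in range(start, stop):
            counts[i] = counts.get(i, 0) + 1
    return sum(1 for c in counts.values() if c > 1)


# Hypothetical sizes chosen only for illustration.
size, window, pad = 16, 4, 2
q = query_windows(size, window)
kv = key_value_windows(size, window, pad)

print(q)                  # query ranges: no pixel belongs to two windows
print(kv)                 # key/value ranges: neighbors overlap
print(shared_pixels(q), shared_pixels(kv))
```

With non-overlapping windows no pixel is visible to more than one window, so information crosses window borders only through later layers. Widening the key/value ranges lets each query window attend directly to pixels owned by its neighbors, which is the cross-window interaction the abstract describes.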

Paper Structure (32 sections, 12 equations, 17 figures, 15 tables)

Introduction
Related Work
Image Super-Resolution
Vision Transformer
Deep Networks for Image Restoration
Motivation
An Overview of LAM
Interpretability Analysis
Feature Visualization
Methodology
Network Structure of HAT
Hybrid Attention Block
Overlapping Cross-Attention Block (OCAB)
The Same-task Pre-training
Discussions
...and 17 more sections

Figures (17)

Figure 1: Performance comparison of the proposed HAT on various image restoration tasks with the state-of-the-art methods.
Figure 2: LAM results of different networks. SwinIR utilizes less information compared to RCAN, while HAT uses the most pixels for reconstruction.
Figure 3: CEM results of different networks. Activating more input information for Transformer is crucial to the reconstruction performance.
Figure 4: Intermediate features visualization. "Layer N" means the intermediate features after the $N$-th layer (i.e., RSTB in SwinIR and RHAG in HAT).
Figure 5: The overall architecture of HAT and the structure of RHAG and HAB.
...and 12 more figures
