HAAT: Hybrid Attention Aggregation Transformer for Image Super-Resolution

Song-Jiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kai-wen Xue, Kin-Man Lam

TL;DR

This work tackles the limitations of window-limited self-attention in transformer-based image super-resolution by introducing HAAT, a Hybrid Attention Aggregation Transformer that jointly leverages Swin-Dense-Residual-Connected Blocks (SDRCB) and Hybrid Grid Attention Blocks (HGAB). SDRCB expands receptive fields through dense residual connections within Swin Transformer layers, while HGAB integrates channel attention, sparse self-attention, and window-based attention via a Mix Attention Layer to enable efficient, global feature fusion. The paper details the architectural components and training setup, and demonstrates superior performance over state-of-the-art methods on standard SR benchmarks across scales ×2, ×3, and ×4. The contributions offer a practical pathway to higher-quality SR outputs by enhancing both local and nonlocal feature modeling for diverse image textures.

Abstract

In image super-resolution, Swin Transformer-based models are favored for their global spatial modeling and shifted-window attention mechanism. However, existing methods often restrict self-attention to non-overlapping windows to reduce computational cost, and they ignore the useful information that exists across channels. To address these issues, this paper introduces the Hybrid Attention Aggregation Transformer (HAAT), a model designed to make better use of feature information. HAAT is constructed by integrating Swin-Dense-Residual-Connected Blocks (SDRCB) with Hybrid Grid Attention Blocks (HGAB). SDRCB expands the receptive field while keeping the architecture streamlined, which improves performance. HGAB combines channel attention, sparse attention, and window attention to improve nonlocal feature fusion and produce more visually compelling results. Experimental evaluations show that HAAT surpasses state-of-the-art methods on benchmark datasets.

Keywords: Image super-resolution, Computer vision, Attention mechanism, Transformer
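
The difference between attention restricted to non-overlapping windows and a sparse, grid-style attention can be illustrated with a small sketch. The code below is not taken from the paper: the function names `window_groups`, `grid_groups`, and `span` are hypothetical, and the strided grid partition is an assumed stand-in for the sparse attention pattern, whose exact form is not specified in this summary. It only shows which pixel positions would attend to one another under each grouping.

```python
def window_groups(height, width, size):
    """Group pixel coordinates into non-overlapping size x size windows.

    Assumes height and width are divisible by size.
    """
    groups = {}
    for r in range(height):
        for c in range(width):
            # Pixels sharing the same window index attend to each other.
            groups.setdefault((r // size, c // size), []).append((r, c))
    return list(groups.values())


def grid_groups(height, width, size):
    """Group pixel coordinates on a strided grid (assumed sparse pattern).

    Each group still holds size x size pixels, but they are spread
    across the whole feature map instead of packed into one window.
    Assumes height and width are divisible by size.
    """
    stride_r, stride_c = height // size, width // size
    groups = {}
    for r in range(height):
        for c in range(width):
            # Pixels sharing the same offset within a stride form a group.
            groups.setdefault((r % stride_r, c % stride_c), []).append((r, c))
    return list(groups.values())


def span(group):
    """Return the (rows, cols) extent of the bounding box of a group."""
    rows = [r for r, _ in group]
    cols = [c for _, c in group]
    return (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)


if __name__ == "__main__":
    local = window_groups(8, 8, 4)
    sparse = grid_groups(8, 8, 4)
    print("window group extent:", span(local[0]))
    print("grid group extent:", span(sparse[0]))
```

On an 8×8 map with groups of 16 pixels, each window group covers only a 4×4 patch, while each grid group spans nearly the full map. Both groupings cost the same per group, which is the intuition behind mixing local window attention with a sparse nonlocal pattern.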

Paper Structure

This paper contains 6 sections, 9 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: SDRCB Framework.
  • Figure 2: The structure of HGAB.