Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution

Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim

TL;DR

The paper addresses two limitations of prior single image super-resolution (SISR) methods: weak modeling of long-range dependencies and underuse of frequency-domain information. It introduces CPAT, a Channel-Partitioned Attention Transformer, built from three components: CPWin-SA, which expands the receptive field through vertically and horizontally enhanced windows; OCAM, which strengthens cross-window communication; and SFIM, which fuses spatial features with FFT-derived frequency features. CPAT improves on state-of-the-art methods across standard benchmarks, by up to 0.31 dB on Urban100 at ×2, at competitive computational cost. The combined spatial-frequency design may also carry over to broader image restoration tasks.
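
To make the idea of channel-partitioned, direction-specific windows concrete, here is a minimal NumPy sketch. It is not the authors' CPWin-SA: the function name `channel_partitioned_windows`, the split into two equal channel groups, and the use of full-height and full-width strips are all assumptions made for illustration, since the summary does not give window sizes or the exact partitioning.

```python
import numpy as np

def channel_partitioned_windows(x, ws):
    """Split a (C, H, W) feature map into two channel groups (assumed equal).

    The first group is cut into vertical strips spanning the full height,
    the second into horizontal strips spanning the full width.
    Returns (v_windows, h_windows) with shapes
    (W // ws, C // 2, H, ws) and (H // ws, C - C // 2, ws, W).
    """
    c, h, w = x.shape
    assert h % ws == 0 and w % ws == 0, "H and W must be divisible by ws"
    half = c // 2
    xv, xh = x[:half], x[half:]
    # Vertical strips: window k holds columns k*ws .. k*ws + ws - 1, all rows.
    v = xv.reshape(half, h, w // ws, ws).transpose(2, 0, 1, 3)
    # Horizontal strips: window k holds rows k*ws .. k*ws + ws - 1, all columns.
    hwin = xh.reshape(c - half, h // ws, ws, w).transpose(1, 0, 2, 3)
    return v, hwin

# Illustrative input: 4 channels on an 8x8 grid, strip width 2 (hypothetical sizes).
x = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
v_windows, h_windows = channel_partitioned_windows(x, ws=2)
print(v_windows.shape, h_windows.shape)
```

In this sketch, attention computed inside a vertical strip can relate the top and bottom rows of the image, and attention inside a horizontal strip can relate the left and right edges, which a small square window cannot do in a single step.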

Abstract

Recently, window-based attention methods have shown great potential for computer vision tasks, particularly in Single Image Super-Resolution (SISR). However, they may fall short in capturing long-range dependencies and relationships between distant tokens. Additionally, we find that learning in the spatial domain does not convey the frequency content of the image, which is a crucial aspect of SISR. To tackle these issues, we propose a new Channel-Partitioned Attention Transformer (CPAT) that better captures long-range dependencies by sequentially expanding windows along the height and width of feature maps. In addition, we propose a novel Spatial-Frequency Interaction Module (SFIM), which combines information from the spatial and frequency domains to extract more comprehensive information from feature maps. This includes information about the frequency content and enlarges the receptive field to cover the entire image. Experimental results show the effectiveness of our proposed modules and architecture. In particular, CPAT surpasses current state-of-the-art methods by up to 0.31 dB at ×2 SR on Urban100.
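
The claim that frequency-domain processing enlarges the receptive field to the whole image can be checked with a small NumPy experiment. This is a sketch under stated assumptions, not the authors' SFIM: `frequency_branch`, `spatial_branch`, and the random per-frequency `weight` are hypothetical stand-ins for learned layers. It perturbs one input pixel and counts how many output positions change in each branch.

```python
import numpy as np

def frequency_branch(x, weight):
    """Pointwise scaling in the Fourier domain of an (H, W) map.

    `weight` has the shape of the rfft2 output, (H, W // 2 + 1), and stands in
    for learned frequency-domain parameters (an assumption of this sketch).
    """
    spec = np.fft.rfft2(x)
    return np.fft.irfft2(spec * weight, s=x.shape)

def spatial_branch(x):
    """3x3 mean filter with zero padding: a purely local operation."""
    h, w = x.shape
    p = np.pad(x, 1)
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
weight = rng.standard_normal((16, 9))

# Perturb a single pixel and see how far the change spreads in each branch.
x_perturbed = x.copy()
x_perturbed[0, 0] += 1.0

changed_spatial = np.count_nonzero(spatial_branch(x) != spatial_branch(x_perturbed))
changed_frequency = np.count_nonzero(
    np.abs(frequency_branch(x, weight) - frequency_branch(x_perturbed, weight)) > 1e-9
)
print(changed_spatial, changed_frequency)
```

The local filter changes only the four outputs whose 3x3 neighborhood contains the perturbed pixel, while the frequency branch changes outputs across nearly the entire 16x16 map, because every Fourier coefficient depends on every pixel.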

Paper Structure

This paper contains 13 sections, 12 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Architecture details. (a) The overall architecture of CPAT. (b) Structure of Channel-Partitioned Windowed Self-Attention (CPWin-SA). (c) Structure of the Overlapping Cross-Attention Module (OCAM). (d) Structure of the Spatial-Frequency Interaction Module (SFIM).
  • Figure 2: Enhanced window strategy and One-Direction Shift Operation in V-EWin and H-EWin
  • Figure 3: Comparison of Local Attribution Map (LAM) and Diffusion Index (DI) results (gu2021interpreting).
  • Figure 4: Qualitative comparison (×4 SR). The compared patches are marked by green boxes in the HR images. PSNR/SSIM is computed on these patches to demonstrate the improvement of our method.
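
Since the results above are reported in dB and Figure 4 evaluates PSNR on cropped patches, a short sketch of that computation may help. It uses the standard PSNR definition, 10 log10(MAX^2 / MSE); the helper names `psnr` and `patch_psnr`, the 8-bit peak value of 255, and the patch coordinates are assumptions for illustration. The summary does not state the authors' evaluation protocol (color space, border cropping), so none is assumed here.

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def patch_psnr(sr, hr, top, left, size):
    """PSNR restricted to a square patch (hypothetical coordinates)."""
    window = (slice(top, top + size), slice(left, left + size))
    return psnr(np.asarray(sr)[window], np.asarray(hr)[window])

# Toy example: a flat reference image and a reconstruction that is off by 1 everywhere.
hr_image = np.zeros((8, 8))
sr_image = hr_image + 1.0
toy_psnr = patch_psnr(sr_image, hr_image, top=2, left=2, size=4)
print(round(toy_psnr, 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.31 dB corresponds to an MSE ratio of 10^(-0.031) ≈ 0.93, that is, roughly 7% lower MSE.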