Crafting Query-Aware Selective Attention for Single Image Super-Resolution

Junyoung Kim, Youngrok Kim, Siyeol Jung, Donghyun Min

TL;DR

SSCAN tackles single image super-resolution with a query-aware selective attention mechanism that focuses computation on the regions most relevant to reconstruction. Its core component, Fine-Grained Context-Aware Attention (FGCA), selects the top-$k$ windows by query-key similarity and applies attention only to those windows; combined with fixed-size windows and FlashAttention, this yields near-linear complexity with respect to image size. Empirically, SSCAN outperforms existing selective-attention SR models by up to 0.14 dB PSNR on Urban100 at comparable parameter counts, and memory analyses indicate a substantially smaller footprint. These findings suggest a practical, scalable path to high-quality SR on large images, suited to resource-constrained settings and on-device deployment.
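The top-$k$ window selection described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`window_partition`, `topk_window_attention`), the use of mean-pooled window descriptors for the coarse query-key similarity, and the single-head, projection-free setup are all hypothetical choices made for clarity. It also uses plain dense matrix products rather than FlashAttention.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into (num_windows, win*win, C) token windows."""
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def topk_window_attention(q, k, v, win=4, topk=2):
    """Each query window attends only to its top-k most similar key windows."""
    qw, kw, vw = (window_partition(t, win) for t in (q, k, v))  # each (n, m, C)
    n, m, C = qw.shape

    # Coarse routing (assumption): compare mean-pooled window descriptors.
    affinity = qw.mean(axis=1) @ kw.mean(axis=1).T       # (n, n)
    idx = np.argsort(-affinity, axis=1)[:, :topk]        # (n, topk) window indices

    # Gather the selected key/value windows for every query window.
    k_sel = kw[idx].reshape(n, topk * m, C)
    v_sel = vw[idx].reshape(n, topk * m, C)

    # Fine-grained token-to-token attention restricted to the selected windows.
    scores = qw @ k_sel.transpose(0, 2, 1) / np.sqrt(C)  # (n, m, topk*m)
    return softmax(scores) @ v_sel, idx                  # (n, m, C), (n, topk)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))   # toy 8x8 feature map with 16 channels
out, idx = topk_window_attention(feat, feat, feat, win=4, topk=2)
print(out.shape, idx.shape)              # (4, 16, 16) (4, 2)
```

Each query token scores only `topk * win * win` keys, so the per-query cost is fixed regardless of image size. Setting `topk` to the total number of windows recovers ordinary full attention, which is a convenient sanity check for the gather step.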

Abstract

Single Image Super-Resolution (SISR) reconstructs high-resolution images from low-resolution inputs, enhancing image details. While Vision Transformer (ViT)-based models improve SISR by capturing long-range dependencies, they either suffer from quadratic computational costs or employ selective attention mechanisms that do not explicitly focus on query-relevant regions. Despite these advancements, prior work has overlooked how selective attention mechanisms should be designed for SISR. We propose SSCAN, which dynamically selects the most relevant key-value windows based on query similarity, ensuring focused feature extraction while maintaining efficiency. In contrast to prior approaches that apply attention globally or heuristically, our method introduces a query-aware window selection strategy that better aligns attention computation with important image regions. By incorporating fixed-size windows, SSCAN reduces memory usage and enforces linear token-to-token complexity, making it scalable to large images. Our experiments demonstrate that SSCAN outperforms existing attention-based SISR methods, achieving up to 0.14 dB PSNR improvement on urban datasets while delivering both computational efficiency and reconstruction quality.
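The linear token-to-token claim can be made concrete with a rough operation count. The symbols here are assumptions introduced for illustration and do not come from the paper: $N = HW$ is the number of tokens, $M$ the number of tokens in one fixed-size window, $k$ the number of selected windows, and $d$ the channel dimension.

$$
\underbrace{O(N^2 d)}_{\text{full attention}}
\quad \text{vs.} \quad
\underbrace{O(N \cdot kM \cdot d)}_{\text{top-}k\text{ window attention}}
$$

Because $k$ and $M$ are fixed, each query attends to a constant $kM$ keys, so the token-to-token cost grows linearly in $N$. As a hypothetical example, a $256 \times 256$ feature map gives $N = 65{,}536$; with $16 \times 16$ windows ($M = 256$) and $k = 4$, each query scores $kM = 1{,}024$ keys instead of $65{,}536$, a factor of $N / (kM) = 64$ fewer. Doubling the image side quadruples $N$, which multiplies the full-attention cost by 16 but the selective cost by only 4. Any coarse window-to-window scoring used to pick the top-$k$ windows is extra and not counted here; if done exhaustively it would involve $(N/M)^2$ window pairs, which is $256^2 = 65{,}536$ in this example.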

Paper Structure

This paper contains 17 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison of recent transformer-based SR models in terms of PSNR and parameter count on the Urban100 dataset. Our model (SSCAN) outperforms the SOTA models (×4) by up to 0.14 dB in PSNR.
  • Figure 2: Visualization of our proposed Fine-Grained Context-Aware Attention (FGCA) on the Urban100 dataset. Red boxes indicate query regions, while white boxes denote the corresponding key regions.
  • Figure 3: The overall architecture of the SSCAN model and the composition of an SSCAN block. In this illustration, each SSCAN block contains two FGCA blocks.
  • Figure 4: Comparison of memory consumption between BiFormer's attention (BRA) and our fine-grained context-aware attention (FGCA) when computing attention scores.
  • Figure 5: Visual comparisons of SSCAN with other publicly released SISR models on the Urban100 dataset (×4).