ML-CrAIST: Multi-scale Low-high Frequency Information-based Cross Attention with Image Super-resolving Transformer

Alik Pramanick; Utsav Bheda; Arijit Sur

TL;DR

This work introduces ML-CrAIST, a transformer-based single-image super-resolution architecture that jointly exploits multi-scale low/high-frequency information through 2D Discrete Wavelet Transform and cross-frequency cross-attention. Core components include the spatial-channel attention-based transformer block (SCATB), the low-high frequency interaction block (LHFIB) with an attention-based fusion block (AFB), and cross attention blocks (CAB) for cross-scale and cross-frequency message passing. Empirical results on five standard SR benchmarks show state-of-the-art PSNR/SSIM and favorable perceptual metrics, with notable gains such as $+0.20$ dB on Manga109 ×3 and competitive FLOPs, including a lighter variant (Ours-Li) with ~1.5× fewer FLOPs. The method also yields practical benefits in downstream tasks like keypoint detection and edge detection, validating its broader applicability in image restoration and analysis.
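ML-CrAIST takes its low- and high-frequency information from the 2D Discrete Wavelet Transform. The sketch below shows what such a decomposition produces. It assumes the orthonormal Haar wavelet and a single-channel NumPy array with even side lengths. The wavelet choice and the names `haar_dwt2`, `haar_idwt2`, and `multilevel_dwt` are hypothetical and are not taken from the paper's code. One level splits an image into a low-frequency band LL and three high-frequency detail bands. Decomposing LL again gives multi-level sub-bands like those in Figure 1(a).

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT (hypothetical helper).

    x: (H, W) array with even H and W. Returns (LL, LH, HL, HH), each (H/2, W/2).
    LH/HL naming differs between libraries; here LH holds differences between
    rows and HL holds differences between columns.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low frequency: scaled local average
    lh = (a + b - c - d) / 2.0  # high frequency: top row minus bottom row
    hl = (a - b + c - d) / 2.0  # high frequency: left column minus right column
    hh = (a - b - c + d) / 2.0  # high frequency: diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2 (the 2x2 Haar transform is its own inverse)."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.result_type(ll, lh, hl, hh))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def multilevel_dwt(x, levels):
    """Decompose the LL band repeatedly; returns the final LL and per-level details."""
    details = []
    ll = x
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))  # finest scale first
    return ll, details

# Usage: a toy 8x8 "image" decomposed over two levels.
img = np.arange(64, dtype=float).reshape(8, 8)
ll2, details = multilevel_dwt(img, levels=2)
print(ll2.shape, [d[0].shape for d in details])  # (2, 2) [(4, 4), (2, 2)]
```

The transform is orthonormal, so it is exactly invertible and preserves signal energy. The sub-bands therefore re-partition the input by frequency and discard nothing. Smooth regions end up almost entirely in LL, and edges and textures show up in the detail bands.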

Abstract

Recently, transformers have attracted significant interest for single-image super-resolution and have delivered substantial performance gains. Current models rely heavily on the network's capacity to extract high-level semantic details from images. At the same time, they overlook the effective use of multi-scale image details and of intermediate information within the network. Furthermore, it has been observed that high-frequency regions of an image are considerably harder to super-resolve than low-frequency regions. This work proposes ML-CrAIST, a transformer-based super-resolution architecture that addresses this gap by exploiting low- and high-frequency information at multiple scales. Most previous work applies either spatial or channel self-attention. We apply both, concurrently modeling pixel interactions along the spatial and channel dimensions and exploiting the inherent correlations across both axes. Further, we devise a cross-attention block for super-resolution that explores the correlations between low- and high-frequency information. Quantitative and qualitative assessments indicate that the proposed ML-CrAIST surpasses state-of-the-art super-resolution methods (e.g., a 0.15 dB gain on Manga109 at $\times$4). Code is available at: https://github.com/Alik033/ML-CrAIST.
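The abstract combines self-attention along the spatial and channel axes with cross-attention between low- and high-frequency features. The NumPy sketch below illustrates these three attention patterns. It assumes single-head attention on flattened (N, C) token matrices with plain projection matrices. The function names, the square-root scaling, the residual sum of the two self-attention branches, and the use of low-frequency tokens as queries are all illustrative assumptions. None of them is the paper's SCATB or CAB.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x, wq, wk, wv):
    """Self-attention across the N spatial positions (hypothetical helper).

    x: (N, C) tokens, i.e. an H x W x C feature map flattened to N = H*W rows.
    Builds an (N, N) map, so every pixel attends to every other pixel.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N, N)
    return attn @ v  # (N, C)

def channel_attention(x, wq, wk, wv):
    """Self-attention across the C channels ("transposed" attention).

    Builds a (C, C) map, so every channel attends to every other channel.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)  # (C, C)
    return v @ attn.T  # output channel i = weighted sum of value channels

def cross_attention(x_query, x_context, wq, wk, wv):
    """Queries from one frequency branch, keys/values from the other (assumed roles)."""
    q = x_query @ wq
    k, v = x_context @ wk, x_context @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N_query, N_context)
    return attn @ v  # (N_query, C)

# Usage with random weights: 4x4 low-frequency tokens attend to 8x8 high-frequency tokens.
rng = np.random.default_rng(0)
C = 8
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
low = rng.standard_normal((16, C))
high = rng.standard_normal((64, C))
# Hypothetical fusion: residual sum of both self-attention branches (not the paper's SCATB).
fused = low + spatial_attention(low, wq, wk, wv) + channel_attention(low, wq, wk, wv)
out = cross_attention(fused, high, wq, wk, wv)
print(fused.shape, out.shape)  # (16, 8) (16, 8)
```

The spatial map is N×N, so its cost grows quadratically with the number of pixels. The channel map is C×C, so its cost grows only linearly in N. Using both lets a block capture pixel-to-pixel and channel-to-channel dependencies.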

Paper Structure

This paper contains 15 sections, 8 equations, 4 figures, and 1 table.

Introduction
Related Work
Proposed Method
Overall Pipeline
Spatial-channel attention-based transformer block (SCATB)
Low-High Frequency Interaction Block (LHFIB)
Attention-based fusion block (AFB)
Cross Attention Block (CAB)
Experiments
Datasets & Evaluation Metrics
Implementation Details
Comparisons with the SOTA
Ablation Study
Impact on Various Applications
Conclusion

Figures (4)

Figure 1: (a) Multi-level wavelet sub-bands of a LR image. (b) Overview of the Proposed ML-CrAIST. $N\times$ indicates that the block is stacked N times.
Figure 5: (a) Visual comparison of different settings of ML-CrAIST. (b) Convergence graph of ML-CrAIST.
Figure 6: LPIPS ($\downarrow$), BRISQUE ($\downarrow$), and EPI comparison between different components of ML-CrAIST. $\downarrow$ indicates lower is better.
Figure 7: Key-point and Canny edge detection comparison between existing methods and ML-CrAIST. The number in the top corner of each image in the first row indicates the number of detected key points.
