
Efficient Learnable Collaborative Attention for Single Image Super-Resolution

Yigang Zhao, Chaowei Zheng, Jiannan Su, Guangyong Chen, Min Gan

TL;DR

The paper addresses the heavy computational burden of non-local attention in single-image super-resolution (SR) by introducing Learnable Collaborative Attention (LCoA). LCoA combines a Learnable Sparse Pattern (LSP), which uses k-means clustering to build a data-driven sparse attention pattern, with Collaborative Attention (CoA), which shares that pattern and its attention weights across network layers to avoid recomputing similarity matrices. Experiments on standard benchmarks across multiple scale factors show that LCoA reduces non-local modeling time at inference by about 83% and lowers memory consumption. The resulting deep Learnable Collaborative Attention Network (LCoAN) models long-range dependencies efficiently and remains competitive with state-of-the-art SR methods in PSNR/SSIM, balancing accuracy and efficiency for practical SR applications.
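To make the efficiency argument concrete, the following sketch counts query-key similarity evaluations for dense non-local attention, for clustered sparse attention recomputed in every layer, and for clustered attention whose similarity matrix is computed once and reused (CoA-style). The values of N (positions), k (equal-size clusters) and L (attention layers) are hypothetical and chosen only for illustration, not taken from the paper. The count ignores the cost of clustering and of value aggregation, so it shows the scaling trend rather than reproducing the reported ~83% figure.

```python
# Illustrative count of pairwise similarity evaluations (query-key dot products).
# Assumptions (hypothetical, not the paper's settings): N feature positions,
# k equally sized clusters, L attention layers. Clustering cost is ignored.

def dense_pairs(n: int) -> int:
    """Full non-local attention: every query scores every key."""
    return n * n


def clustered_pairs(n: int, k: int) -> int:
    """Clustered sparse attention: a query scores only keys in its own cluster."""
    size = n // k
    return k * size * size


def total_pairs(n: int, k: int, layers: int, shared: bool) -> int:
    """Sparse cost over `layers` attention layers. If `shared`, the similarity
    matrix is computed once and reused (CoA-style); otherwise it is recomputed per layer."""
    per_layer = clustered_pairs(n, k)
    return per_layer if shared else per_layer * layers


n = 48 * 48  # e.g. a 48x48 feature map flattened to N = 2304 positions
k = 16
layers = 4
print(dense_pairs(n) * layers)                  # 21233664
print(total_pairs(n, k, layers, shared=False))  # 1327104
print(total_pairs(n, k, layers, shared=True))   # 331776
```

Clustering divides the per-layer cost by roughly k, and sharing removes the factor of L; real wall-clock savings also depend on clustering overhead and memory access patterns.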

Abstract

Non-Local Attention (NLA) is a powerful technique for capturing long-range feature correlations in deep single image super-resolution (SR). However, NLA suffers from high computational complexity and memory consumption, as it requires aggregating all non-local feature information for each query response and recalculating the similarity weight distribution for different abstraction levels of features. To address these challenges, we propose a novel Learnable Collaborative Attention (LCoA) that introduces inductive bias into non-local modeling. Our LCoA consists of two components: Learnable Sparse Pattern (LSP) and Collaborative Attention (CoA). LSP uses the k-means clustering algorithm to dynamically adjust the sparse attention pattern of deep features, which reduces the number of non-local modeling rounds compared with existing sparse solutions. CoA leverages the sparse attention pattern and weights learned by LSP, and co-optimizes the similarity matrix across different abstraction levels, which avoids redundant similarity matrix calculations. The experimental results show that our LCoA can reduce the non-local modeling time by about 83% in the inference stage. In addition, we integrate our LCoA into a deep Learnable Collaborative Attention Network (LCoAN), which achieves competitive performance in terms of inference time, memory consumption, and reconstruction quality compared with other state-of-the-art SR methods.
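A minimal pure-Python sketch of the two ideas, under loudly stated assumptions: cluster ids come from plain Lloyd's k-means on the features (the paper's LSP learns its sparse pattern during training, which is not modeled here), features act directly as queries and keys without learned projections, and each query attends only to keys in its own cluster. The sparse attention weights are computed once and then applied to value features from several layers, mimicking at inference time how CoA reuses the pattern and weights learned by LSP. All function names are hypothetical.

```python
import math
import random

Vector = list[float]


def kmeans(points: list[Vector], k: int, iters: int = 10, seed: int = 0) -> list[int]:
    """Plain Lloyd's k-means (an assumed stand-in for LSP); returns a cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def sparse_attention_weights(feats: list[Vector], assign: list[int]) -> list[dict[int, float]]:
    """Softmax over dot-product similarities, restricted to keys in the query's cluster."""
    weights = []
    for i, q in enumerate(feats):
        keys = [j for j in range(len(feats)) if assign[j] == assign[i]]
        scores = [sum(a * b for a, b in zip(q, feats[j])) for j in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights.append({j: e / z for j, e in zip(keys, exps)})
    return weights


def aggregate(weights: list[dict[int, float]], values: list[Vector]) -> list[Vector]:
    """Apply precomputed sparse weights to a value tensor, which may come from any layer."""
    dim = len(values[0])
    out = []
    for w in weights:
        acc = [0.0] * dim
        for j, a in w.items():
            for d in range(dim):
                acc[d] += a * values[j][d]
        out.append(acc)
    return out


rng = random.Random(1)
n, dim, layers = 32, 4, 3
feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
assign = kmeans(feats, k=4)
shared = sparse_attention_weights(feats, assign)  # similarities computed once
layer_values = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)] for _ in range(layers)]
outputs = [aggregate(shared, v) for v in layer_values]  # same weights reused by every layer
```

Because the weights are built once, each later layer pays only for the aggregation step. In the actual network the shared weights are also co-optimized across abstraction levels during training, which this inference-only sketch does not show.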

Paper Structure

This paper contains 16 sections, 12 equations, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: The structure of our Learnable Collaborative Attention Network (LCoAN). The LCoAN is built upon a deep residual network that incorporates the Learnable Sparse Pattern (LSP) and Collaborative Attention (CoA) modules; the sparsity pattern and attention weights optimized by the LSP are shared and co-optimized by all CoA modules.
  • Figure 2: The impact of the proposed LSP and CoA on memory consumption and inference time on Urban100 ($\times 2$).
  • Figure 3: Ablation experiments conducted on Set14 with scale factor 2 to explore the advantages of LCoA. (a) PSNR results when k-means is replaced with LSH. (b) Comparison of different attention mechanisms in terms of performance and efficiency.
  • Figure 4: Ablation experiments conducted on Set14 with scale factor 2 to explore the effects of cluster and window size. (a) The PSNR results from different cluster settings. (b) The PSNR results from different window size settings.
  • Figure 5: The distribution overlap rate between shallow attention maps and deep attention maps. We can observe that the distribution of shallow attention maps mainly focuses on areas similar to the query texture, while deep attention maps tend to be more randomly distributed.
  • ...and 6 more figures
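Figure 4 ablates both the number of clusters and a window size. This summary does not spell out how the window is applied; one common arrangement in clustering-based sparse attention, assumed here purely for illustration, is to stably sort positions by cluster id and split the sorted order into fixed-size windows, so each query attends to at most `window` keys even when clusters are unbalanced. `cluster_windows` is a hypothetical helper, not the paper's code.

```python
def cluster_windows(assign: list[int], window: int) -> list[list[int]]:
    """Group positions into fixed-size attention windows after sorting by cluster id.

    Positions sharing a cluster become contiguous after the stable sort, so each
    window mostly holds same-cluster positions, and the per-query cost is bounded
    by `window` regardless of how unevenly the clusters are sized.
    """
    order = sorted(range(len(assign)), key=lambda i: assign[i])  # Python's sort is stable
    return [order[s:s + window] for s in range(0, len(order), window)]


assign = [2, 0, 1, 0, 2, 1, 0, 2]
windows = cluster_windows(assign, window=3)
print(windows)                            # [[1, 3, 6], [2, 5, 0], [4, 7]]
print(sum(len(w) ** 2 for w in windows))  # 22 query-key pairs vs 64 for dense attention
```

A window can straddle a cluster boundary (the second window mixes clusters 1 and 2), which is the trade-off for a fixed, predictable cost per query.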