Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

TL;DR

The paper addresses the efficiency gap in single image super-resolution (SISR) by introducing EARFA, a lightweight model that replaces costly Transformer self-attention with Entropy Attention (EA) and Shifting Large Kernel Attention (SLKA). EA computes a Gaussian-conditioned differential entropy to gauge channel-wise feature informativeness with minimal overhead, while SLKA expands the receptive field through channel shifting and dilated convolutions. The authors' experiments show that EARFA delivers competitive PSNR/SSIM with significantly lower latency than Transformer-based SISR models, and that EARFA-light achieves strong performance with a very small parameter count. The work offers a practical approach to real-time efficient SISR on constrained hardware, balancing reconstruction quality and inference speed.
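
As a point of reference for the "Gaussian-conditioned differential entropy" mentioned above, the standard closed form for a Gaussian variable is reproduced here. The symbols $X$, $\mu$, and $\sigma^2$ are introduced only for illustration; this page does not show the paper's own equation, which may be parameterized differently.

$$
h(X) = -\int p(x)\,\ln p(x)\,dx = \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^{2}\right), \qquad X \sim \mathcal{N}(\mu, \sigma^{2})
$$

Under this assumption the entropy depends only on the variance and grows monotonically with it, so a channel-wise entropy score can be obtained from a single variance computation per channel rather than from any matrix multiplication.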

Abstract

Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from a heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), which seriously hinders their practical application and deployment. In this work, we present an efficient SR model to mitigate the dilemma between model efficiency and SR performance, dubbed the Entropy Attention and Receptive Field Augmentation network (EARFA), which is composed of a novel entropy attention (EA) and a shifting large kernel attention (SLKA). From the perspective of information theory, EA increases the entropy of intermediate features conditioned on a Gaussian distribution, providing more informative input for subsequent reasoning. On the other hand, SLKA extends the receptive field of SR models with the assistance of channel shifting, which also helps boost the diversity of hierarchical features. Since the implementation of EA and SLKA does not involve complex computations (such as extensive matrix multiplications), the proposed method can achieve faster nonlinear inference than Transformer-based SR models while maintaining better SR performance. Extensive experiments show that the proposed model can significantly reduce the delay of model inference while achieving SR performance comparable with other advanced models.
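
To make the idea of channel shifting concrete, here is a minimal NumPy sketch. The abstract only says that SLKA uses channel shifting. The five-group split, the one-pixel step, the circular wrap-around via `np.roll`, and the function name `channel_shift` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def channel_shift(x, step=1):
    """Shift groups of channels spatially (illustrative sketch only).

    x: array of shape (C, H, W).
    ASSUMPTION: channels are split into 5 equal groups; four groups are
    shifted left/right/up/down by `step` pixels and the rest are unchanged.
    ASSUMPTION: shifts wrap around (circular), since np.roll is used.
    """
    c = x.shape[0]
    g = c // 5  # channels per shifted group
    out = x.copy()
    # (dy, dx) offsets: left, right, up, down
    shifts = [(0, -step), (0, step), (-step, 0), (step, 0)]
    for i, (dy, dx) in enumerate(shifts):
        sl = slice(i * g, (i + 1) * g)
        out[sl] = np.roll(x[sl], shift=(dy, dx), axis=(1, 2))
    return out

x = np.arange(5 * 3 * 3, dtype=np.float32).reshape(5, 3, 3)
y = channel_shift(x)
```

Because each group sees its neighbours from a different direction, a following convolution mixes information from a wider spatial neighbourhood without any extra parameters, which is the intuition behind using shifts to enlarge the receptive field.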

Paper Structure (15 sections, 8 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Comparison of the tradeoff between SR results and model efficiency on Manga109 [fujimoto2016manga109] with SR$\times$4. The diameter of each circle denotes the Multi-Adds [Ahn2018Fast] of the corresponding model. Our EARFA achieves the best SR performance while maintaining fast inference speed.
  • Figure 2: The overall structure of our EARFA. DAB constitutes the basic module for nonlinear inference, and LKAB and EAB are the building components of DAB that integrate SLKA and EA, respectively.
  • Figure 3: The architecture of SGFN. $\mathsf{1 \times 1}$ denotes a convolutional layer with a kernel size of 1$\times$1, and $\mathsf{Split}$ refers to splitting input features into two parts along the channel dimension, while $\mathsf{DW 5 \times 5}$ denotes a depth-wise convolution with a 5$\times$5 kernel size.
  • Figure 4: The network architecture of EA, where $\mathsf{1 \times 1}$ denotes a convolutional layer with a kernel size of 1$\times$1. Entropy signifies the computation of the differential entropy [shannon1948mathematical] of channel-wise features, and Sigmoid denotes the sigmoid function used for weight normalization.
  • Figure 5: Pixel distribution of intermediate features. The left panel shows the pixel distribution of the input feature, while the right panel shows the distribution after the features have been adjusted toward a Gaussian distribution.
  • ...and 2 more figures
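
The Entropy and Sigmoid steps named in the Figure 4 caption can be sketched as follows. This is a simplified illustration under stated assumptions: each channel is treated as Gaussian so that its differential entropy follows from its variance, the 1$\times$1 convolutions shown in the figure are omitted, and the names `gaussian_entropy` and `entropy_attention` are hypothetical rather than taken from the paper's code.

```python
import numpy as np

def gaussian_entropy(var, eps=1e-6):
    # Differential entropy (in nats) of a Gaussian with variance `var`:
    # 0.5 * ln(2 * pi * e * var). `eps` guards against log(0).
    return 0.5 * np.log(2.0 * np.pi * np.e * (var + eps))

def entropy_attention(x, eps=1e-6):
    """Re-weight channels by a sigmoid of their entropy (sketch only).

    x: array of shape (C, H, W).
    ASSUMPTION: each channel is modeled as Gaussian, so entropy is
    computed from the per-channel variance.
    ASSUMPTION: the 1x1 convolutions from Figure 4 are left out.
    """
    c = x.shape[0]
    var = x.reshape(c, -1).var(axis=1)   # per-channel variance, shape (C,)
    h = gaussian_entropy(var, eps)       # per-channel entropy, shape (C,)
    w = 1.0 / (1.0 + np.exp(-h))         # sigmoid maps entropy to (0, 1)
    return x * w[:, None, None], w

rng = np.random.default_rng(0)
feats = np.stack([
    rng.normal(0.0, 0.1, (8, 8)),  # low-variance channel
    rng.normal(0.0, 1.0, (8, 8)),
    rng.normal(0.0, 3.0, (8, 8)),  # high-variance channel
])
out, weights = entropy_attention(feats)
```

In this sketch, channels with a wider spread of activations receive larger weights, and the whole operation costs one variance reduction per channel plus element-wise scaling.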