Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation
Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li
TL;DR
The paper addresses the efficiency gap in single image super-resolution (SISR) by introducing EARFA, a lightweight model that replaces costly Transformer self-attention with entropy attention (EA) and shifting large kernel attention (SLKA). EA computes a differential entropy under a Gaussian assumption to gauge how informative each feature channel is, at minimal overhead, while SLKA expands the receptive field through channel shifting and dilated convolutions. Extensive experiments show that EARFA delivers competitive PSNR/SSIM with significantly lower latency than Transformer-based SISR models, and that EARFA-light achieves strong performance with a very small parameter count. This work offers a practical approach to real-time efficient SISR on constrained hardware, balancing reconstruction quality and inference speed.
Abstract
Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from a heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), which seriously hinders their practical application and deployment. In this work, we present an efficient SR model to mitigate the dilemma between model efficiency and SR performance. The model, dubbed Entropy Attention and Receptive Field Augmentation network (EARFA), is composed of a novel entropy attention (EA) and a shifting large kernel attention (SLKA). From the perspective of information theory, EA increases the entropy of intermediate features conditioned on a Gaussian distribution, providing more informative input for subsequent reasoning. SLKA, in turn, extends the receptive field of SR models with the assistance of channel shifting, which also helps boost the diversity of hierarchical features. Since the implementation of EA and SLKA does not involve complex computations (such as extensive matrix multiplications), the proposed method can achieve faster nonlinear inference than Transformer-based SR models while maintaining better SR performance. Extensive experiments show that the proposed model can significantly reduce inference latency while achieving SR performance comparable to that of other advanced models.
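The two ideas named in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation: the abstract does not specify how EA turns entropy into attention weights or how SLKA shifts channels, so the function names, the five-group split, and the wrap-around shift below are all assumptions made for illustration. The only fixed ingredient is the standard closed form for the differential entropy of a Gaussian with variance σ², namely ½·ln(2πe·σ²).

```python
import numpy as np


def gaussian_channel_entropy(feat, eps=1e-6):
    """Per-channel differential entropy under a Gaussian assumption.

    feat: array of shape (C, H, W). Returns an array of shape (C,).
    Uses h = 0.5 * ln(2 * pi * e * sigma^2); eps (an assumed constant)
    guards against log(0) for constant channels.
    """
    var = feat.reshape(feat.shape[0], -1).var(axis=1)
    return 0.5 * np.log(2.0 * np.pi * np.e * (var + eps))


def shift_channels(feat, step=1):
    """Hypothetical channel shifting: split channels into five groups,
    shift four of them by `step` pixels (right, left, down, up) and
    leave the fifth unchanged.

    np.roll wraps around at the borders; a real model might zero-pad.
    """
    groups = np.array_split(feat, 5, axis=0)
    shifts = [(0, step), (0, -step), (step, 0), (-step, 0), (0, 0)]
    shifted = [np.roll(g, s, axis=(1, 2)) for g, s in zip(groups, shifts)]
    return np.concatenate(shifted, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 8, 8))
    print(gaussian_channel_entropy(x).shape)  # one entropy value per channel
    print(shift_channels(x).shape)            # same shape as the input
```

Under this sketch, a channel with larger variance receives a larger entropy score, and shifting lets a subsequent pointwise or small-kernel convolution mix information from neighbouring spatial positions without any matrix multiplication beyond the convolution itself.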
