Frequency-Integrated Transformer for Arbitrary-Scale Super-Resolution
Xufei Wang, Fei Ge, Jinchen Zhu, Mingjian Zhang, Qi Wu, Jifeng Ren, Shizhuang Weng
TL;DR
The paper tackles arbitrary-scale single-image super-resolution (ASSR) by integrating frequency information through a Frequency-Integrated Transformer (FIT). FIT comprises a Frequency Incorporation Module (FIM), which fuses frequency information losslessly via FFT-based real-imaginary mapping, and a Frequency Utilization Self-Attention module (FUSAM), which combines Interaction Implicit Self-Attention (IISA) and Frequency Correlation Self-Attention (FCSA) to exploit spatial-frequency interrelationships and global frequency context. Across DF2K/DIV2K and other standard benchmarks, FIT achieves state-of-the-art PSNR and qualitative results, supported by visualizations of frequency-aware detail enrichment, improved frequency fidelity, and global-context capture. The approach demonstrates the practical value of explicitly exploiting the frequency domain in ASSR and paves the way for further adaptive, frequency-aware enhancements in related image restoration tasks.
Abstract
Methods based on implicit neural representation have demonstrated remarkable capabilities in arbitrary-scale super-resolution (ASSR) tasks, but they neglect the potential value of the frequency domain, leading to sub-optimal performance. We propose a novel network, the Frequency-Integrated Transformer (FIT), which incorporates and utilizes frequency information to enhance ASSR performance. FIT employs a Frequency Incorporation Module (FIM) to introduce frequency information in a lossless manner and a Frequency Utilization Self-Attention module (FUSAM) to leverage that information efficiently by exploiting the spatial-frequency interrelationship and the global nature of frequency. FIM enriches detail characterization by incorporating frequency information through a combination of the Fast Fourier Transform (FFT) and real-imaginary mapping. In FUSAM, Interaction Implicit Self-Attention (IISA) achieves cross-domain information synergy by letting spatial and frequency information interact in subspace, while Frequency Correlation Self-Attention (FCSA) captures the global context by computing correlation in frequency. Experimental results demonstrate that FIT yields superior performance compared to existing methods across multiple benchmark datasets. Visualized feature maps show the advantage of FIM in enriching detail characterization, frequency error maps confirm that IISA improves frequency fidelity, and local attribution maps confirm that FCSA effectively captures global context.
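To make the idea of lossless FFT-based real-imaginary mapping concrete, the following is a minimal NumPy sketch and not the authors' FIM. It assumes a feature map of shape (C, H, W) and simply stacks the real and imaginary parts of its 2-D FFT as extra channels; the function names `fft_real_imag` and `inverse_real_imag` are hypothetical and chosen only for illustration. The round trip at the end shows why such a mapping loses no information.

```python
import numpy as np

def fft_real_imag(feat):
    """Map a real (C, H, W) array to a real (2C, H, W) array that holds
    the real parts, then the imaginary parts, of its per-channel 2-D FFT.
    Illustrative assumption, not the paper's exact module."""
    spec = np.fft.fft2(feat, axes=(-2, -1))
    return np.concatenate([spec.real, spec.imag], axis=0)

def inverse_real_imag(packed):
    """Rebuild the complex spectrum from the stacked channels and invert it."""
    c = packed.shape[0] // 2
    spec = packed[:c] + 1j * packed[c:]
    # The input was real, so the imaginary part of the inverse is only
    # floating-point noise and can be dropped.
    return np.fft.ifft2(spec, axes=(-2, -1)).real

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))      # toy feature map: 4 channels, 8x8
packed = fft_real_imag(feat)               # real-valued, 8 channels
recovered = inverse_real_imag(packed)
max_err = float(np.abs(recovered - feat).max())
```

Because the real and imaginary parts together retain the full complex spectrum, `recovered` matches `feat` up to floating-point rounding. A magnitude-only representation, by contrast, would discard phase and could not be inverted this way.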
