RealViformer: Investigating Attention for Real-World Video Super-Resolution
Yuehan Zhang, Angela Yao
TL;DR
RealViformer investigates attention in real-world video super-resolution (RWVSR) by comparing covariance-based spatial and channel attention under real degradations. It shows that channel attention is more robust to artifact-laden queries but tends to increase the covariance among output channels, which can hinder learning. The authors mitigate this with the CAF and ICA modules, which incorporate squeeze-and-excitation and covariance-based rescaling. The model is a unidirectional recurrent Transformer that uses CAF for temporal fusion and ICA for enhanced channel processing, and it achieves state-of-the-art results with fewer parameters and faster runtimes on multiple real-world and synthetic datasets. This work provides practical guidance on attention design for RWVSR and introduces design patterns for controlling channel redundancy, offering a path toward more reliable real-world video enhancement systems.
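To make the spatial-versus-channel distinction concrete, here is a minimal NumPy sketch of the two covariance-based attention variants on a single flattened feature map `x` of shape (N tokens, C channels). It is illustrative only, not the RealViformer implementation: the projection matrices `wq`, `wk`, `wv`, the temperature `tau`, and the per-channel L2 normalization in the channel variant are assumptions modeled on common "transposed" attention designs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x, wq, wk, wv):
    """Token-to-token attention: the map is (N, N), one row per query token."""
    q, k, v = x @ wq, x @ wk, x @ wv                  # each (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))     # (N, N): covariance over space
    return attn @ v, attn                             # output (N, C)

def channel_attention(x, wq, wk, wv, tau=1.0):
    """Channel-to-channel attention: the map is (C, C), pooled over all tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv                  # each (N, C)
    # Assumed: L2-normalize each channel over the token axis before the product.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    attn = softmax(q.T @ k / tau)                     # (C, C): covariance over channels
    return v @ attn.T, attn                           # output (N, C); out[:, i] mixes channels j by attn[i, j]

# Usage on random features (hypothetical sizes).
rng = np.random.default_rng(0)
N, C = 16, 8
x = rng.standard_normal((N, C))
wq, wk, wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out_s, a_s = spatial_attention(x, wq, wk, wv)
out_c, a_c = channel_attention(x, wq, wk, wv)
print(a_s.shape, a_c.shape)  # (16, 16) (8, 8)
```

One plausible intuition for the robustness result, offered as a reading aid rather than the paper's own analysis: in the spatial map, an artifact-laden token's query controls an entire row of the (N, N) matrix, whereas every entry of the (C, C) channel map is a sum over all N tokens, so a single corrupted token contributes only a fraction of each entry.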
Abstract
In real-world video super-resolution (VSR), videos suffer from in-the-wild degradations and artifacts. In this setting, VSR methods, especially recurrent ones, tend to propagate artifacts over time, which makes them more vulnerable than image super-resolution methods. This paper investigates the influence of artifacts on the covariance-based attention mechanisms commonly used in VSR. Comparing the widely used spatial attention, which computes covariance over space, with channel attention, which computes covariance over channels, we observe that the latter is less sensitive to artifacts. However, channel attention leads to feature redundancy, as evidenced by higher covariance among output channels. We therefore explore simple techniques, such as the squeeze-and-excitation mechanism and covariance-based rescaling, to counter the effects of high channel covariance. Based on our findings, we propose RealViformer, a channel-attention-based real-world VSR framework that surpasses state-of-the-art methods on two real-world VSR datasets with fewer parameters and faster runtimes. The source code is available at https://github.com/Yuehan717/RealViformer.
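The redundancy argument and the squeeze-and-excitation remedy can be sketched in a few lines of NumPy. The sketch below is hypothetical and not taken from the RealViformer code: `channel_redundancy` is an assumed proxy (mean absolute off-diagonal correlation between channels of an (N, C) feature map), and `squeeze_excite` follows the standard squeeze-and-excitation pattern with assumed weights `w1`, `w2` and reduction ratio `r`. The paper's covariance-based rescaling is not reproduced here because its exact form is not given in this section.

```python
import numpy as np

def channel_redundancy(feat):
    """Hypothetical proxy: mean |off-diagonal| Pearson correlation between channels.

    `feat` has shape (N tokens, C channels); values near 1 mean channels carry
    largely overlapping information."""
    corr = np.corrcoef(feat, rowvar=False)           # (C, C) channel-by-channel correlation
    c = corr.shape[0]
    off_diag = corr[~np.eye(c, dtype=bool)]          # drop the trivial diagonal of ones
    return float(np.abs(off_diag).mean())

def squeeze_excite(feat, w1, w2):
    """Standard squeeze-and-excitation: rescale each channel by a gate in (0, 1)."""
    z = feat.mean(axis=0)                            # squeeze: global average over tokens, (C,)
    h = np.maximum(w1 @ z, 0.0)                      # bottleneck + ReLU, (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))              # excitation: sigmoid gates, (C,)
    return feat * s, s                               # broadcast per-channel gates over tokens

# Usage on random features standing in for an attention layer's output.
rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 16))
C, r = feat.shape[1], 4
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
rescaled, gates = squeeze_excite(feat, w1, w2)
print(round(channel_redundancy(feat), 3), gates.shape)
```

Because Pearson correlation is invariant to positive per-channel scaling, `channel_redundancy(rescaled)` equals `channel_redundancy(feat)` here; in a trained network, SE gates influence redundancy through the layers and gradients that follow, not through this isolated rescaling.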
