CAMixerSR: Only Details Need More "Attention"
Yan Wang, Yi Liu, Shijie Zhao, Junlin Li, Li Zhang
TL;DR
This work addresses high-quality super-resolution (SR) of very large images by unifying two dominant strategies: content-aware routing and advanced token mixers. It introduces CAMixer, a content-aware mixer whose predictor allocates computation between convolution and deformable window-attention. The allocation is guided by offsets $\Delta p$, a mixer mask $m$, and spatial/channel attentions, and a global classification loss sharpens the partitioning. Stacking CAMixers yields CAMixerSR, which achieves state-of-the-art quality-efficiency trade-offs on large-image SR, lightweight SR, and omnidirectional-image SR, outperforming several baselines with less computation. The approach has practical value for high-resolution SR and could be combined with existing acceleration frameworks to further improve efficiency.
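The routing idea above can be illustrated with a small sketch. It is not the authors' implementation. The learnable predictor is replaced by a hand-crafted per-window variance score, and the names `split_windows`, `predict_scores`, `mixer_mask`, `route_cost`, and the `ratio` parameter are hypothetical. The sketch shows how a binary mask $m$ over windows lets every window take the cheap convolution path while only the top-scoring windows also pay for attention.

```python
# Minimal sketch of content-aware window routing, in the spirit of CAMixer.
# ASSUMPTIONS (not from the paper): variance per window stands in for the
# learnable predictor; `ratio` is a hypothetical fraction of windows sent to attention.
from statistics import pvariance

def split_windows(img, win):
    """Split a 2-D list `img` (H x W, both divisible by `win`) into win x win windows."""
    h, w = len(img), len(img[0])
    windows = []
    for r in range(0, h, win):
        for c in range(0, w, win):
            windows.append([row[c:c + win] for row in img[r:r + win]])
    return windows

def predict_scores(windows):
    """Stand-in predictor: higher pixel variance is treated as more texture."""
    return [pvariance([v for row in w in [win] for v in row] if False else [v for row in win for v in row]) for win in windows]

def mixer_mask(scores, ratio):
    """Binary mask m: 1 for the top `ratio` fraction of windows (hard), 0 otherwise."""
    k = round(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    mask = [0] * len(scores)
    for i in order[:k]:
        mask[i] = 1
    return mask

def route_cost(mask, conv_cost, attn_cost):
    """Every window pays for convolution; only masked windows also pay for attention."""
    return sum(conv_cost + attn_cost * m for m in mask)

# Usage: an 8x8 image whose top-left 4x4 window is textured, the rest flat.
img = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        img[r][c] = (r * 4 + c) % 3 * 50
wins = split_windows(img, 4)            # 4 windows
m = mixer_mask(predict_scores(wins), ratio=0.25)
print(m)                                # [1, 0, 0, 0]
print(route_cost(m, 1, 10), "vs", route_cost([1] * 4, 1, 10))  # 14 vs 44
```

With one textured window out of four and `ratio=0.25`, only that window is sent to attention, so the toy cost drops from 44 to 14 units. In CAMixer the scores come from a trained predictor rather than variance, and the toy per-window costs here merely stand in for real FLOPs.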
Abstract
To satisfy the rapidly increasing demand for large-image (2K-8K) super-resolution (SR), prevailing methods follow two independent tracks: 1) accelerating existing networks through content-aware routing, and 2) designing better SR networks by refining the token mixer. Despite their directness, both tracks suffer from unavoidable defects (e.g., inflexible routing or non-discriminative processing) that limit further improvement of the quality-complexity trade-off. To overcome these drawbacks, we integrate the two schemes in a content-aware mixer (CAMixer), which applies convolution to simple contexts and additionally applies deformable window-attention to sparse textures. Specifically, CAMixer uses a learnable predictor to generate multiple bootstraps: offsets for warping windows, a mask for classifying windows, and convolutional attentions that endow convolution with dynamic properties. These bootstraps let the attention adaptively include more useful textures and improve the representational capability of convolution. We further introduce a global classification loss to improve the predictor's accuracy. By simply stacking CAMixers, we obtain CAMixerSR, which achieves superior performance on large-image SR, lightweight SR, and omnidirectional-image SR.
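The offsets used to warp windows before attention can be sketched with bilinear sampling. This is an illustration under stated assumptions, not the paper's code. Each pixel in a window receives a hypothetical real-valued offset pair `(dy, dx)` standing in for $\Delta p$, samples outside the image clamp to the border, and `bilinear_sample` and `warp_window` are illustrative names. Fractional offsets interpolate between neighboring pixels, which is one way a predictor could shift a window toward useful textures with sub-pixel precision.

```python
# Minimal sketch of offset-guided window warping via bilinear sampling.
# ASSUMPTIONS: `offsets[i][j]` is a hypothetical per-pixel (dy, dx) pair standing
# in for the predicted offsets; coordinates outside the image clamp to the border.
import math

def bilinear_sample(img, y, x):
    """Sample 2-D list `img` at real-valued (y, x), clamping coordinates to the image."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0                      # fractional interpolation weights
    top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
    bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def warp_window(img, top, left, win, offsets):
    """Gather a win x win window at (top, left), shifting pixel (i, j) by offsets[i][j]."""
    return [
        [bilinear_sample(img, top + i + offsets[i][j][0], left + j + offsets[i][j][1])
         for j in range(win)]
        for i in range(win)
    ]

# Usage: a 4x4 ramp image and a 2x2 window at the origin.
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
zero = [[(0.0, 0.0)] * 2 for _ in range(2)]      # no warping
half = [[(0.0, 0.5)] * 2 for _ in range(2)]      # shift half a pixel right
print(warp_window(img, 0, 0, 2, zero))           # [[0.0, 1.0], [4.0, 5.0]]
print(warp_window(img, 0, 0, 2, half))           # [[0.5, 1.5], [4.5, 5.5]]
```

In a deep-learning framework the same operation is usually expressed with a batched grid-sampling routine rather than Python loops.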
