HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse Attention
Yi Liu, Yi Wan, Xinyi Liu, Qiong Wu, Panwang Xia, Xuejun Huang, Yongjun Zhang
TL;DR
The paper addresses real-time remote sensing image super-resolution with HIMOSA, a lightweight framework that combines a content-aware sparse attention mechanism with hierarchical window expansion to model multi-scale patterns and long-range dependencies efficiently. Using mixture-of-experts-inspired routing, the CARSA module processes only selected tokens within large windows, while the CAB and ConvGLU modules improve global information flow and nonlinear feature modulation. Experiments on multiple remote sensing datasets show state-of-the-art reconstruction quality at favorable computational cost, and a HIMOSA-light variant trades some capacity for faster inference. The approach is aimed at time-sensitive remote sensing applications such as disaster response, where both accuracy and speed are critical.
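To make the routing idea concrete, the following is a minimal NumPy sketch of top-k token selection inside a single attention window. It is an illustration under stated assumptions, not the paper's CARSA implementation: the function name `routed_sparse_window_attention`, the linear router `w_router`, the `keep_ratio` parameter, and the use of identity query/key/value projections are all hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def routed_sparse_window_attention(x, w_router, keep_ratio=0.25):
    """Attention over one window where only routed tokens act as keys/values.

    x        : (n_tokens, dim) features of the tokens in one window
    w_router : (dim,) weights of a hypothetical linear router
    """
    n, d = x.shape
    k = max(1, int(round(n * keep_ratio)))      # number of tokens to keep
    scores = x @ w_router                       # one routing score per token
    keep = np.sort(np.argsort(scores)[-k:])     # indices of the k highest scores
    kv = x[keep]                                # routed tokens (identity projections assumed)
    attn = softmax(x @ kv.T / np.sqrt(d))       # (n, k) instead of dense (n, n)
    return attn @ kv, keep

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 16))               # an 8x8 window of 16-dim tokens
w_router = rng.standard_normal(16)
out, keep = routed_sparse_window_attention(x, w_router, keep_ratio=0.25)
print(out.shape, keep.shape)
```

In this sketch every token still issues a query, but the attention matrix shrinks from n x n to n x k, which is where the savings come from when most of the window is redundant.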
Abstract
In remote sensing applications such as disaster detection and response, real-time efficiency and lightweight models are critically important. However, existing remote sensing image super-resolution methods often face a trade-off between model performance and computational efficiency. In this paper, we propose HIMOSA, a lightweight super-resolution framework for remote sensing imagery. Specifically, HIMOSA exploits the inherent redundancy in remote sensing imagery through a content-aware sparse attention mechanism, enabling fast inference while maintaining strong reconstruction performance. Furthermore, to make effective use of the multi-scale repetitive patterns found in remote sensing imagery, we introduce hierarchical window expansion and reduce its computational complexity by adjusting the sparsity of the attention. Extensive experiments on multiple remote sensing datasets demonstrate that our method achieves state-of-the-art performance while maintaining computational efficiency.
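The interaction between window expansion and sparsity can be illustrated with simple arithmetic. The sketch below counts query-key pairs per token for a square window and shows one plausible reading of the idea: as the window grows, the fraction of tokens kept is lowered so that the per-token cost stays flat. The window sizes and keep ratios are assumptions chosen for illustration; the schedule actually used by HIMOSA is not given in this summary.

```python
def keys_per_token(window, keep_ratio=1.0):
    """Number of keys each query attends to in a window x window block.

    keep_ratio = 1.0 corresponds to dense window attention; smaller values
    model sparse attention that keeps only a fraction of the tokens.
    """
    n_tokens = window * window
    return max(1, int(n_tokens * keep_ratio))

# Hypothetical three-level hierarchy: the window side doubles at each level
# while the keep ratio drops by 4x, so the per-token cost is unchanged.
levels = [(8, 1.0), (16, 0.25), (32, 0.0625)]
costs = [keys_per_token(w, r) for w, r in levels]
dense_costs = [keys_per_token(w) for w, _ in levels]
print(costs)        # sparse cost per token at each level
print(dense_costs)  # dense cost per token at each level, for comparison
```

Under these assumed settings the receptive field grows sixteenfold in area between the first and last level, whereas dense attention over the same windows would need sixteen times as many query-key pairs per token.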
