HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse Attention

Yi Liu, Yi Wan, Xinyi Liu, Qiong Wu, Panwang Xia, Xuejun Huang, Yongjun Zhang

TL;DR

The paper tackles real-time remote sensing image super-resolution with HIMOSA, a lightweight framework that combines a content-aware sparse attention mechanism with hierarchical window expansion to model multi-scale patterns and long-range dependencies efficiently. Using mixture-of-experts-inspired routing, the content-aware routing sparse attention (CARSA) module selectively processes tokens within large windows, while a channel attention block (CAB) and a convolutional gated linear unit (ConvGLU) enhance global information flow and nonlinear feature modulation. Experiments on multiple remote sensing (RS) datasets show state-of-the-art reconstruction quality at favorable computational cost, and a HIMOSA-light variant offers faster inference. The approach is aimed at time-sensitive remote sensing applications such as disaster response, where both accuracy and speed are critical.

Abstract

In remote sensing applications such as disaster detection and response, real-time efficiency and lightweight models are critically important. However, existing remote sensing image super-resolution methods often face a trade-off between model performance and computational efficiency. In this paper, we propose HIMOSA, a lightweight super-resolution framework for remote sensing imagery. Specifically, HIMOSA exploits the inherent redundancy of remote sensing imagery through a content-aware sparse attention mechanism, which enables fast inference while maintaining strong reconstruction performance. Furthermore, to exploit the multi-scale repetitive patterns found in remote sensing imagery, we introduce hierarchical window expansion and control the computational complexity by adjusting the sparsity of the attention. Extensive experiments on multiple remote sensing datasets demonstrate that our method achieves state-of-the-art performance while maintaining computational efficiency.
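To make the idea of content-aware sparse attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes a router that scores each token in a window by a dot product with a vector (`router_weight`, a hypothetical stand-in for a learned parameter), keeps the `keep` highest-scoring tokens, and runs softmax attention only among them. Query, key, and value projections are omitted, and the function name is illustrative.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def routed_sparse_attention(tokens, router_weight, keep):
    """Attend only among the `keep` highest-scoring tokens of one window.

    tokens: list of feature vectors (one window, flattened).
    router_weight: hypothetical routing vector; score = <token, router_weight>.
    Tokens that are not selected pass through unchanged.
    """
    scores = [dot(t, router_weight) for t in tokens]
    # Indices of the top-`keep` tokens by router score (ties broken by index).
    selected = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))[:keep]
    selected.sort()
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = [list(t) for t in tokens]
    for i in selected:
        # Queries, keys and values are the raw tokens here (no projections).
        weights = softmax([dot(tokens[i], tokens[j]) * scale for j in selected])
        out[i] = [
            sum(w * tokens[j][c] for w, j in zip(weights, selected))
            for c in range(len(tokens[i]))
        ]
    return out, selected


window = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.1]]
mixed, kept = routed_sparse_attention(window, router_weight=[1.0, 0.0], keep=2)
print(kept)
```

In this sketch the attention cost per window scales with `keep * keep` rather than with the square of the window size, which is the kind of saving the abstract attributes to sparsity.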

Paper Structure

This paper contains 20 sections, 6 equations, 11 figures, 8 tables, 1 algorithm.

Figures (11)

  • Figure 1: Different visual patterns in remote sensing imagery. (a) multi-scale repetitive patterns; (b) repetitive patterns; (c) weak texture.
  • Figure 2: The overall architecture of HIMOSA. Each HIMOSA block contains $M$ hierarchical layers, each of which includes a content-aware routing sparse attention (CARSA), a channel attention module (CA) and a convolutional gated linear unit (ConvGLU).
  • Figure 3: Hierarchical window for sparse attention. Increasing window sizes are applied to different hierarchical layers to aggregate features with expanding receptive fields.
  • Figure 4: The structure of: (a) Channel attention; (b) CARSA; (c) ConvGLU.
  • Figure 5: Visualization results ($\times 4$) achieved by different methods on the DOTA dataset.
  • ...and 6 more figures
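
The trade-off behind the hierarchical windows of Figure 3 can be illustrated with a rough cost count. The sketch below is an assumption-laden illustration rather than the paper's complexity analysis: it counts query-key pairs for non-overlapping windows, assumes both queries and keys are restricted to the kept tokens, and uses a hypothetical schedule of window sizes and keep ratios that does not come from the paper.

```python
def attention_pairs(height, width, window, keep_ratio=1.0):
    """Count query-key pairs for windowed attention over an H x W feature map.

    Each non-overlapping window holds window*window tokens; only a
    `keep_ratio` fraction of them (at least one) takes part in attention.
    """
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    tokens_per_window = window * window
    kept = max(1, int(tokens_per_window * keep_ratio))
    num_windows = (height // window) * (width // window)
    return num_windows * kept * kept


# Hypothetical schedule: windows grow per layer while the keep ratio shrinks.
schedule = [(8, 1.0), (16, 0.25), (32, 0.0625)]
for window, keep_ratio in schedule:
    dense = attention_pairs(64, 64, window)
    sparse = attention_pairs(64, 64, window, keep_ratio)
    print(f"window={window:2d} keep={keep_ratio:<6} dense={dense} sparse={sparse}")
```

Under these assumptions, the dense count quadruples each time the window side doubles, whereas holding the number of kept tokens per window fixed makes the sparse count shrink as windows grow, since fewer windows cover the same feature map.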