MIMRS: A Survey on Masked Image Modeling in Remote Sensing
Shabnam Choudhury, Akhil Vasim, Michael Schmitt, Biplab Banerjee
TL;DR
The paper addresses the challenge that remote sensing (RS) data are plentiful but labeling is costly, and ImageNet-based pretraining often fails to transfer due to RS domain disparities. It surveys Masked Image Modeling (MIM) as a self-supervised approach that reconstructs masked image parts, with MAE and related variants enabling efficient pretraining of Vision Transformers (ViTs) on RS data. The review covers RS-specific adaptations, including multi-spectral and temporal masking, rotational/variable-size window attention, and multimodal pretraining, along with objective functions such as $\mathcal{L}_{InfoNCE}$ for contrastive learning and $\mathcal{L}_{MSE}$ for reconstruction. These methods enable robust RS representations for downstream tasks like scene classification, object detection, and semantic segmentation, reducing labeling needs and supporting cross-modal fusion. The work highlights future directions toward integrated generative-discriminative architectures and scalable, geometry-aware pretraining to further advance RS understanding and applications.
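For orientation, the two objectives named above are commonly written as follows. This is a sketch in generic notation, not necessarily the survey's own: the symbols $M$ (set of masked patch indices), $x_i$ and $\hat{x}_i$ (target and reconstructed patch), $z$, $z^{+}$, $z_j$ (anchor, positive, and candidate embeddings), $\mathrm{sim}$ (a similarity such as cosine), and $\tau$ (temperature) are assumptions introduced here for illustration.

$$
\mathcal{L}_{MSE} = \frac{1}{|M|} \sum_{i \in M} \left\lVert \hat{x}_i - x_i \right\rVert_2^2
$$

$$
\mathcal{L}_{InfoNCE} = -\log \frac{\exp\left(\mathrm{sim}(z, z^{+}) / \tau\right)}{\sum_{j} \exp\left(\mathrm{sim}(z, z_j) / \tau\right)}
$$

In the MAE-style formulation the reconstruction error is averaged over masked patches only, so visible patches do not contribute to the loss; the sum in the InfoNCE denominator runs over the positive and all negative candidates.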
Abstract
Masked Image Modeling (MIM) is a self-supervised learning technique that involves masking portions of an image, such as pixels, patches, or latent representations, and training models to predict the missing information using the visible context. This approach has emerged as a cornerstone in self-supervised learning, unlocking new possibilities in visual understanding by leveraging unannotated data for pre-training. In remote sensing, MIM addresses challenges such as incomplete data caused by cloud cover, occlusions, and sensor limitations, enabling applications like cloud removal, multi-modal data fusion, and super-resolution. By synthesizing and critically analyzing recent advancements, this survey (MIMRS) is a pioneering effort to chart the landscape of masked image modeling in remote sensing. We highlight state-of-the-art methodologies, applications, and future research directions, providing a foundational review to guide innovation in this rapidly evolving field.
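The masking-and-prediction idea described above can be illustrated with a minimal NumPy sketch. It is not taken from the survey or from any specific method it reviews: the helper names (`patchify`, `random_mask`, `masked_mse`), the 4-band 64x64 toy tile, the 16-pixel patch size, and the 75% mask ratio are all assumptions chosen for illustration, and a trivial "mean of visible patches" predictor stands in for a real encoder-decoder.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, C)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector; True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


rng = np.random.default_rng(0)
image = rng.random((64, 64, 4))       # hypothetical 4-band RS tile
patches = patchify(image, 16)         # 16 patches of 16 * 16 * 4 values
mask = random_mask(len(patches), 0.75, rng)

# Placeholder predictor (assumption): every patch is predicted as the
# mean of the visible patches. A real MIM model would encode the
# visible patches and decode the masked ones.
visible_mean = patches[~mask].mean(axis=0)
pred = np.tile(visible_mean, (len(patches), 1))

loss = masked_mse(pred, patches, mask)
print(f"masked patches: {mask.sum()} / {len(patches)}, loss: {loss:.4f}")
```

Only the masked positions enter the loss, which is what forces a model to infer hidden content from visible context. RS-specific variants would change how the mask is drawn (for example per spectral band or per time step) rather than how the loss is computed.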
