Self-supervised Learning in Remote Sensing: A Review
Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Lichao Mou, Xiao Xiang Zhu
TL;DR
This work surveys self-supervised learning (SSL) for remote sensing (RS). It organizes methods into generative, predictive, and contrastive families and maps advances from computer vision (CV) to Earth observation contexts. It discusses RS-specific data characteristics, proposes a taxonomy of pretext tasks across spatial, spectral, temporal, and multi-sensor contexts, and catalogs a range of SSL applications in RS. A preliminary benchmark on BigEarthNet, SEN12MS, and So2Sat-LCZ42 evaluates four representative contrastive methods. It shows that MoCo-based approaches learn robust RS representations and highlights the importance of data augmentation and label efficiency. The authors also identify open challenges (e.g., model collapse, augmentation design, multimodal integration) and outline directions for SSL in Earth observation (SSL4EO) that bridge the CV and RS communities toward scalable, label-free representation learning.
Abstract
In deep learning research, self-supervised learning (SSL) has received great attention, triggering interest within both the computer vision and remote sensing communities. While SSL has achieved great success in computer vision, most of its potential in the domain of earth observation remains untapped. In this paper, we provide an introduction to and a review of the concepts and latest developments in SSL for computer vision in the context of remote sensing. Further, we provide a preliminary benchmark of modern SSL algorithms on popular remote sensing datasets, verifying the potential of SSL in remote sensing and providing an extended study on data augmentations. Finally, we identify a list of promising directions for future research in SSL for earth observation (SSL4EO) to pave the way for fruitful interaction between both domains.
