SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation

Yi Wang; Nassim Ait Ali Braham; Zhitong Xiong; Chenying Liu; Conrad M Albrecht; Xiao Xiang Zhu

TL;DR

SSL4EO-S12 provides a scalable, multimodal, multi-temporal unlabeled dataset for self-supervised learning in Earth observation, leveraging Sentinel-1/2 data to enable in-domain pre-training. The authors benchmark MoCo-v2, DINO, MAE, and data2vec, showing consistent improvements over supervised baselines and existing RS pre-training datasets across classification, segmentation, and change-detection tasks. Ablation studies highlight the value of multimodality, seasonal information, atmospheric correction augmentation, and scaling, with t-SNE visualizations confirming meaningful, class-oriented representations. The dataset and accompanying code are openly available, offering a substantial resource for advancing SSL in EO and facilitating transfer to diverse RS applications. Overall, SSL4EO-S12 demonstrates that large-scale, EO-specific SSL can achieve competitive downstream performance and generalize across multiple RS benchmarks.

Abstract

Self-supervised pre-training bears potential to generate expressive representations without human annotation. Most pre-training in Earth observation (EO) is based on ImageNet or medium-size, labeled remote sensing (RS) datasets. We share an unlabeled RS dataset SSL4EO-S12 (Self-Supervised Learning for Earth Observation - Sentinel-1/2) to assemble a large-scale, global, multimodal, and multi-seasonal corpus of satellite imagery from the ESA Sentinel-1 & -2 satellite missions. For EO applications we demonstrate SSL4EO-S12 to succeed in self-supervised pre-training for a set of methods: MoCo-v2, DINO, MAE, and data2vec. Resulting models yield downstream performance close to, or surpassing, accuracy measures of supervised learning. In addition, pre-training on SSL4EO-S12 excels compared to existing datasets. We make openly available the dataset, related source code, and pre-trained models at https://github.com/zhu-xlab/SSL4EO-S12.

Paper Structure (55 sections, 7 figures, 20 tables)

This paper contains 55 sections, 7 figures, 20 tables.

Introduction
Related work
SSL4EO-S12 Dataset
Data curation & assembly
Data characteristics & volume
Experimental setup
Self-supervised pre-training
Transfer learning
Benchmark results
Classification
Comparison of SSL methods
Comparison of pre-training datasets
Comparison of different amounts of labels
Segmentation
Land cover segmentation
...and 40 more sections

Figures (7)

Figure 1: Sample images of SSL4EO-S12 dataset assembled.
Figure 2: Geographical distribution of SSL4EO-S12 dataset.
Figure 3: Image patches without (left) and with (right) overlap filtering in Tokyo metropolitan area. We plot red circles of radius 1.32 km (132 pixels) for better visualization.
Figure 4: BigEarthNet (BE) performance depending on amount of labels available to train downstream task. We report linear probing and fine-tuning results with ResNet50 and ViT-S/16 encoders pre-trained using MoCo-v2.
Figure 5: t-SNE visualization of EuroSAT image representations. One color represents one class. Left: random-encoded features; right: SSL-encoded features. SSL-encoded features are well clustered even without label information.
...and 2 more figures
