Is Self-Supervised Pre-training on Satellite Imagery Better than ImageNet? A Systematic Study with Sentinel-2

Saad Lahrichi, Zion Sheng, Shufan Xia, Kyle Bradbury, Jordan Malof

TL;DR

This study systematically compares self-supervised pre-training on domain-aligned Sentinel-2 data (GeoNet) with ImageNet pre-training for downstream remote sensing (RS) tasks. After pre-training SwAV and MAE on both GeoNet and ImageNet and evaluating them with linear probing and fine-tuning on six RS benchmarks in few-shot settings, the authors find only modest and inconsistent gains from RS-domain pre-training. They further analyze the composition of GeoNet, two-stage domain-adaptive pre-training, and reconstruction-based proxies for downstream performance. They conclude that the advantages of RS-specific self-supervised learning (SSL) are not universal and may not justify the extra cost of data curation and compute. The work offers practical guidance to RS practitioners on the trade-offs of RS-specific SSL pre-training and highlights avenues for future work, including multi-band RS data and alternative SSL designs.
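To make the linear-probing protocol concrete, the sketch below trains only a linear softmax classifier on fixed feature vectors, which stand in for the outputs of a frozen pre-trained encoder such as SwAV or MAE. It is a minimal illustration under stated assumptions, not the authors' code: the function names `train_linear_probe` and `predict` are hypothetical, plain Python replaces the deep-learning framework a real pipeline would use, and the learning rate and epoch count are arbitrary. Fine-tuning would differ by also updating the encoder's weights.

```python
import math
import random


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=300, seed=0):
    """Fit a softmax (multinomial logistic regression) head on frozen features.

    Hypothetical helper. `features` are assumed to be precomputed by a frozen
    encoder; only the head parameters W and b are learned, which is what
    makes this a linear probe.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(num_classes)]
            m = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                err = exps[c] / total - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_W[c][j] += err * x[j]
        # Full-batch gradient descent step on the head parameters only
        for c in range(num_classes):
            b[c] -= lr * grad_b[c] / n
            for j in range(dim):
                W[c][j] -= lr * grad_W[c][j] / n
    return W, b


def predict(W, b, x):
    """Return the class index with the highest linear score."""
    scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy "frozen features" for two classes (made-up values, not from the paper)
feats = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.2, 0.9]]
labs = [0, 0, 1, 1]
W, b = train_linear_probe(feats, labs, num_classes=2)
print([predict(W, b, x) for x in feats])  # [0, 0, 1, 1]
```

Keeping the encoder frozen is what lets linear probing isolate the quality of the pre-trained representation, since the head has too little capacity to compensate for poor features.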

Abstract

Self-supervised learning (SSL) has demonstrated significant potential in pre-training robust models with limited labeled data, making it particularly valuable for remote sensing (RS) tasks. A common assumption is that pre-training on domain-aligned data provides maximal benefits on downstream tasks, particularly when compared to ImageNet-pretraining (INP). In this work, we investigate this assumption by collecting GeoNet, a large and diverse dataset of global optical Sentinel-2 imagery, and pre-training SwAV and MAE on both GeoNet and ImageNet. Evaluating these models on six downstream tasks in the few-shot setting reveals that SSL pre-training on RS data offers only modest performance improvements over INP, and that INP remains competitive in multiple scenarios. This indicates that the presumed benefits of SSL pre-training on RS data may be overstated, and that the additional costs of data curation and pre-training could be unjustified.
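In the few-shot setting, each downstream model is trained on only a small labeled subset. One common way to build such subsets, sketched here as an assumption rather than the paper's documented protocol, is to shuffle the labeled pool once with a fixed seed and take prefixes, so that smaller budgets are nested inside larger ones and every pre-trained model (GeoNet or ImageNet) is compared on identical samples. The function name `few_shot_splits`, the pool size, and every budget other than 1024 (the sample count mentioned in Figure 5) are hypothetical.

```python
import random


def few_shot_splits(num_examples, budgets, seed=0):
    """Return {budget: sorted indices} for nested few-shot training subsets.

    Hypothetical helper. All subsets are prefixes of one seeded permutation
    of the labeled pool, so the same indices are reused across models and
    smaller subsets are contained in larger ones.
    """
    if max(budgets) > num_examples:
        raise ValueError("a budget exceeds the size of the labeled pool")
    order = list(range(num_examples))
    random.Random(seed).shuffle(order)
    return {k: sorted(order[:k]) for k in budgets}


# Hypothetical budgets over a hypothetical pool of 10,000 labeled images
splits = few_shot_splits(10_000, [16, 64, 256, 1024])
print({k: len(v) for k, v in splits.items()})  # {16: 16, 64: 64, 256: 256, 1024: 1024}
```

Fixing the seed makes the subsets reproducible, which matters when the differences being measured between pre-training sources are as small as the modest gains reported here.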

Paper Structure

This paper contains 27 sections, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Distribution of GeoNet images. Colors represent the number of images per 22 km² area.
  • Figure 2: Example imagery from GeoNet using each of the sampling strategies
  • Figure 3: Diagram representing our experimental setup. We first train SwAV and MAE on GeoNet, then evaluate their downstream performance on several tasks.
  • Figure 4: Percent increase from RS pre-training (RSP) on GeoNet versus INP for SwAV and MAE, when linear probing on our six benchmark datasets.
  • Figure 5: Differences in reconstruction error and in downstream performance are weakly positively correlated. We plot the difference in reconstruction error between MAE-GN and MAE-IN against the difference in accuracy between MAE-IN and MAE-GN for linear-probed models using 1024 training samples. The dotted regression line describes the correlation, and the 95% confidence interval is shaded. Pearson's correlation coefficient is shown in the legend.
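The quantities summarized in Figures 4 and 5 can be computed with a few small helpers. This is a minimal sketch, assuming that percent increase is measured relative to the INP score and that Figure 5's regression line is an ordinary least-squares fit; the helper names are hypothetical, the input numbers are placeholders rather than results from the paper, and the shaded 95% confidence interval is not reproduced. Following the Figure 5 caption, x would hold reconstruction-error differences between MAE-GN and MAE-IN, and y the accuracy differences between MAE-IN and MAE-GN.

```python
import math


def percent_increase(score_rsp, score_inp):
    """Relative gain (%) of RS pre-training (RSP) over ImageNet pre-training (INP)."""
    return 100.0 * (score_rsp - score_inp) / score_inp


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Placeholder accuracies, not results from the paper
print(round(percent_increase(0.80, 0.75), 2))  # 6.67

# Placeholder pairs: x = reconstruction-error difference, y = accuracy difference
recon_diff = [0.01, 0.02, 0.03, 0.05]
acc_diff = [0.5, 0.2, 1.1, 0.9]
print(pearson_r(recon_diff, acc_diff), ols_line(recon_diff, acc_diff))
```

Reporting a relative rather than absolute increase makes gains comparable across benchmarks whose baseline accuracies differ widely.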