Evaluating the Label Efficiency of Contrastive Self-Supervised Learning for Multi-Resolution Satellite Imagery

Jules Bourcier, Gohar Dashyan, Jocelyn Chanussot, Karteek Alahari

TL;DR

This paper benchmarks two contrastive self-supervised methods adapted from Momentum Contrast (MoCo) and provides evidence that these methods can perform effectively given little downstream supervision and that they outperform out-of-domain pretraining alternatives.

Abstract

The application of deep neural networks to remote sensing imagery is often constrained by the lack of ground-truth annotations. Addressing this issue requires models that generalize efficiently from limited amounts of labeled data, allowing us to tackle a wider range of Earth observation tasks. Another challenge in this domain is developing algorithms that operate at variable spatial resolutions, e.g., for the problem of classifying land use at different scales. Recently, self-supervised learning has been applied in the remote sensing domain to exploit readily available unlabeled data, and was shown to reduce or even close the gap with supervised learning. In this paper, we study self-supervised visual representation learning through the lens of label efficiency, for the task of land use classification on multi-resolution/multi-scale satellite images. We benchmark two contrastive self-supervised methods adapted from Momentum Contrast (MoCo) and provide evidence that these methods can perform effectively given little downstream supervision, where randomly initialized networks fail to generalize. Moreover, they outperform out-of-domain pretraining alternatives. We use the large-scale fMoW dataset to pretrain and evaluate the networks, and validate our observations with transfer to the RESISC45 dataset.

Paper Structure

This paper contains 28 sections, 1 equation, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Label-efficient land use classification on fMoW. These values correspond to the table labeled tab:results-fmow-clf-all in the main text; see the table caption for a description.
  • Figure 2: Label-efficient land use classification on RESISC45. These values correspond to the table labeled tab:results-resisc45-clf-all in the main text; see the table caption for a description.