S4: Self-Supervised Sensing Across the Spectrum

Jayanth Shenoy, Xingjian Davis Zhang, Shlok Mehrotra, Bill Tao, Rem Yang, Han Zhao, Deepak Vasisht

TL;DR

This work proposes S4, a new self-supervised pre-training approach that significantly reduces the requirement for labeled training data by using two new insights: (a) satellites capture images in different parts of the spectrum, such as radio and visible frequencies; and (b) satellite imagery is geo-registered, which allows fine-grained spatial alignment.

Abstract

Satellite image time series (SITS) segmentation is crucial for many applications, such as environmental monitoring, land cover mapping, and agricultural crop type classification. However, training SITS segmentation models remains challenging because labeled training data is scarce: producing it requires fine-grained annotation. We propose S4, a new self-supervised pre-training approach that significantly reduces the requirement for labeled training data by using two new insights: (a) satellites capture images in different parts of the spectrum, such as radio and visible frequencies; and (b) satellite imagery is geo-registered, which allows fine-grained spatial alignment. We use these insights to formulate the pre-training tasks in S4. We also curate m2s2-SITS, a large-scale dataset of unlabeled, spatially aligned, multi-modal, and geography-specific SITS that serves as representative pre-training data for S4. Finally, we evaluate S4 on multiple SITS segmentation datasets and demonstrate its efficacy against competing baselines when only limited labeled data is available.
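Insight (b) can be made concrete. When two rasters are geo-registered, each pixel index maps to a ground coordinate through an affine geotransform, so a radar pixel and an optical pixel covering the same ground location can be paired directly. The sketch below is a generic illustration, not the m2s2-SITS curation pipeline: it assumes both rasters share one coordinate reference system and store GDAL-style six-element geotransforms, and the helper names (`pixel_center_to_world`, `world_to_pixel`, `corresponding_pixel`) and the example coordinates are hypothetical.

```python
import math
from typing import Tuple

# Assumed GDAL-style geotransform layout:
# (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]


def pixel_center_to_world(gt: GeoTransform, row: int, col: int) -> Tuple[float, float]:
    """Return the world (x, y) coordinate of the centre of pixel (row, col)."""
    c, r = col + 0.5, row + 0.5
    x = gt[0] + c * gt[1] + r * gt[2]
    y = gt[3] + c * gt[4] + r * gt[5]
    return x, y


def world_to_pixel(gt: GeoTransform, x: float, y: float) -> Tuple[int, int]:
    """Invert the 2x2 affine part and return the (row, col) containing (x, y)."""
    dx, dy = x - gt[0], y - gt[3]
    det = gt[1] * gt[5] - gt[2] * gt[4]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return math.floor(row), math.floor(col)


def corresponding_pixel(src_gt: GeoTransform, dst_gt: GeoTransform,
                        row: int, col: int) -> Tuple[int, int]:
    """Map a source-raster pixel to the destination pixel covering the same ground point."""
    x, y = pixel_center_to_world(src_gt, row, col)
    return world_to_pixel(dst_gt, x, y)


# Hypothetical example: both rasters at 10 m resolution (north-up, so
# pixel_height < 0); the optical grid starts 20 m further east than the radar grid.
radar_gt = (500000.0, 10.0, 0.0, 4000000.0, 0.0, -10.0)
optical_gt = (500020.0, 10.0, 0.0, 4000000.0, 0.0, -10.0)
print(corresponding_pixel(radar_gt, optical_gt, 5, 5))  # (5, 3)
```

Because the optical grid is offset two pixels east, radar pixel (5, 5) lands on optical pixel (5, 3). Pairs matched this way, repeated at every timestamp, are the kind of spatially aligned radar-optical samples that a cross-modal pre-training objective can use.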

Paper Structure (16 sections, 2 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Optical images in one SITS captured at different points in time over the same location. The rightmost image is the segmentation mask corresponding to this spatial location. The different images illustrate the significant temporal variation that occurs during crop growth.
  • Figure 2: Overview of S4. S4 takes in temporally pre-processed multi-modal time series data. During pre-training, radar-optical SITS pairs flow through the network and our proposed MMST contrastive loss and Cross-Modal reconstructive loss operate on their encodings. After pre-training, a small amount of labeled data is used to fine-tune the model for SITS segmentation.
  • Figure 3: Multi-modal images captured on the same day: while the optical image (left) is occluded by clouds, the radar image (right) is not affected.
  • Figure 4: Multi-Modal Space-Time Contrastive Learning for SITS. Our approach operates on the encoded SITS feature maps. Corresponding space-time pixels on the feature map are denoted as positive pairs that the contrastive loss tries to align. Non-corresponding pixel pairs are negative and repelled by the loss.
  • Figure 5: Loss and modality ablation for S4.
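The Multi-Modal Space-Time contrastive objective described in Figure 4 can be sketched as a pixel-level InfoNCE loss. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the radar and optical encoders emit feature maps of identical shape (T, H, W, D). It treats the same (t, h, w) location in both maps as the positive pair and uses every other location as a negative. The symmetric form, the temperature of 0.1, and the function name `mmst_contrastive_loss` are hypothetical choices. The Cross-Modal reconstructive loss from Figure 2 is not sketched because the captions do not specify its form.

```python
import numpy as np


def _diag_cross_entropy(logits: np.ndarray) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def mmst_contrastive_loss(radar_feats: np.ndarray, optical_feats: np.ndarray,
                          temperature: float = 0.1) -> float:
    """Hypothetical pixel-level InfoNCE between co-registered radar/optical features.

    Both inputs have shape (T, H, W, D). Location (t, h, w) in the radar map and
    the same location in the optical map form a positive pair; all other
    locations act as negatives.
    """
    if radar_feats.shape != optical_feats.shape:
        raise ValueError("radar and optical feature maps must have the same shape")
    dim = radar_feats.shape[-1]
    # Flatten every space-time location into one row: N = T * H * W.
    r = radar_feats.reshape(-1, dim)
    o = optical_feats.reshape(-1, dim)
    # L2-normalise so the dot products below are cosine similarities.
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    o = o / np.linalg.norm(o, axis=1, keepdims=True)
    logits = (r @ o.T) / temperature  # (N, N); the diagonal holds positive pairs
    # Symmetric loss: radar-to-optical and optical-to-radar matching.
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4, 4, 16))  # T=2, H=4, W=4, D=16
aligned = mmst_contrastive_loss(feats, feats)
# Scramble pixel locations in the optical map so correspondences are broken.
perm = rng.permutation(feats.shape[0] * feats.shape[1] * feats.shape[2])
scrambled = feats.reshape(-1, 16)[perm].reshape(feats.shape)
misaligned = mmst_contrastive_loss(feats, scrambled)
print(aligned < misaligned)  # True
```

Aligned features yield a low loss, while scrambling the optical locations raises it sharply. This is the behaviour the caption describes: corresponding pixels are pulled together and non-corresponding ones are pushed apart. The similarity matrix is N x N with N = T * H * W, so a practical version would likely subsample pixel locations before computing it.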