WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images

Yannik Glaser; Justin E. Stopa; Linnea M. Wolniewicz; Ralph Foster; Doug Vandemark; Alexis Mouche; Bertrand Chapron; Peter Sadowski

TL;DR

WV-Net embeddings support geophysical research by serving as a convenient foundation model for a variety of data analysis and exploration tasks, and they scale better in data-sparse settings than embeddings from a comparable ImageNet-pretrained model.

Abstract

The European Space Agency's Copernicus Sentinel-1 (S-1) mission is a constellation of C-band synthetic aperture radar (SAR) satellites that provide unprecedented monitoring of the world's oceans. S-1's wave mode (WV) captures 20x20 km image patches at 5 m pixel resolution and is unaffected by cloud cover or time of day. The mission's open data policy has made SAR data easily accessible for a range of applications, but the need for manual image annotations is a bottleneck that hinders the use of machine learning methods. This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net. In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images (ImageNet) with supervised learning. Experiments show improvements in estimating wave height (0.50 vs 0.60 RMSE using linear probing), estimating near-surface air temperature (0.90 vs 0.97 RMSE), and performing multilabel classification of geophysical and atmospheric phenomena (0.96 vs 0.95 micro-averaged AUROC). WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings. Together, these results demonstrate that WV-Net embeddings can support geophysical research by providing a convenient foundation model for a variety of data analysis and exploration tasks.
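
To make the linear-probing comparison concrete, the following is a minimal sketch and not the authors' evaluation code: a least-squares linear head is fit on frozen embeddings and scored with RMSE. The embeddings and targets are synthetic placeholders rather than WV-Net outputs, and the names `fit_linear_probe`, `predict`, and `rmse` are hypothetical.

```python
import numpy as np


def fit_linear_probe(train_emb, train_y):
    """Fit a least-squares linear head (with bias) on frozen embeddings."""
    # Append a column of ones so the last weight acts as the bias term.
    design = np.hstack([train_emb, np.ones((train_emb.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, train_y, rcond=None)
    return weights


def predict(weights, emb):
    """Apply the fitted linear head to new embeddings."""
    design = np.hstack([emb, np.ones((emb.shape[0], 1))])
    return design @ weights


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (assumption): 500 "images" with 32-dim embeddings
# and a scalar target that is roughly linear in the embedding.
rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 32))
true_w = rng.normal(size=32)
target = emb @ true_w + 0.1 * rng.normal(size=500)

# The encoder stays frozen; only the linear head is trained.
w = fit_linear_probe(emb[:400], target[:400])
probe_rmse = rmse(target[400:], predict(w, emb[400:]))
```

Running the same probe on embeddings from two different encoders, with the labels and the train/test split held fixed, gives the kind of side-by-side RMSE comparison reported above.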

Paper Structure (32 sections, 5 equations, 9 figures, 4 tables)

Introduction
Methods
Datasets
GOALI classification dataset
Wave height regression dataset
Air temperature regression dataset
Implementation details
Augmentations
Evaluation protocols
Multilabel classification
Regression
Image retrieval
Results
Optimization of WV-mode specific data augmentations
Transfer learning
...and 17 more sections

Figures (9)

Figure 1: (a--d): Sample images of different geophysical phenomena observable in the global S-1 WV archive, titled by their dominant classes. Multiple classes can be present in the same image. (e--h): Augmented versions of the low wind image illustrating the default SimCLR augmentation policies. (i--l): Augmented low wind images illustrating the augmentation policies evaluated in this work. In the actual SimCLR framework, multiple augmentations are usually applied in sequence to the same image.
Figure 2: In the SimCLR algorithm, images are randomly augmented to create several views of the same image. An encoder network --- consisting of a backbone and a smaller projection head --- learns to produce an embedding that is similar to embedded views from the same original image and dissimilar to embedded views from all other images. Only the encoder backbone is used for transfer learning.
Figure 3: Performance of various embeddings (Micro-AUROC, higher is better) vs. number of labeled training samples in the multilabel classification task. This experiment used the MLP transfer learning protocol.
Figure 4: Image retrieval example for the atmospheric gravity wave class. The anchor image (left column) is the query for kNN retrieval, and the six images to its right are the top-3 neighbors under the ImageNet embeddings and the top-3 neighbors under the WV-Net embeddings. This example shows successful image retrieval with the class present in the lower half of the anchor image.
Figure 5: Twelve representative examples of the geophysical phenomena observed in the global Sentinel-1 WV archive. The panels show: (a) negligible atmospheric variability (NV); (b) wind streaks (WS); (c) a mixture of WS and micro-scale cells (MC); (d) MC; (e) rain cells (RC), with two circular patterns at the bottom of the image that represent cold pools (CP); (f) a sub-mesoscale air-mass boundary (AB), with WS/MC on the left-hand side; (g) a low wind area (LW) containing MC and biological slicks (BS, the black lines), whose circular structure is likely due to a small-scale eddy; (h) atmospheric gravity waves (AW); (i) unidentified ocean or atmosphere (UD); (j) an ocean front (OF) along with MC and BS; (k) an internal oceanic wave; and (l) sea ice with icebergs.
...and 4 more figures
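
The contrastive objective described in the Figure 2 caption can be sketched as follows. This is a minimal NumPy version of the NT-Xent loss from the original SimCLR formulation, not the WV-Net training code: the function name `nt_xent_loss` is hypothetical, the temperature of 0.5 is an assumed value that this page does not specify, and the random vectors stand in for the projection-head outputs of two augmented views.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss; z_a[i] and z_b[i] embed two views of image i.

    The temperature default is an assumption, not a value from the paper.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # A view must never be contrasted against itself.
    np.fill_diagonal(sim, -np.inf)
    # Row i's positive is the other view of the same original image.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over every other view in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denom
    return float(-log_prob_pos.mean())


# Placeholder embeddings (assumption): 8 images, 16-dim projections.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b_aligned = view_a + 0.01 * rng.normal(size=(8, 16))
view_b_random = rng.normal(size=(8, 16))

loss_aligned = nt_xent_loss(view_a, view_b_aligned)
loss_random = nt_xent_loss(view_a, view_b_random)
```

The loss is lower when the two views of each image land close together and far from the other images in the batch, which is the behaviour the encoder is trained toward.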
