SAFE: a SAR Feature Extractor based on self-supervised learning and masked Siamese ViTs

Max Muzeau, Joana Frontera-Pons, Chengfang Ren, Jean-Philippe Ovarlez

TL;DR

The paper addresses the scarcity of labeled SAR data by proposing SAFE, a self-supervised framework that uses masked Siamese Vision Transformers to learn a general SAR feature extractor. SAFE combines a teacher–student ViT pair, prototype-based clustering, and SAR-specific augmentations (e.g., sub-aperture decomposition, despeckling) to produce representations that hold up across acquisition modes and resolutions. It is evaluated on segmentation, few-shot classification, feature visualization, and pattern detection across diverse SAR datasets; in few-shot classification and segmentation it matches or surpasses state-of-the-art methods, even without being trained on the sensors used for evaluation. The resulting model is intended as a reusable backbone for SAR applications, including cross-sensor analysis.

Abstract

Due to its all-weather and day-and-night capabilities, Synthetic Aperture Radar imagery is essential for various applications such as disaster management, earth monitoring, change detection and target recognition. However, the scarcity of labeled SAR data limits the performance of most deep learning algorithms. To address this issue, we propose a novel self-supervised learning framework based on masked Siamese Vision Transformers to create a General SAR Feature Extractor coined SAFE. Our method leverages contrastive learning principles to train a model on unlabeled SAR data, extracting robust and generalizable features. SAFE is applicable across multiple SAR acquisition modes and resolutions. We introduce tailored data augmentation techniques specific to SAR imagery, such as sub-aperture decomposition and despeckling. Comprehensive evaluations on various downstream tasks, including few-shot classification, segmentation, visualization, and pattern detection, demonstrate the effectiveness and versatility of the proposed approach. Our network competes with or surpasses other state-of-the-art methods in few-shot classification and segmentation tasks, even without being trained on the sensors used for the evaluation.
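
To make the sub-aperture augmentation named above more concrete, here is a minimal NumPy sketch of a generic sub-aperture decomposition: the complex image is transformed to the spectral domain along one axis, the spectrum is cut into non-overlapping bands, and each band is transformed back into a lower-resolution "look". This is an illustration under stated assumptions, not the authors' implementation. The function name `sub_aperture_decomposition`, the parameter `n_looks`, the choice of axis 0 as azimuth, and the use of rectangular bands without spectral de-weighting are all assumptions that do not come from the paper.

```python
import numpy as np


def sub_aperture_decomposition(slc, n_looks=3, axis=0):
    """Split a complex SLC image into n_looks sub-aperture images.

    Hypothetical helper: rectangular, non-overlapping spectral bands along
    `axis` (assumed here to be azimuth). Real processing chains usually also
    undo the spectral weighting window first, which is omitted in this sketch.
    """
    # Spectrum along the chosen axis, with zero frequency shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft(slc, axis=axis), axes=axis)
    n = slc.shape[axis]
    # Integer band edges that partition the n frequency bins.
    edges = np.linspace(0, n, n_looks + 1).astype(int)

    looks = []
    for start, stop in zip(edges[:-1], edges[1:]):
        # Binary mask that keeps only the bins of the current band.
        mask = np.zeros(n)
        mask[start:stop] = 1.0
        mask_shape = [1] * slc.ndim
        mask_shape[axis] = n
        band = spectrum * mask.reshape(mask_shape)
        # Back to the image domain: one sub-aperture look per band.
        looks.append(np.fft.ifft(np.fft.ifftshift(band, axes=axis), axis=axis))
    return looks


# Usage on synthetic complex data standing in for an SLC patch.
rng = np.random.default_rng(0)
slc = rng.standard_normal((64, 32)) + 1j * rng.standard_normal((64, 32))
looks = sub_aperture_decomposition(slc, n_looks=3)
```

Because the bands partition the spectrum and the Fourier transform is linear, summing the looks recovers the original image. In an augmentation pipeline, one look (or a random subset) could serve as an alternative view of the same scene.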

Paper Structure (16 sections, 12 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Principle of SSL with Siamese networks. The notation is explained in \ref{sec:concepts}.
  • Figure 2: Principle of our SAR Feature Extractor (SAFE). A single network can extract meaningful features from different SAR acquisition methods. These features can then be used for many downstream tasks.
  • Figure 3: Model architecture for the training phase. The notation is detailed in \ref{subsec:model}.
  • Figure 4: Method for the sub-aperture augmentation of SAR images
  • Figure 5: SLC image (left) and SAR image denoised with MERLIN [dalsasso2021if] (right)
  • ...and 6 more figures