SAFE: a SAR Feature Extractor based on self-supervised learning and masked Siamese ViTs
Max Muzeau, Joana Frontera-Pons, Chengfang Ren, Jean-Philippe Ovarlez
TL;DR
The paper addresses the scarcity of labeled SAR data by proposing SAFE, a self-supervised framework that uses masked Siamese Vision Transformers to learn a general SAR feature extractor. SAFE employs a teacher–student ViT pair, prototype-based clustering, and SAR-tailored augmentations (e.g., sub-aperture decomposition, despeckling) to produce representations that hold up across acquisition modes and resolutions. It is evaluated on segmentation, few-shot classification, feature visualization, and pattern detection across diverse SAR datasets, and it competes with or surpasses state-of-the-art methods in few-shot classification and segmentation, even without being trained on data from the sensors used for evaluation. The resulting model can serve as a general-purpose backbone for SAR applications, including cross-sensor settings where labeled data are scarce.
Abstract
Due to its all-weather and day-and-night capabilities, Synthetic Aperture Radar (SAR) imagery is essential for various applications such as disaster management, Earth monitoring, change detection, and target recognition. However, the scarcity of labeled SAR data limits the performance of most deep learning algorithms. To address this issue, we propose a novel self-supervised learning framework based on masked Siamese Vision Transformers to create a General SAR Feature Extractor named SAFE. Our method leverages contrastive learning principles to train a model on unlabeled SAR data, extracting robust and generalizable features. SAFE is applicable across multiple SAR acquisition modes and resolutions. We introduce data augmentation techniques tailored to SAR imagery, such as sub-aperture decomposition and despeckling. Comprehensive evaluations on various downstream tasks, including few-shot classification, segmentation, visualization, and pattern detection, demonstrate the effectiveness and versatility of the proposed approach. Our network competes with or surpasses other state-of-the-art methods in few-shot classification and segmentation tasks, even without being trained on the sensors used for the evaluation.
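
To make the sub-aperture decomposition augmentation mentioned in the abstract more concrete, the following is a minimal sketch of one common way to form sub-aperture looks from a complex (single-look complex) SAR image: split its spectrum along one axis into contiguous bands and transform each band back to the image domain. It is an illustration under stated assumptions, not the authors' implementation. The function name `sub_aperture_decomposition`, the parameters `n_looks` and `axis`, the equal-width non-overlapping bands, and the absence of any spectral window correction are all assumptions made here.

```python
import numpy as np


def sub_aperture_decomposition(slc, n_looks=2, axis=0):
    """Split a complex SAR image into n_looks sub-aperture images.

    Hypothetical helper for illustration: equal-width, non-overlapping
    bands and no spectral weighting correction are assumed.
    """
    if not np.iscomplexobj(slc):
        raise ValueError("expected a complex-valued (SLC) image")

    # Go to the frequency domain along the chosen axis, zero frequency centred.
    spectrum = np.fft.fftshift(np.fft.fft(slc, axis=axis), axes=axis)

    # Band edges that partition the spectrum into n_looks contiguous parts.
    n = slc.shape[axis]
    edges = np.linspace(0, n, n_looks + 1).astype(int)

    looks = []
    for start, stop in zip(edges[:-1], edges[1:]):
        # Keep a single band of the spectrum and zero out everything else.
        band = np.zeros_like(spectrum)
        index = [slice(None)] * slc.ndim
        index[axis] = slice(start, stop)
        band[tuple(index)] = spectrum[tuple(index)]

        # Back to the image domain: one lower-resolution look per band.
        looks.append(np.fft.ifft(np.fft.ifftshift(band, axes=axis), axis=axis))

    # Output shape: (n_looks, *slc.shape)
    return np.stack(looks)
```

Because the bands partition the spectrum, the returned looks sum back to the input image. In a self-supervised pipeline, each look (or its amplitude) could be treated as a different view of the same scene, though how SAFE combines the looks is not specified in the abstract.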
