SARMAE: Masked Autoencoder for SAR Representation Learning

Danxu Liu; Di Wang; Hebaixu Wang; Haoyang Chen; Wentao Jiang; Yilin Cheng; Haonan Guo; Wei Cui; Jing Zhang

TL;DR

The paper tackles the scarcity of large-scale labeled SAR data and the challenge of speckle noise in SAR imagery by introducing SARMAE, a noise-aware masked autoencoder framework for self-supervised SAR representation learning. It builds SAR-1M, the first million-scale SAR dataset with paired optical images, and integrates two innovations: Speckle-Aware Representation Enhancement (SARE) for denoising-based reconstruction under physically grounded speckle models, and Semantic Anchor Representation Constraint (SARC) to align SAR features with semantic priors from paired optical imagery. Through extensive experiments on classification, detection, and segmentation, SARMAE achieves state-of-the-art results and demonstrates strong generalization, driven by in-domain SAR pretraining and cross-modal guidance. The work establishes a foundation for scalable SAR-oriented foundation models and provides a reusable dataset and methodology for the community.

Abstract

Synthetic Aperture Radar (SAR) imagery plays a critical role in all-weather, day-and-night remote sensing applications. However, existing SAR-oriented deep learning is constrained by data scarcity, while the physically grounded speckle noise in SAR imagery further hampers fine-grained semantic representation learning. To address these challenges, we propose SARMAE, a Noise-Aware Masked Autoencoder for self-supervised SAR representation learning. Specifically, we construct SAR-1M, the first million-scale SAR dataset, with additional paired optical images, to enable large-scale pre-training. Building upon this, we design Speckle-Aware Representation Enhancement (SARE), which injects SAR-specific speckle noise into masked autoencoders to facilitate noise-aware and robust representation learning. Furthermore, we introduce Semantic Anchor Representation Constraint (SARC), which leverages paired optical priors to align SAR features and ensure semantic consistency. Extensive experiments across multiple SAR datasets demonstrate that SARMAE achieves state-of-the-art performance on classification, detection, and segmentation tasks. Code and models will be available at https://github.com/MiliLab/SARMAE.
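
As a rough illustration of the two ideas named above, the sketch below shows (a) corrupting an input with multiplicative speckle before masking, so that a reconstruction target can remain the clean image, and (b) a feature alignment penalty between SAR and optical embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: the unit-mean gamma model for L-look intensity speckle is a common textbook choice, the 0.75 mask ratio is the usual masked-autoencoder default, and the cosine form of the alignment loss is only one plausible reading of "align". The abstract specifies none of these, and all function names are hypothetical.

```python
import math
import random


def add_speckle(image, looks=4, rng=None):
    """Multiply each pixel by unit-mean gamma noise.

    Assumption: L-look intensity speckle ~ Gamma(shape=L, scale=1/L),
    which has mean 1 and variance 1/L. `image` is a list of rows.
    """
    if rng is None:
        rng = random.Random()
    return [
        [px * rng.gammavariate(looks, 1.0 / looks) for px in row]
        for row in image
    ]


def random_patch_mask(num_patches, mask_ratio=0.75, rng=None):
    """Return a boolean list; True marks a patch hidden from the encoder.

    Assumption: uniform random masking with a fixed ratio.
    """
    if rng is None:
        rng = random.Random()
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def cosine_alignment_loss(sar_feat, opt_feat):
    """1 - cosine similarity between a SAR and an optical feature vector.

    Assumption: a hypothetical stand-in for an alignment constraint;
    it is 0 when the two vectors point in the same direction.
    """
    dot = sum(a * b for a, b in zip(sar_feat, opt_feat))
    norm = math.sqrt(sum(a * a for a in sar_feat)) * math.sqrt(
        sum(b * b for b in opt_feat)
    )
    return 1.0 - dot / norm


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[1.0] * 8 for _ in range(8)]       # toy "clean" SAR intensity
    noisy = add_speckle(clean, looks=4, rng=rng)  # encoder input
    mask = random_patch_mask(16, 0.75, rng)       # 12 of 16 patches hidden
    print(sum(mask), "patches masked; target stays the clean image")
```

In a setup like this, the encoder would see only the visible patches of `noisy`, while the reconstruction loss would be computed against `clean`, which is what makes the objective denoising as well as inpainting.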
