SARMAE: Masked Autoencoder for SAR Representation Learning

Danxu Liu, Di Wang, Hebaixu Wang, Haoyang Chen, Wentao Jiang, Yilin Cheng, Haonan Guo, Wei Cui, Jing Zhang

TL;DR

The paper tackles the scarcity of large-scale labeled SAR data and the challenge of speckle noise in SAR imagery by introducing SARMAE, a noise-aware masked autoencoder framework for self-supervised SAR representation learning. It builds SAR-1M, the first million-scale SAR dataset with paired optical images, and integrates two innovations: Speckle-Aware Representation Enhancement (SARE) for denoising-based reconstruction under physically grounded speckle models, and Semantic Anchor Representation Constraint (SARC) to align SAR features with semantic priors from paired optical imagery. Through extensive experiments on classification, detection, and segmentation, SARMAE achieves state-of-the-art results and demonstrates strong generalization, driven by in-domain SAR pretraining and cross-modal guidance. The work establishes a foundation for scalable SAR-oriented foundation models and provides a reusable dataset and methodology for the community.

Abstract

Synthetic Aperture Radar (SAR) imagery plays a critical role in all-weather, day-and-night remote sensing applications. However, existing SAR-oriented deep learning is constrained by data scarcity, while the physically grounded speckle noise in SAR imagery further hampers fine-grained semantic representation learning. To address these challenges, we propose SARMAE, a Noise-Aware Masked Autoencoder for self-supervised SAR representation learning. Specifically, we construct SAR-1M, the first million-scale SAR dataset, with additional paired optical images, to enable large-scale pre-training. Building upon this, we design Speckle-Aware Representation Enhancement (SARE), which injects SAR-specific speckle noise into masked autoencoders to facilitate noise-aware and robust representation learning. Furthermore, we introduce Semantic Anchor Representation Constraint (SARC), which leverages paired optical priors to align SAR features and ensure semantic consistency. Extensive experiments across multiple SAR datasets demonstrate that SARMAE achieves state-of-the-art performance on classification, detection, and segmentation tasks. Code and models will be available at https://github.com/MiliLab/SARMAE.
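As a rough illustration of the idea behind SARE, the sketch below injects multiplicative speckle into a toy intensity image before MAE-style patch masking. It is a minimal sketch under stated assumptions, not the authors' implementation: the unit-mean Gamma (L-look) speckle model is the standard textbook choice, and the function names, the number of looks, and the 75% mask ratio are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def add_speckle(intensity, looks=4, rng=None):
    """Multiply an intensity image by unit-mean Gamma speckle (L-look model).

    Assumption: the L-look Gamma model stands in for whatever speckle
    model the paper actually uses.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gamma(shape=L, scale=1/L) has mean 1 and variance 1/L.
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * noise

def random_patch_mask(num_patches, mask_ratio=0.75, rng=None):
    """Return a boolean array where True marks a masked (hidden) patch."""
    rng = np.random.default_rng() if rng is None else rng
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

rng = np.random.default_rng(0)
clean = np.ones((64, 64), dtype=np.float64)   # toy constant-intensity scene
noisy = add_speckle(clean, looks=4, rng=rng)  # speckled input for the encoder
mask = random_patch_mask(num_patches=16, mask_ratio=0.75, rng=rng)
# A denoising-style MAE would encode the visible patches of `noisy` and
# score its reconstruction of the masked patches against `clean`
# (assumed here; the abstract does not state the reconstruction target).
```

Fewer looks give stronger speckle (variance 1/L), so `looks` acts as a noise-strength knob in this sketch.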

Paper Structure

This paper contains 31 sections, 12 equations, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: SARMAE outperforms SOTA methods on multiple datasets. $^1$: 40-shot; $^2$: 30% labeled. $^a$: multi-class; $^b$: water.
  • Figure 2: The organization of data sources in SAR-1M.
  • Figure 3: Overview of the SARMAE pretraining framework. The framework consists of two branches: (i) a SAR branch following the MAE architecture with Speckle-Aware Representation Enhancement (SARE) to handle inherent speckle noise, and (ii) an optical branch using a frozen DINOv3 encoder. For paired SAR-optical data, Semantic Anchor Representation Constraint (SARC) aligns SAR features with semantic-rich optical representations. Unpaired SAR images are processed solely through the SAR branch.
  • Figure 4: SARE significantly enhances the model’s semantic perception. Attention maps are computed by measuring the attention between the MAE encoder’s final-layer class token and image patch tokens.
  • Figure 5: SARC leverages paired optical priors to recover the structural details of the original SAR image, which the model fails to reconstruct without this module.
  • ...and 7 more figures
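
The two-branch scheme described in the Figure 3 caption can be sketched as a combined objective in which the alignment term applies only to paired samples. This is a hedged sketch, not the paper's loss: the cosine-distance form, the `weight` parameter, and the function names are assumptions, and the optical features are simply treated as fixed arrays to stand in for the output of a frozen encoder.

```python
import numpy as np

def alignment_loss(sar_feats, opt_feats, paired):
    """Mean (1 - cosine similarity) over paired samples; 0.0 if none are paired.

    sar_feats, opt_feats: arrays of shape (N, D).
    paired: length-N boolean flags; unpaired rows are ignored.
    """
    paired = np.asarray(paired, dtype=bool)
    if not paired.any():
        return 0.0
    s = sar_feats[paired]
    o = opt_feats[paired]  # treated as constants (frozen optical branch)
    s = s / (np.linalg.norm(s, axis=-1, keepdims=True) + 1e-8)
    o = o / (np.linalg.norm(o, axis=-1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(s * o, axis=-1)))

def total_loss(recon_loss, sar_feats, opt_feats, paired, weight=1.0):
    """Reconstruction loss plus a weighted alignment term (weight is assumed)."""
    return recon_loss + weight * alignment_loss(sar_feats, opt_feats, paired)

sar_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
opt_feats = np.array([[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]])  # last row: no optical pair
paired = np.array([True, True, False])

align = alignment_loss(sar_feats, opt_feats, paired)
combined = total_loss(0.2, sar_feats, opt_feats, paired, weight=0.5)
unpaired_only = alignment_loss(sar_feats, opt_feats, [False, False, False])
```

In this toy batch the first pair is perfectly aligned and the second is orthogonal, so the alignment term averages to about 0.5, while a batch with no paired samples falls back to the reconstruction loss alone.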