SAR-W-MixMAE: SAR Foundation Model Training Using Backscatter Power Weighting

Ali Caglayan, Nevrez Imamoglu, Toru Kouyama

TL;DR

This paper applies foundation-model pretraining to SAR imagery by introducing SAR-W-MixMAE, a MixMAE-based pretraining framework that incorporates SAR-specific priors through pixel-wise backscatter power weighting. The authors derive the weights, $W_{SAR}$, from the linear-scale average of the VH and VV channels and modify the reconstruction loss to emphasize lower-backscatter regions, which improves robustness to speckle noise. Pretraining on BigEarthNet with Sentinel-1 data, followed by fine-tuning for multi-label SAR classification and for flood detection on SEN12-FLOOD, improves on both random initialization and baseline MixMAE, with the largest gain reported in flood-detection recall. These results suggest that embedding domain-specific priors into self-supervised SAR pretraining can produce more transferable representations for Earth observation tasks under challenging noise conditions.

Abstract

Foundation model approaches such as masked auto-encoders (MAE) and their variants are now being applied successfully to satellite imagery. Most technical validation of foundation models so far has been carried out on optical imagery such as RGB or multi-spectral images. Because semantic labels are difficult to produce and the noise content is higher than in optical images, Synthetic Aperture Radar (SAR) data remains comparatively underexplored for foundation models. In this work, we therefore explore a masked auto-encoder, specifically MixMAE, as a pre-training approach on Sentinel-1 SAR images and examine its impact on SAR image classification tasks. Moreover, we propose using the physical characteristics of SAR data to weight the auto-encoder training loss (MSE), reducing the effect of speckle noise and very high values in SAR images. The proposed SAR intensity-based weighting of the reconstruction loss shows promising results compared with the baseline model, both in SAR pre-training and in downstream tasks, particularly flood detection.
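
To make the idea of backscatter-weighted reconstruction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the VH and VV inputs are given in decibels and are converted to linear power with $10^{x/10}$, and it assumes a simple min-max normalization followed by inversion so that low-backscatter pixels receive larger weights. The exact form of $W_{SAR}$ is not given in this summary, so the mapping, the function names `sar_weights` and `weighted_mse`, and the `eps` parameter are all hypothetical.

```python
import numpy as np


def sar_weights(vh_db, vv_db, eps=1e-8):
    """Hypothetical pixel-wise weights from the linear-scale VH/VV average."""
    # Assumption: inputs are backscatter in dB; convert to linear power.
    vh_lin = 10.0 ** (vh_db / 10.0)
    vv_lin = 10.0 ** (vv_db / 10.0)
    avg = 0.5 * (vh_lin + vv_lin)
    # Assumed mapping: min-max normalize to [0, 1], then invert so that
    # low-backscatter pixels get weights near 1 and bright pixels near 0.
    norm = (avg - avg.min()) / (avg.max() - avg.min() + eps)
    return 1.0 - norm


def weighted_mse(pred, target, weights, eps=1e-8):
    """Weighted mean squared error; weights (H, W) broadcast over channels."""
    sq_err = (pred - target) ** 2          # shape (C, H, W)
    weighted = weights * sq_err            # broadcasts (H, W) over C
    # Normalize by the total weight applied across all channels.
    return float(weighted.sum() / (weights.sum() * pred.shape[0] + eps))


if __name__ == "__main__":
    vh = np.array([[-25.0, -10.0], [-20.0, -5.0]])
    vv = np.array([[-20.0, -8.0], [-15.0, -3.0]])
    w = sar_weights(vh, vv)
    target = np.stack([vh, vv])
    pred = target + 1.0
    print(w)
    print(weighted_mse(pred, target, w))
```

Under this assumed mapping, errors on very bright pixels (strong scatterers or speckle spikes) contribute little to the loss, which is one plausible way to realize the "reduce the effect of speckle noise and very high values" goal stated in the abstract.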

Paper Structure

This paper contains 10 sections, 3 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Overview of the proposed SAR-W-MixMAE model. Sentinel-1 SAR data with VH and VV channels is processed through a mixing module to combine patches from two inputs, followed by a hierarchical encoder leveraging Swin Transformer blocks and patch merging. SAR-specific pixel-wise weights are incorporated into the reconstruction loss during the decoding phase to enhance robustness against SAR noise. The pretrained model is further fine-tuned for downstream tasks such as multi-label classification and flood detection.
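
The mixing step named in the Figure 1 caption can be illustrated with a simplified pixel-space sketch. This is an assumption-laden illustration rather than the model's actual module: it mixes two two-channel (VH, VV) images patch by patch using a random binary mask, whereas the real model operates on tokens inside a hierarchical Swin-based encoder. The function name `mix_patches`, the patch size, and the mixing ratio are hypothetical choices.

```python
import numpy as np


def mix_patches(img_a, img_b, patch=16, ratio=0.5, rng=None):
    """Combine patches from two (C, H, W) images with a random patch mask."""
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = img_a.shape
    assert img_b.shape == img_a.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    n = gh * gw
    # Patch-level mask: True keeps the patch from img_a, False takes img_b.
    keep_a = np.zeros(n, dtype=bool)
    keep_a[rng.permutation(n)[: int(round(n * ratio))]] = True
    keep_a = keep_a.reshape(gh, gw)
    # Expand the patch mask to pixel resolution.
    pixel_mask = keep_a.repeat(patch, axis=0).repeat(patch, axis=1)
    # The (H, W) mask broadcasts across the channel axis.
    mixed = np.where(pixel_mask, img_a, img_b)
    return mixed, keep_a


if __name__ == "__main__":
    a = np.zeros((2, 32, 32))
    b = np.ones((2, 32, 32))
    mixed, keep = mix_patches(a, b, rng=np.random.default_rng(0))
    print(keep)
    print(mixed.shape)
```

The returned patch mask records which source image each patch came from, which a reconstruction objective would need in order to recover both original inputs from the mixed one.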