SAR-W-MixMAE: SAR Foundation Model Training Using Backscatter Power Weighting
Ali Caglayan, Nevrez Imamoglu, Toru Kouyama
TL;DR
This paper tackles the challenge of applying foundation-model pretraining to SAR imagery by introducing SAR-W-MixMAE, a MixMAE-based pretraining framework that incorporates SAR-specific priors through pixel-wise backscatter power weighting. The authors derive a weighting scheme from the linear-scale average of the VH and VV channels, $W_{SAR}$, and modify the reconstruction loss to emphasize lower-backscatter regions, improving robustness to speckle noise. Pretraining on BigEarthNet with Sentinel-1 data, followed by finetuning for multi-label SAR classification and flood detection on SEN12-FLOOD, yields notable improvements over random initialization and baseline MixMAE, especially in flood detection where recall increases significantly. These results suggest that embedding domain-specific priors into self-supervised SAR pretraining can produce more transferable representations for earth observation tasks under challenging noise conditions.
Abstract
Foundation model approaches such as masked auto-encoders (MAE) and their variants are now being successfully applied to satellite imagery. Most ongoing technical validation of foundation models has been carried out on optical images such as RGB or multispectral images. Because semantic labeling for dataset creation is difficult and noise content is higher than in optical images, Synthetic Aperture Radar (SAR) data has received little attention in foundation-model research. Therefore, in this work we explore a masked auto-encoder, specifically MixMAE, as a pre-training approach on Sentinel-1 SAR images and study its impact on SAR image classification tasks. Moreover, we propose to use the physical characteristics of SAR data to apply a weighting parameter to the auto-encoder training loss (MSE), reducing the effect of speckle noise and very high values in the SAR images. The proposed SAR intensity-based weighting of the reconstruction loss demonstrates promising results on both SAR pre-training and downstream tasks, specifically flood detection, compared with the baseline model.
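To make the idea of a backscatter-weighted reconstruction loss concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the VV and VH inputs are given in dB, that the weight is one minus the min-max-normalized linear-scale average of the two channels (so lower-backscatter pixels receive larger weights), and that the loss is a weight-normalized MSE. The function names `db_to_linear`, `sar_weight`, and `weighted_mse` and the exact normalization are hypothetical; the text above does not specify the formula for $W_{SAR}$.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def db_to_linear(x_db):
    """Convert backscatter from dB to linear power (assumes dB inputs)."""
    return 10.0 ** (np.asarray(x_db, dtype=np.float64) / 10.0)


def sar_weight(vv_db, vh_db):
    """Hypothetical per-pixel weight in [0, 1].

    Averages VV and VH in linear scale, min-max normalizes the result,
    and inverts it so that low-backscatter pixels get weights near 1
    and very bright pixels get weights near 0.
    """
    mean_power = 0.5 * (db_to_linear(vv_db) + db_to_linear(vh_db))
    lo, hi = mean_power.min(), mean_power.max()
    normalized = (mean_power - lo) / (hi - lo + EPS)
    return 1.0 - normalized


def weighted_mse(pred, target, weight):
    """Weight-normalized MSE over arrays of shape (C, H, W).

    `weight` has shape (H, W) and is shared across channels.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    w = np.broadcast_to(weight, pred.shape)
    squared_error = (pred - target) ** 2
    return float(np.sum(w * squared_error) / (np.sum(w) + EPS))


# Tiny example: one bright pixel (left) and one dark pixel (right).
vv_db = np.array([[-5.0, -20.0]])
vh_db = np.array([[-10.0, -25.0]])
weight = sar_weight(vv_db, vh_db)

target = np.stack([vv_db, vh_db])  # shape (2, 1, 2)
pred_bright_err = target.copy()
pred_bright_err[:, 0, 0] += 1.0    # 1 dB error on the bright pixel
pred_dark_err = target.copy()
pred_dark_err[:, 0, 1] += 1.0      # 1 dB error on the dark pixel

loss_bright = weighted_mse(pred_bright_err, target, weight)
loss_dark = weighted_mse(pred_dark_err, target, weight)
```

Under these assumptions, the same reconstruction error costs more on the dark pixel than on the bright one, which is the qualitative behavior described above: very high values, where speckle is most pronounced, contribute less to the training signal. In a MixMAE-style setup the weight would additionally be restricted to the masked tokens being reconstructed, which this sketch leaves out.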
