SOMA-1M: A Large-Scale SAR-Optical Multi-resolution Alignment Dataset for Multi-Task Remote Sensing

Peihao Wu; Yongxiang Yao; Yi Wan; Wenfei Zhang; Ruipeng Zhao; Jiayuan Li; Yongjun Zhang

TL;DR

SOMA-1M addresses the shortage of large-scale, high-precision, multi-resolution SAR–optical data by providing over $1.3$ million pixel-aligned image pairs at $0.5$ m, $3$ m, and $10$ m resolutions. A coarse-to-fine registration framework ensures pixel-level alignment and preserves geolocation metadata, and four benchmarks (image matching, image fusion, SAR-assisted cloud removal, and SAR-to-optical translation) demonstrate the dataset's value across tasks and resolutions. Training on the $0.1$M subset, SOMA-0.1M, consistently improves the performance of state-of-the-art baselines across tasks, with particularly strong gains in multimodal matching and translation when the data are explicitly aligned. The findings reveal resolution-dependent strengths and weaknesses, supporting a multi-resolution hierarchical design for advancing cross-modal remote sensing and the development of foundation models that operate globally with spatial awareness.

Abstract

Synthetic Aperture Radar (SAR) and optical imagery provide complementary strengths, which form the foundation for overcoming single-modality constraints and enabling cross-modal collaborative processing and intelligent interpretation. However, existing benchmark datasets often suffer from a single spatial resolution, insufficient data scale, and low alignment accuracy, making them inadequate for supporting the training and generalization of multi-scale foundation models. To address these challenges, we introduce SOMA-1M (SAR-Optical Multi-resolution Alignment), a precisely pixel-aligned dataset containing over 1.3 million pairs of georeferenced images, each 512 × 512 pixels. The dataset integrates imagery from Sentinel-1, PIESAT-1, Capella Space, and Google Earth, achieving global multi-scale coverage from 0.5 m to 10 m. It encompasses 12 typical land cover categories, ensuring scene diversity and complexity. To handle multimodal projection deformation and large-scale data registration, we designed a rigorous coarse-to-fine image matching framework that ensures pixel-level alignment. Based on this dataset, we established comprehensive evaluation benchmarks for four hierarchical vision tasks (image matching, image fusion, SAR-assisted cloud removal, and cross-modal translation), involving over 30 mainstream algorithms. Experimental results demonstrate that supervised training on SOMA-1M significantly enhances performance across all tasks. Notably, multimodal remote sensing image (MRSI) matching achieves current state-of-the-art (SOTA) performance. SOMA-1M serves as a foundational resource for robust multimodal algorithms and remote sensing foundation models. The dataset will be released publicly at: https://github.com/PeihaoWu/SOMA-1M.
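To make the stated specification concrete, the following minimal sketch works out the ground footprint of a 512 × 512 tile at each of the three resolutions and shows one way a georeferenced scene could be cut into such tiles while keeping each tile's map coordinates. It is not the authors' pipeline: the function and class names are hypothetical, and it assumes a north-up grid in a projected coordinate system with units of metres, non-overlapping tiles, and partial edge tiles being discarded.

```python
from dataclasses import dataclass

TILE = 512  # tile edge in pixels, as stated in the abstract

# Ground sampling distances in metres per pixel, as listed in the TL;DR.
RESOLUTIONS_M = (0.5, 3.0, 10.0)


def footprint_m(gsd_m: float, tile: int = TILE) -> float:
    """Ground extent of one tile edge in metres."""
    return gsd_m * tile


@dataclass(frozen=True)
class Window:
    """Hypothetical record tying a tile to its pixel and map position."""
    row: int      # top pixel row of the tile in the source scene
    col: int      # left pixel column of the tile in the source scene
    x_min: float  # map x of the tile's upper-left corner
    y_max: float  # map y of the tile's upper-left corner


def tile_windows(height, width, origin_x, origin_y, gsd_m, tile=TILE):
    """List non-overlapping full tiles of a north-up scene.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    windows = []
    for row in range(0, height - tile + 1, tile):
        for col in range(0, width - tile + 1, tile):
            windows.append(
                Window(
                    row=row,
                    col=col,
                    x_min=origin_x + col * gsd_m,  # x grows with column
                    y_max=origin_y - row * gsd_m,  # y shrinks with row
                )
            )
    return windows


if __name__ == "__main__":
    for gsd in RESOLUTIONS_M:
        print(f"{gsd} m/pixel -> {footprint_m(gsd)} m per tile edge")
```

Under these assumptions a tile spans 256 m, 1536 m, and 5120 m per edge at 0.5 m, 3 m, and 10 m respectively. Applying the same window list to a co-registered SAR scene and optical scene yields tile pairs that share pixel indices and map coordinates, which is the property that pixel-level alignment with preserved geolocation metadata implies.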

Paper Structure (40 sections, 1 equation, 12 figures, 7 tables)

Introduction
Related work
Multimodal Remote Sensing Datasets
Low-to-Mid Resolution Datasets
High-Resolution Datasets
Emerging Multi-Res and Foundation Datasets
Summary and Positioning of SOMA-1M
Multi-Modal Applications
Image Matching
Image Fusion
SAR-Assisted Cloud Removal
SAR-to-Optical Translation
SOMA-1M Dataset Construction
Data Collection
Automated Data Annotation
...and 25 more sections

Figures (12)

Figure 1: Overview of the SOMA-1M dataset and examples of its multi-task applications. The two leftmost columns display the original SAR and optical input images. The remaining columns illustrate representative results generated by models trained on this dataset: (a) Image Matching; (b) Image Fusion; (c) SAR-Assisted Cloud Removal; and (d) SAR-to-Optical Translation.
Figure 2: Global geographic distribution of SOMA-1M sampling points.
Figure 3: Flowchart of the automated data annotation pipeline.
Figure 4: Visualization of alignment results.
Figure 5: Visualization examples of 12 typical land-cover categories in the SOMA-1M dataset. Each group presents a pair of SAR and optical images with strict pixel-level alignment.
...and 7 more figures
