CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation

Ziqi Ye, Ziyang Gong, Ning Liao, Xiaoxing Hu, Di Wang, Hongruixuan Chen, Chen Huang, Yiguo He, Yuru Jia, Xiaoxing Wang, Haipeng Wang, Xue Yang, Junchi Yan

Abstract

Synthetic Aperture Radar (SAR) enables global, all-weather Earth observation. However, owing to diverse imaging mechanisms, domain shifts across sensors and regions severely hinder its semantic generalization. To address this, we present CrossEarth-SAR, the first billion-scale SAR vision foundation model built upon a novel physics-guided sparse mixture-of-experts (MoE) architecture incorporating physical descriptors, explicitly designed for cross-domain semantic segmentation. To facilitate large-scale pre-training, we develop CrossEarth-SAR-200K, a weakly and fully supervised dataset that unifies public and private SAR imagery. We also introduce a benchmark suite comprising 22 sub-benchmarks across 8 distinct domain gaps, establishing the first unified standard for domain-generalizable semantic segmentation on SAR imagery. Extensive experiments demonstrate that CrossEarth-SAR achieves state-of-the-art results on 20 of the 22 benchmarks, surpassing previous methods by over 10% mIoU on some benchmarks under multi-gap transfer. All code, benchmarks, and datasets will be publicly available.
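The abstract names a physics-guided sparse mixture-of-experts (MoE) architecture that incorporates physical descriptors, but it does not say how the router uses them. The sketch below shows generic top-k sparse gating under one loudly stated assumption: a per-token physical-descriptor vector (hypothetical; for example, a backscatter statistic) is appended to the token feature before a linear gating projection. The names `top_k_route`, `moe_forward`, `scale_expert`, `GATE_W`, and `EXPERTS` are hypothetical, the experts are toy scalings, and this is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token: Sequence[float], phys: Sequence[float],
                gate_w: Sequence[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and their renormalized weights.

    Assumption (hypothetical): the router sees the token feature
    concatenated with a physical-descriptor vector.
    """
    router_in = list(token) + list(phys)
    # One linear gating score per expert (one row of gate_w per expert).
    logits = [sum(w * x for w, x in zip(row, router_in)) for row in gate_w]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so the k weights sum to 1.
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(token: Sequence[float], phys: Sequence[float],
                gate_w: Sequence[Sequence[float]], experts: Sequence[Expert],
                k: int = 2) -> Vector:
    """Sparse MoE layer: only the k routed experts are evaluated."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(token, phys, gate_w, k):
        y = experts[idx](token)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def scale_expert(s: float) -> Expert:
    """Toy expert (hypothetical) that scales its input by s."""
    return lambda x: [s * v for v in x]


# Toy setup (hypothetical): 4 experts, 2-dim token, 1 physical descriptor,
# so each gating row has 3 weights.
EXPERTS = [scale_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
GATE_W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],     # expert 2 keys on the physical descriptor
    [-1.0, -1.0, -1.0],
]

token = [0.5, 0.2]
print(top_k_route(token, [2.0], GATE_W, k=2))   # routes to experts 2 and 0
print(top_k_route(token, [-2.0], GATE_W, k=2))  # routes to experts 3 and 0
print(moe_forward(token, [2.0], GATE_W, EXPERTS, k=2))
```

With the same token, changing only the descriptor changes which experts fire, loosely analogous to the relationship between expert activations and the SAR physical domain named in Figure 5. The real model's gating network, value of k, and descriptor set are not given in this excerpt.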

Paper Structure (22 sections, 15 equations, 15 figures, 12 tables)

Figures (15)

  • Figure 1: We evaluate representative models on 22 evaluation benchmarks, where CrossEarth-SAR achieves SOTA performance (mIoU) on 20 settings across various segmentation scenes, demonstrating strong generalizability.
  • Figure 2: Geographic distribution of the CrossEarth-SAR-200K, demonstrating its comprehensive coverage across hundreds of cities on six continents.
  • Figure 3: Framework of CrossEarth-SAR. (a) Collect large-scale labeled SAR segmentation data to construct CrossEarth-SAR-200K. (b) The continued pre-training structure of CrossEarth-SAR. (c) SAR RS-PEFT by Earth-Adapter. (d) The information of diverse benchmarks.
  • Figure 4: Visualizations of predicted segmentation maps on six representative benchmarks. In the top three rows, blue is water, green is vegetation, red is ground, cyan is road, yellow is building, and purple is mountain. In the bottom two rows, blue is farmland, green is greenery, red is road, cyan is building, yellow is water, and white is background.
  • Figure 5: Relationship Between Expert Activations and the SAR Physical Domain.
  • ...and 10 more figures
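Results in Figure 1 are reported in mIoU (mean intersection-over-union). For reference, here is a minimal pure-Python sketch of the standard definition. It accumulates a confusion matrix over all labeled pixels, computes IoU = TP / (TP + FP + FN) per class, and averages over the classes that appear. Two choices are assumptions, not taken from the paper: the `ignore_index=255` convention for unlabeled pixels and skipping classes absent from both prediction and ground truth. The exact evaluation protocol is not stated in this excerpt.

```python
from typing import List, Optional, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # unlabeled pixels do not count
            continue
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU_c = TP / (TP + FP + FN); None if class c never appears."""
    n = len(cm)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Average IoU over classes present in prediction or ground truth."""
    cm = confusion_matrix(pred, target, num_classes, ignore_index)
    ious = [v for v in per_class_iou(cm) if v is not None]
    return sum(ious) / len(ious) if ious else float("nan")


# Flattened toy label maps with 3 classes; 255 marks an unlabeled pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(per_class_iou(confusion_matrix(pred, target, 3)))  # [0.5, 0.6666666666666666, 1.0]
print(mean_iou(pred, target, 3))                          # approx. 0.7222 (= 13/18)
```

In practice the confusion matrix is accumulated over the whole evaluation set before computing IoU, rather than averaging per-image scores; papers usually report the result multiplied by 100.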