S5: Scalable Semi-Supervised Semantic Segmentation in Remote Sensing

Liang Lv, Di Wang, Jing Zhang, Lefei Zhang

TL;DR

S5 is a scalable semi-supervised framework for remote sensing (RS) semantic segmentation. It builds a large unlabeled RS corpus (RS4P-1M) with a data selection strategy that combines entropy-based filtering and diversity expansion, then uses that corpus for semi-supervised semantic segmentation (S4) pre-training (S4P) of RS foundation models. For fine-tuning, it introduces MoE-based multi-dataset fine-tuning (MoE-MDF), which adapts one model to multiple RS benchmarks with fewer parameters. The authors report state-of-the-art results on land cover segmentation and object detection benchmarks, with performance improving as model size and the amount of unlabeled data grow. Datasets, code, and models are to be released for community use.

Abstract

Semi-supervised semantic segmentation (S4) has advanced remote sensing (RS) analysis by leveraging unlabeled data through pseudo-labeling and consistency learning. However, existing S4 studies often rely on small-scale datasets and models, limiting their practical applicability. To address this, we propose S5, the first scalable framework for semi-supervised semantic segmentation in RS, which unlocks the potential of vast unlabeled Earth observation data typically underutilized due to costly pixel-level annotations. Built upon existing large-scale RS datasets, S5 introduces a data selection strategy that integrates entropy-based filtering and diversity expansion, resulting in the RS4P-1M dataset. Using this dataset, we systematically scale up S4 into a new pretraining paradigm, S4 pre-training (S4P), to pretrain RS foundation models (RSFMs) of varying sizes on this extensive corpus, significantly boosting their performance on land cover segmentation and object detection tasks. Furthermore, during fine-tuning, we incorporate a Mixture-of-Experts (MoE)-based multi-dataset fine-tuning approach, which enables efficient adaptation to multiple RS benchmarks with fewer parameters. This approach improves the generalization and versatility of RSFMs across diverse RS benchmarks. The resulting RSFMs achieve state-of-the-art performance across all benchmarks, underscoring the viability of scaling semi-supervised learning for RS applications. All datasets, code, and models will be released at https://github.com/MiliLab/S5
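The abstract names entropy-based filtering as one half of the data selection strategy but does not spell out the criterion on this page. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes that a model's per-pixel class probabilities are available for each candidate image and that lower mean entropy indicates more reliable predictions. The function names, the toy data, and the `max_entropy` threshold are all hypothetical.

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(prob_map):
    """Average per-pixel entropy; prob_map is a flat list of per-pixel distributions."""
    return sum(pixel_entropy(p) for p in prob_map) / len(prob_map)

def filter_by_entropy(images, max_entropy):
    """Keep the ids of images whose mean prediction entropy is at most max_entropy.

    Assumption: low entropy is treated as a sign of a usable sample. The
    direction of the test and the threshold are illustrative only.
    """
    return [name for name, prob_map in images.items()
            if mean_entropy(prob_map) <= max_entropy]

# Toy candidates: two pixels per image, three classes per pixel.
images = {
    "confident": [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]],
    "uncertain": [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30]],
}
kept = filter_by_entropy(images, max_entropy=0.5)
print(kept)
```

In this toy run only the image with sharply peaked predictions passes the filter. A diversity expansion step, which the abstract also mentions, would act on the retained pool and is not sketched here.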

Paper Structure

This paper contains 35 sections, 10 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: (a) Traditional S4 workflow: splitting the dataset into labeled and unlabeled subsets to improve model performance with few labeled samples. (b) The proposed S5 workflow: perform semi-supervised segmentation pretraining on both labeled and large-scale unlabeled datasets, followed by fine-tuning on RS benchmarks. (c) Comparison of performance across four RS segmentation and two object detection benchmarks.
  • Figure 2: The overall pipeline of the proposed S5 framework. It starts with the construction of the RS4P-1M dataset, followed by pre-training RSFMs with S4P. The pre-trained weights are then fine-tuned on semantic segmentation and object detection benchmarks through the MoE-based multi-dataset fine-tuning (MoE-MDF) scheme. ViT-MoE indicates the integration of the FFN-MoE modules into the ViT backbones.
  • Figure 3: The workflow of MoE-MDF. ViT-MoE refers to the incorporation of the FFN-MoE module into the standard ViT. The black solid arrows and dashed arrows represent the forward propagation paths for different datasets, while the red arrows indicate the shared forward propagation.
  • Figure 4: Fine-tuning results on three RS benchmarks with varying pre-training dataset sizes and backbones. “100K” and “1M” indicate the number of images used for S4P.
  • Figure 5: Visualization of RS4P-1M samples with generated pseudo-labels.
  • ...and 2 more figures
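
The captions for Figures 2 and 3 describe ViT-MoE as a ViT whose feed-forward network (FFN) is replaced by an FFN-MoE module. As a rough illustration of what a generic MoE feed-forward layer does, the sketch below routes one token to its top-scoring experts through a softmax gate and mixes their outputs. This is the textbook construction, not the paper's MoE-MDF design: the gating rule, the `top_k` parameter, the expert shapes, and all names are assumptions, and the dataset-specific versus shared paths shown in Figure 3 are not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def ffn(expert, x):
    """Two-layer feed-forward network with a ReLU in between."""
    hidden = [max(0.0, h) for h in linear(expert["w1"], x)]
    return linear(expert["w2"], hidden)

def moe_ffn(x, experts, gate_weights, top_k=1):
    """Send a token to its top_k experts and mix outputs by renormalized gate scores."""
    gates = softmax(linear(gate_weights, x))
    chosen = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)
    out = [0.0] * len(experts[0]["w2"])
    for i in chosen:
        y = ffn(experts[i], x)
        out = [o + gates[i] / norm * v for o, v in zip(out, y)]
    return out, chosen

# Two toy experts on 2-dimensional tokens: expert 1 doubles what expert 0 returns.
identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    {"w1": identity, "w2": identity},
    {"w1": identity, "w2": [[2.0, 0.0], [0.0, 2.0]]},
]
gate_weights = identity  # hypothetical gate: score expert i by token feature i

print(moe_ffn([2.0, 0.5], experts, gate_weights))
print(moe_ffn([0.5, 2.0], experts, gate_weights))
```

Because only the selected experts run for a given token, adding experts increases capacity without a proportional increase in per-token compute, which is the usual motivation for this kind of layer.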