Curriculum Multi-Task Self-Supervision Improves Lightweight Architectures for Onboard Satellite Hyperspectral Image Segmentation

Hugo Carlesso, Josiane Mothe, Radu Tudor Ionescu

TL;DR

This work presents CMTSSL, a curriculum-based, multi-task self-supervised framework tailored for lightweight onboard hyperspectral image segmentation. By jointly integrating masked image modeling with decoupled spatial and spectral jigsaw puzzle tasks and organizing training data via a gradient-based curriculum, the encoder learns complementary spectral, spatial, and semantic representations without additional labeled data or increased model complexity. Extensive experiments on four public datasets show consistent improvements for lightweight architectures and set a new state-of-the-art on HYPSO, demonstrating practical impact for resource-constrained satellite platforms. The proposed approach offers a scalable pretraining strategy that enhances generalization and enables efficient, accurate hyperspectral processing in spaceborne environments.

Abstract

Hyperspectral imaging (HSI) captures detailed spectral signatures across hundreds of contiguous bands per pixel, which makes it indispensable for remote sensing applications such as land-cover classification, change detection, and environmental monitoring. Because HSI data is high-dimensional and data transfer in satellite-based systems is slow, compact and efficient models are required to support onboard processing and minimize the transmission of redundant or low-value data. To this end, we introduce a novel curriculum multi-task self-supervised learning (CMTSSL) framework designed for lightweight HSI architectures. CMTSSL integrates masked image modeling with decoupled spatial and spectral jigsaw puzzle solving, guided by a curriculum learning strategy that progressively increases data difficulty during self-supervision. This enables the encoder to jointly capture fine-grained spectral continuity, spatial structure, and global semantic features. Unlike prior dual-task SSL methods, CMTSSL addresses spatial and spectral reasoning simultaneously within a unified and computationally efficient design, which makes it particularly suitable for training lightweight models for onboard satellite deployment. We validate our approach on four public benchmark datasets, demonstrating consistent gains in downstream segmentation tasks with architectures that are over 16,000x lighter than some state-of-the-art models. These results highlight the potential of CMTSSL for generalizable representation learning with lightweight architectures in real-world HSI applications. Our code is publicly available at https://github.com/hugocarlesso/CMTSSL.

Paper Structure

This paper contains 30 sections, 3 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Average accuracy vs. number of floating point operations (FLOPs) for various HSI architectures, including both lightweight models (1D Justo [justo_semantic_2025], 2D Justo [justo_semantic_2025], CLOLN [li2024channel], CUNet++ Reduced [justo_semantic_2025]) and foundation models (HyperSIGMA-B [wang_hypersigma_2025]). Circle areas indicate parameter counts. Performance is reported on the Pavia University dataset [noauthor_hyperspectral_nodate]. CMTSSL boosts the performance of lightweight models without affecting model size or FLOPs.
  • Figure 2: Curriculum multi-task self-supervised learning (CMTSSL) pipeline. Input HSI cubes are sorted according to their 3D gradient magnitudes and divided into $S$ curriculum batches. To generate the pretext tasks, each input cube undergoes three parallel transformations: random spatial permutation, spectral permutation, and patch masking. The transformed cubes are encoded via a shared encoder and routed to task-specific heads, namely spatial permutation prediction, spectral permutation prediction, and masked image modeling (MIM). The model is jointly optimized via a weighted loss. Best viewed in color.
  • Figure 3: Visual comparison on the Pavia University dataset for the 2D Justo [justo_semantic_2025] architecture, using training from scratch vs. CMTSSL. Each color represents a different class, with black being the background. Best viewed in color.
  • Figure 4: Parameter sensitivity analysis on the Pavia Center dataset using the CLOLN architecture. Base values are $\alpha_{\text{spa}}=1$, $\alpha_{\text{spe}}=1$, $\alpha_{\text{mim}}=4$, $K=32$, $F=1.5$, and $S=3$. Each plot varies a single hyperparameter, while the others remain fixed. The baseline corresponds to CLOLN without CMTSSL. Best viewed in color.
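
The data side of the pipeline in Figure 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (see their repository for that). Every function name below is hypothetical. The sketch also assumes that the curriculum score is the mean 3D gradient magnitude, that cubes are ordered from low to high score, that masked patches are zero-filled, and that the joint objective is a weighted sum of the three task losses. The default weights are the base values listed in Figure 4; the tile grid, number of spectral groups, patch size, and mask ratio are free parameters chosen for illustration.

```python
import numpy as np


def gradient_score(cube):
    """Mean 3D gradient magnitude of an (H, W, B) cube (assumed difficulty proxy)."""
    gy, gx, gb = np.gradient(cube.astype(np.float64))
    return float(np.sqrt(gy ** 2 + gx ** 2 + gb ** 2).mean())


def curriculum_batches(cubes, num_stages):
    """Sort cubes by ascending score (assumed easy-to-hard) and split into stages."""
    scores = [gradient_score(c) for c in cubes]
    order = np.argsort(scores, kind="stable")
    return [[cubes[i] for i in idx] for idx in np.array_split(order, num_stages)]


def permute_spatial(cube, grid, rng):
    """Shuffle a grid x grid tiling of the spatial plane (H, W divisible by grid)."""
    h, w, _ = cube.shape
    th, tw = h // grid, w // grid
    tiles = [cube[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for r in range(grid) for c in range(grid)]
    perm = rng.permutation(grid * grid)  # the permutation is the prediction target
    rows = [np.concatenate([tiles[perm[r * grid + c]] for c in range(grid)], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0), perm


def permute_spectral(cube, groups, rng):
    """Shuffle contiguous groups of bands along the spectral axis."""
    chunks = np.array_split(cube, groups, axis=2)
    perm = rng.permutation(groups)
    return np.concatenate([chunks[i] for i in perm], axis=2), perm


def mask_patches(cube, patch, ratio, rng):
    """Zero out random patch x patch spatial blocks (H, W divisible by patch)."""
    h, w, _ = cube.shape
    hidden = rng.random((h // patch, w // patch)) < ratio
    hidden = np.repeat(np.repeat(hidden, patch, axis=0), patch, axis=1)
    masked = cube.copy()
    masked[hidden] = 0  # assumed fill value; all bands of a hidden pixel are cleared
    return masked, hidden


def total_loss(l_spa, l_spe, l_mim, a_spa=1.0, a_spe=1.0, a_mim=4.0):
    """Assumed weighted sum of the spatial, spectral, and MIM losses."""
    return a_spa * l_spa + a_spe * l_spe + a_mim * l_mim
```

In a training loop, the stages returned by `curriculum_batches` would be consumed in order, with each cube passed through all three transformations before reaching the shared encoder. The encoder and the task-specific heads are deliberately left out, since their details are not given in this overview.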