SpaceSeg: A High-Precision Intelligent Perception Segmentation Method for Multi-Spacecraft On-Orbit Targets

Hao Liu, Pengyu Guo, Siyuan Yang, Zeqing Jiang, Qinglei Hu, Dongyu Li

TL;DR

SpaceSeg tackles the problem of high-precision segmentation of multiple on-orbit spacecraft under diverse deep-space conditions by extending a vision foundation model with four innovations: the Multi-Scale Hierarchical Attention Refinement Decoder (MSHARD), Multi-spacecraft Connected Component Analysis (MS-CCA), Spatial Domain Adaptation Transform (SDAT), and a task-specific loss function. By freezing a hierarchical MAE-based encoder, adapting the SAM2 mask decoder via spatial selective dual-stream fine-tuning, and integrating the novel decoding and labeling modules, SpaceSeg achieves a state-of-the-art $mIoU$ of $89.87\%$ and $mAcc$ of $99.98\%$ on the SpaceES dataset. The SpaceES dataset encompasses 1–4 spacecraft, 17 target types, multi-scale scenes, and realistic space imaging disturbances, enabling robust evaluation across multi-spacecraft and multi-background scenarios. The work demonstrates that vision foundation models can be effectively specialized for space perception tasks, with tangible benefits for space situational awareness, debris monitoring, docking, and autonomous servicing in deep-space missions.

Abstract

As human exploration extends further into deep space, intelligent perception and high-precision segmentation of on-orbit multi-spacecraft targets have become critical to the success of modern space missions. However, the complex deep-space environment, diverse imaging conditions, and high variability in spacecraft morphology pose significant challenges to traditional segmentation methods. This paper proposes SpaceSeg, a segmentation framework based on a vision foundation model, with four core technical innovations. First, the Multi-Scale Hierarchical Attention Refinement Decoder (MSHARD) achieves high-precision feature decoding through cross-resolution feature fusion via hierarchical attention. Second, the Multi-spacecraft Connected Component Analysis (MS-CCA) effectively resolves topological structure confusion in dense targets. Third, the Spatial Domain Adaptation Transform framework (SDAT) eliminates cross-domain disparities and resists spatial sensor perturbations through composite enhancement strategies. Finally, a custom Multi-Spacecraft Segmentation Task Loss Function is designed to significantly improve segmentation robustness in deep-space scenarios. To support algorithm validation, we construct SpaceES, the first multi-scale on-orbit multi-spacecraft semantic segmentation dataset, which covers four types of spatial backgrounds and 17 typical spacecraft targets. In testing, SpaceSeg achieves state-of-the-art performance with $89.87\%$ mIoU and $99.98\%$ mAcc, surpassing the best existing methods by 5.71 percentage points. The dataset and code are open-sourced at https://github.com/Akibaru/SpaceSeg to provide critical technical support for next-generation space situational awareness systems.
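The abstract reports results as mIoU and mAcc. As a rough illustration of what these numbers measure, the sketch below computes both from a confusion matrix. It assumes the common definitions (mIoU as the mean of per-class intersection-over-union, mAcc as the mean of per-class pixel accuracy); this section does not state the paper's exact definitions, and the function names `confusion_matrix` and `miou_macc` are hypothetical.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels: cm[t][p] = pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            cm[t][p] += 1
    return cm


def miou_macc(pred, target, num_classes):
    """Return (mIoU, mAcc) under the assumed common definitions.

    Classes absent from both prediction and ground truth are skipped
    so they do not distort the means.
    """
    cm = confusion_matrix(pred, target, num_classes)
    ious, accs = [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # true c, predicted other
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # predicted c, true other
        if tp + fp + fn == 0:
            continue
        ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            accs.append(tp / (tp + fn))
    miou = sum(ious) / len(ious) if ious else 0.0
    macc = sum(accs) / len(accs) if accs else 0.0
    return miou, macc


# Toy 2x2 label maps with two classes (0 = background, 1 = spacecraft).
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
miou, macc = miou_macc(pred, target, num_classes=2)
```

In this toy case the background class has IoU 1/2 and the spacecraft class has IoU 2/3, so the mean is 7/12. A high mAcc can coexist with a lower mIoU because IoU also penalizes false positives.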

Paper Structure

This paper contains 31 sections, 24 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: The target segmentation results of SpaceSeg. A comparison of target segmentation under different scales, spatial imaging backgrounds, and spacecraft categories on the selected dataset.
  • Figure 2: The SpaceSeg architecture. Space images are first processed by the SDAT framework for robustness and generalization. The results are then passed through the Multi-spacecraft CCA mechanism to determine multi-spacecraft task recognition outcomes, which are subsequently fed into the Image Encoder and then into MSHARD. In parallel, the mask is sampled to form the prompt, which is fed into MSHARD via the Prompt Encoder. Finally, on-orbit multi-spacecraft object segmentation results are obtained.
  • Figure 3: The MSHARD architecture employs a hybrid fusion strategy for multimodal feature integration. Specifically, image embeddings undergo element-wise summation with dense vector representations of the sampled mask prompt, whereas positional encodings of visual elements are concatenated with semantic token sequences before joint processing through transformer layers. Hierarchical feature aggregation is achieved through a U-Net-style architecture enhanced with level-specific attention mechanisms to capture detailed spatial-domain features. This core computational framework interfaces with parallel output heads: the mask reconstruction is refined through upsampling operators, while geometry quality evaluation (IoU) and object semantic confidence metrics are output simultaneously via hypernetwork-based MLP branches.
  • Figure 4: Comparison of segmentation results with and without multi-spacecraft connected component analysis. With the MS-CCA method, multiple spacecraft are segmented accurately.
  • Figure 5: The structural composition and transformation effects of the SDAT framework. Adopting a combination of multiple strategies enables the model to achieve strong robustness and generalization for imaging and target segmentation in space missions.
  • ...and 5 more figures
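
The Figure 4 caption refers to connected component analysis for separating multiple spacecraft. The sketch below shows generic connected-component labeling on a binary mask, using breadth-first search with 8-connectivity and a minimum-area filter. It is an illustration of the underlying classical technique only, not the paper's MS-CCA; the function name `label_components` and the `min_area` parameter are assumptions made for this example.

```python
from collections import deque


def label_components(mask, min_area=1):
    """Label 8-connected foreground regions of a binary mask.

    Returns (labels, count): labels[y][x] is 0 for background or
    discarded regions, otherwise a region id in 1..count.
    Regions smaller than min_area pixels are discarded (treated as noise).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    visited = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or visited[y][x]:
                continue
            # Breadth-first flood fill collecting one whole region.
            visited[y][x] = True
            queue = deque([(y, x)])
            pixels = [(y, x)]
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not visited[ny][nx]):
                            visited[ny][nx] = True
                            queue.append((ny, nx))
                            pixels.append((ny, nx))
            if len(pixels) >= min_area:
                count += 1
                for py, px in pixels:
                    labels[py][px] = count
    return labels, count


# Two 3-pixel blobs plus one isolated pixel that the area filter removes.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
]
labels, count = label_components(mask, min_area=2)
```

Each surviving label can then be treated as one candidate target, which is the basic idea behind splitting a single foreground mask into per-spacecraft regions.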