Annotation Free Spacecraft Detection and Segmentation using Vision Language Models

Samet Hicsonmez; Jose Sosa; Dan Pineau; Inder Pal Singh; Arunkumar Rathinam; Abd El Rahman Shabayek; Djamila Aouada

TL;DR

The paper tackles the challenge of detecting and segmenting spacecraft under space conditions with limited labeled data. It introduces an annotation-free pipeline that uses pre-trained Vision Language Models to generate pseudo-labels from a small unlabeled real dataset, refines these labels via test-time augmentation and fusion, and trains lightweight detectors through a teacher–student distillation scheme. Across SPARK-2024, SPEED+, and TANGO, the method consistently improves zero-shot VLM performance, achieving substantial gains in AP and AP75 and approaching supervised upper bounds. This approach drastically reduces labeling costs while delivering real-time capable detectors, with code and models publicly available to facilitate deployment in Space Situational Awareness tasks.

Abstract

Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition. However, their potential in space-related applications remains largely unexplored. In the space domain, accurate manual annotation is particularly challenging due to factors such as low visibility, illumination variations, and object blending with planetary backgrounds. Developing methods that can detect and segment spacecraft and orbital targets without requiring extensive manual labeling is therefore of critical importance. In this work, we propose an annotation-free detection and segmentation pipeline for space targets using VLMs. Our approach begins by automatically generating pseudo-labels for a small subset of unlabeled real data with a pre-trained VLM. These pseudo-labels are then leveraged in a teacher-student label distillation framework to train lightweight models. Despite the inherent noise in the pseudo-labels, the distillation process leads to substantial performance gains over direct zero-shot VLM inference. Experimental evaluations of segmentation on the SPARK-2024, SPEED+, and TANGO datasets demonstrate consistent improvements in average precision (AP) of up to 10 points. Code and models are available at https://github.com/giddyyupp/annotation-free-spacecraft-segmentation.

Paper Structure (21 sections, 8 equations, 3 figures, 3 tables)

INTRODUCTION
RELATED WORK
Vision Language Models
Test-time Augmentation (TTA)
Label Distillation
METHOD
Pseudo-label generation
Label refinement
Refinement using TTA
Refinement using confidence based thresholding
Label distillation
Inference
EXPERIMENTS
Datasets
Metrics
...and 6 more sections

Figures (3)

Figure 1: We present challenging examples taken from SPEED+ (top) and TANGO (bottom) where annotation is difficult due to the cluttered background and low visibility. Our method detects and segments the spacecraft object with high precision in all cases. Zoom in for details.
Figure 2: The processing pipeline of our method. Given unlabeled training images, we first automatically annotate them using a pretrained VLM with a fixed prompt of "spacecraft". In the next stage, we refine these pseudo-annotated images leveraging test-time augmentations. The refined labels are distilled to a shallow Student Model which is employed for the inference.
Figure 3: Visual comparison of zero-shot VLM and our student model predictions. Our method greatly improves VLM predictions.
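
The label refinement stage described for Figure 2 (test-time augmentation, fusion, and confidence-based thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation: the page does not say which augmentations or fusion rule the paper uses, so the example assumes a single horizontal-flip view and a simple confidence-weighted box average. The function names, the IoU threshold of 0.5, and the score threshold of 0.3 are all hypothetical choices made for illustration, and the VLM predictions are stand-in arrays rather than real model outputs.

```python
import numpy as np


def unflip_boxes(boxes, image_width):
    """Map [x1, y1, x2, y2] boxes predicted on a horizontally
    flipped image back into the original image frame."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    out = boxes.copy()
    out[:, 0] = image_width - boxes[:, 2]
    out[:, 2] = image_width - boxes[:, 0]
    return out


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def fuse_and_filter(boxes, scores, iou_thr=0.5, score_thr=0.3):
    """Assumed fusion rule (not taken from the paper): greedily group
    overlapping boxes from all augmented views, average each group
    weighted by confidence, then drop low-confidence groups.
    Scores are assumed to be strictly positive."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # highest confidence first
    boxes, scores = boxes[order], scores[order]
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in range(len(boxes)):
        if used[i]:
            continue
        members = (~used) & (iou_one_to_many(boxes[i], boxes) >= iou_thr)
        members[i] = True
        used |= members
        weights = scores[members]
        box = (boxes[members] * weights[:, None]).sum(axis=0) / weights.sum()
        score = weights.mean()
        if score >= score_thr:  # confidence-based thresholding
            fused.append((box, score))
    return fused


# Stand-in predictions for one 100-pixel-wide image (hypothetical values).
original_view = np.array([[10, 20, 50, 60], [70, 70, 80, 80]])
original_scores = [0.8, 0.1]
flipped_view = unflip_boxes([[48, 22, 92, 62]], image_width=100)
flipped_scores = [0.6]

pseudo_labels = fuse_and_filter(
    np.vstack([original_view, flipped_view]),
    original_scores + flipped_scores,
)
```

In this toy case the two overlapping detections of the same object merge into one refined box, while the isolated low-confidence detection is discarded. The surviving boxes would then serve as training targets for the student model.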
