Addressing single object tracking in satellite imagery through prompt-engineered solutions
Athena Psalta, Vasileios Tsironis, Andreas El Saer, Konstantinos Karantzalos
TL;DR
This work tackles single-object tracking (SOT) in satellite videos, a domain plagued by background variability and low-resolution targets. It introduces a training-free, prompt-engineered framework that combines the segmentation strengths of the Segment Anything Model (SAM) with the point-tracking capabilities of TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement), operating on keyframes and updated points. Without any model training, the method achieves competitive results on the VISO dataset (e.g., $DPR=63.9$, $OSR=36.5$), underscoring the potential of prompt-driven strategies as a practical alternative for SOT in remote sensing.
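To make the two reported scores concrete, the following is a minimal sketch of how precision-style and success-style tracking metrics are commonly computed. It reads DPR as a distance precision rate (fraction of frames whose predicted box center lies within a pixel threshold of the ground truth) and OSR as an overlap success rate (fraction of frames whose IoU exceeds a threshold). Both readings, the `(x, y, w, h)` box format, the function names, and the threshold values of 5 pixels and 0.5 IoU are assumptions for illustration, not details taken from the text above; benchmarks often report an average over several thresholds instead of a single one.

```python
from math import hypot


def center(box):
    """Center of an (x, y, w, h) box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def distance_precision_rate(pred, gt, dist_thresh=5.0):
    """Percentage of frames whose center error is within dist_thresh pixels.

    dist_thresh=5.0 is an assumed value, not one stated in the text.
    """
    hits = 0
    for p, g in zip(pred, gt):
        (px, py), (gx, gy) = center(p), center(g)
        if hypot(px - gx, py - gy) <= dist_thresh:
            hits += 1
    return 100.0 * hits / len(gt)


def overlap_success_rate(pred, gt, iou_thresh=0.5):
    """Percentage of frames whose IoU exceeds iou_thresh.

    iou_thresh=0.5 is an assumed value, not one stated in the text.
    """
    hits = sum(1 for p, g in zip(pred, gt) if iou(p, g) > iou_thresh)
    return 100.0 * hits / len(gt)


# Toy example: the first frame matches exactly, the second misses entirely.
pred_boxes = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt_boxes = [(0, 0, 10, 10), (20, 20, 10, 10)]
dpr = distance_precision_rate(pred_boxes, gt_boxes)
osr = overlap_success_rate(pred_boxes, gt_boxes)
```

For targets only a few pixels wide, a small localization error can push the IoU below the threshold while the center still lies within the distance threshold, which is one way a precision-style score can sit well above a success-style score.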
Abstract
Object tracking in satellite videos remains a complex problem in remote sensing due to the intricate and dynamic nature of satellite imagery. Existing state-of-the-art trackers in computer vision integrate sophisticated architectures, attention mechanisms, and multi-modal fusion to enhance tracking accuracy across diverse environments. However, the challenges posed by satellite imagery, such as background variations, atmospheric disturbances, and low-resolution object delineation, significantly impede the precision and reliability of traditional Single Object Tracking (SOT) techniques. Our study examines these challenges and proposes prompt engineering methodologies that leverage the Segment Anything Model (SAM) and TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement) to create a training-free, point-based tracking method for small-scale objects in satellite videos. Experiments on the VISO dataset validate our strategy and indicate that prompt-driven, training-free tracking is a viable route to robust tracking in satellite imagery for remote sensing applications.
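As an illustration only, one plausible reading of a segmentation-plus-point-tracking loop is sketched below. It is not the authors' implementation: `segment_fn`, `sample_fn`, and `track_fn` are hypothetical callables standing in for a SAM-like segmenter, a routine that picks query points from a mask, and a TAPIR-like point tracker, and the `keyframe_every` schedule and the box-from-points rule are assumptions. No real SAM or TAPIR API is called.

```python
def points_to_box(points, visible, pad=0.0):
    """Axis-aligned (x, y, w, h) box around the visible tracked points."""
    pts = [p for p, v in zip(points, visible) if v]
    if not pts:
        return None  # no visible point: the caller keeps the previous box
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, y0 = min(xs) - pad, min(ys) - pad
    x1, y1 = max(xs) + pad, max(ys) + pad
    return (x0, y0, x1 - x0, y1 - y0)


def track_sequence(frames, init_box, segment_fn, sample_fn, track_fn,
                   keyframe_every=10):
    """Training-free tracking loop built from injected, hypothetical callables.

    segment_fn(frame, box) -> mask            (stand-in for a SAM-like model)
    sample_fn(mask) -> list of (x, y) points  (query-point selection)
    track_fn(prev_frame, frame, points) -> (points, visible)
                                              (stand-in for a point tracker)
    """
    boxes = [init_box]
    box = init_box
    points = None
    for t in range(1, len(frames)):
        # On keyframes, re-prompt the segmenter with the current box and
        # refresh the query points from the resulting mask.
        if points is None or (t - 1) % keyframe_every == 0:
            mask = segment_fn(frames[t - 1], box)
            points = sample_fn(mask)
        points, visible = track_fn(frames[t - 1], frames[t], points)
        new_box = points_to_box(points, visible)
        if new_box is not None:
            box = new_box
        boxes.append(box)
    return boxes


# Toy stand-ins: the "mask" is just the box, query points are two of its
# corners, and the "tracker" shifts every point one pixel to the right.
def toy_segment(frame, box):
    return box


def toy_sample(mask):
    x, y, w, h = mask
    return [(x, y), (x + w, y + h)]


def toy_track(prev_frame, frame, points):
    return [(px + 1, py) for px, py in points], [True] * len(points)


toy_boxes = track_sequence(["f0", "f1", "f2"], (0, 0, 2, 2),
                           toy_segment, toy_sample, toy_track)
```

Passing the models in as callables keeps the loop free of any learned parameters of its own, which is the sense in which such a pipeline can be called training-free: all learned behavior lives in the pretrained components supplied from outside.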
