Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li, Yifan Jiao, Dan Meng, Heng Fan, Libo Zhang
TL;DR
This paper tackles open-vocabulary multi-object tracking (OV-MOT) and argues that existing OV-MOT methods underuse trajectory information. It introduces TRACT, a two-stage tracker that combines Trajectory Consistency Reinforcement (TCR) for robust association with TraCLIP, a trajectory-assisted classifier built on Trajectory Feature Aggregation (TFA) and Trajectory Semantic Enrichment (TSE). The approach pairs a memory-based association scheme with category voting and a CLIP-based, LLM-augmented classification pipeline. It is trained on base categories but evaluated in the open-vocabulary setting, where it achieves state-of-the-art results on OV-TAO. The findings underscore the value of trajectory context both for maintaining target identity and for improving classification under occlusion and for novel classes, with practical impact for more robust OV-MOT systems.
Abstract
Open-Vocabulary Multi-Object Tracking (OV-MOT) aims to track objects without being limited to a predefined set of categories. Current OV-MOT methods rely primarily on instance-level detection and association, often overlooking trajectory information, which is unique to and essential for object tracking. Utilizing trajectory information can enhance association stability and classification accuracy, especially under occlusion and category ambiguity, thereby improving adaptability to novel classes. Thus motivated, we propose TRACT, an open-vocabulary tracker that leverages trajectory information to improve both object association and classification in OV-MOT. Specifically, we introduce a Trajectory Consistency Reinforcement (TCR) strategy that benefits tracking performance by improving target identity and category consistency. In addition, we present TraCLIP, a plug-and-play trajectory classification module. It integrates Trajectory Feature Aggregation (TFA) and Trajectory Semantic Enrichment (TSE) strategies to fully leverage trajectory information from visual and language perspectives to enhance classification results. Extensive experiments on OV-TAO show that TRACT significantly improves tracking performance, highlighting trajectory information as a valuable asset for OV-MOT. Code will be released.
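To make the idea of category consistency along a trajectory concrete, the following is a minimal sketch of majority voting over per-frame labels. It is an illustration only, not the TRACT implementation: the function name `vote_track_categories`, the `(track_id, label)` input format, and the use of a plain unweighted majority vote are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def vote_track_categories(
    observations: Iterable[Tuple[int, str]],
) -> Dict[int, str]:
    """Assign each track the label it received most often.

    `observations` is a hypothetical stream of (track_id, per-frame label)
    pairs. Ties are resolved in favor of the label seen first.
    """
    labels_per_track: Dict[int, List[str]] = {}
    for track_id, label in observations:
        labels_per_track.setdefault(track_id, []).append(label)

    # Unweighted majority vote per track (an assumption for illustration).
    return {
        track_id: Counter(labels).most_common(1)[0][0]
        for track_id, labels in labels_per_track.items()
    }


# Track 1 is mislabeled in one frame; voting restores a single category.
frames = [(1, "cat"), (1, "dog"), (1, "cat"), (2, "car")]
voted = vote_track_categories(frames)
```

Voting over the whole trajectory lets a single noisy frame-level prediction, for example one made during partial occlusion, be outweighed by the other frames of the same track.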
