Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking

Yunhao Li, Yifan Jiao, Dan Meng, Heng Fan, Libo Zhang

TL;DR

This paper tackles open-vocabulary multi-object tracking by arguing that trajectory information has been underutilized in OV-MOT. It introduces TRACT, a two-stage tracker with Trajectory Consistency Reinforcement (TCR) for robust association and TraCLIP for trajectory-assisted classification via Trajectory Feature Aggregation (TFA) and Trajectory Semantic Enrichment (TSE). The approach uses a memory-based association scheme with category voting and a CLIP-based, LLM-augmented classification pipeline, trained on base categories but evaluated on open vocabulary, achieving state-of-the-art results on OV-TAO. The findings underscore the value of trajectory context for both maintaining target identity and improving classification under occlusion and novel class conditions, with practical impact for more robust OV-MOT systems.

Abstract

Open-Vocabulary Multi-Object Tracking (OV-MOT) aims to track objects without being limited to a predefined set of categories. Current OV-MOT methods rely primarily on instance-level detection and association, often overlooking trajectory information that is unique and essential to object tracking. Utilizing trajectory information can enhance association stability and classification accuracy, especially in cases of occlusion and category ambiguity, thereby improving adaptability to novel classes. Thus motivated, in this paper we propose TRACT, an open-vocabulary tracker that leverages trajectory information to improve both object association and classification in OV-MOT. Specifically, we introduce a Trajectory Consistency Reinforcement (TCR) strategy that benefits tracking performance by improving target identity and category consistency. In addition, we present TraCLIP, a plug-and-play trajectory classification module. It integrates Trajectory Feature Aggregation (TFA) and Trajectory Semantic Enrichment (TSE) strategies to fully leverage trajectory information from visual and language perspectives and thereby enhance the classification results. Extensive experiments on OV-TAO show that our TRACT significantly improves tracking performance, highlighting trajectory information as a valuable asset for OV-MOT. Code will be released.
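
To make the idea of category consistency along a trajectory concrete, the following is a minimal sketch of category voting over the per-frame labels of one track. It is an illustration under stated assumptions, not the paper's TCR implementation: the function name `vote_category`, the optional score weighting, and the tie-breaking rule are all hypothetical choices made here.

```python
from collections import Counter


def vote_category(frame_labels, frame_scores=None):
    """Pick one label for a whole trajectory by (optionally weighted) voting.

    frame_labels: per-frame category names predicted for one track.
    frame_scores: optional per-frame confidences used as vote weights
                  (an assumption of this sketch; plain counting is the default).
    """
    if not frame_labels:
        raise ValueError("trajectory has no detections")
    if frame_scores is None:
        frame_scores = [1.0] * len(frame_labels)
    if len(frame_scores) != len(frame_labels):
        raise ValueError("labels and scores must have the same length")

    totals = Counter()
    for label, score in zip(frame_labels, frame_scores):
        totals[label] += score

    # Ties go to the label that appeared first in the trajectory,
    # because Counter keeps insertion order and max returns the first maximum.
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # A track that is misclassified in one occluded frame.
    print(vote_category(["dog", "cat", "dog", "dog"]))
```

With a vote of this kind, a single misclassified frame no longer changes the category reported for the track, which is the sort of consistency the abstract attributes to trajectory information.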

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Trajectory information can enhance both association and classification by helping to recover associations disrupted by inaccurate or missed detections (as shown in (a)) and by correcting incorrect classifications (as shown in (b)).
  • Figure 2: Comparison of the overall pipeline between existing OV-MOT approaches and our TRACT. We introduce three strategies, i.e., TCR, TFA, and TSE strategies, to utilize trajectory information in association and classification.
  • Figure 3: The overall architecture of the proposed TRACT. A replaceable open-vocabulary detector is used to generate boxes of arbitrary categories, and these detection results are used for trajectory association. TRACT leverages trajectory information in both the trajectory-enhanced association and trajectory-assisted classification steps.
  • Figure 4: The architecture of the proposed TraCLIP. It works from both the language and the visual aspects, making full use of trajectory information to assist classification.
  • Figure 5: Comparison of different fusion mechanisms on the validation set of OV-TAO [li2023ovtrack], using TETA (a) and ClsA (b) metrics.
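
The captions for Figures 4 and 5 refer to aggregating trajectory features and to comparing fusion mechanisms. As a rough illustration of what trajectory-level classification can look like, the sketch below fuses per-frame visual features by a weighted mean and picks the category whose text embedding has the highest cosine similarity. Mean fusion is only one possible mechanism and is an assumption here, not necessarily the one TraCLIP adopts. The function names are hypothetical, and the toy vectors stand in for features that would normally come from CLIP's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def aggregate_trajectory(frame_feats, weights=None):
    """Fuse per-frame features of one track into a single unit vector.

    Uses a weighted mean of unit-normalized frame features
    (an assumed fusion mechanism chosen for this sketch).
    """
    if not frame_feats:
        raise ValueError("trajectory has no frame features")
    if weights is None:
        weights = [1.0] * len(frame_feats)
    if len(weights) != len(frame_feats):
        raise ValueError("features and weights must have the same length")

    dim = len(frame_feats[0])
    fused = [0.0] * dim
    for feat, w in zip(frame_feats, weights):
        unit = l2_normalize(feat)
        for i in range(dim):
            fused[i] += w * unit[i]
    return l2_normalize(fused)


def classify_trajectory(traj_feat, text_feats):
    """Return the category whose text embedding is most similar (cosine)."""
    def cosine(name):
        unit = l2_normalize(text_feats[name])
        return sum(a * b for a, b in zip(traj_feat, unit))

    return max(text_feats, key=cosine)


if __name__ == "__main__":
    # Toy 2-D features: two clear frames and one occluded, misleading frame.
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    text = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    print(classify_trajectory(aggregate_trajectory(frames), text))
```

Classifying the fused feature rather than each frame separately lets the clear frames outweigh the misleading one. The `weights` argument is where a different fusion rule, for example one that down-weights low-confidence frames, could be plugged in.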