History-Aware Transformation of ReID Features for Multiple Object Tracking

Ruopeng Gao, Yuyao Wang, Chunxu Liu, Limin Wang

TL;DR

This work tackles multi-object tracking by arguing that generic ReID features are suboptimal for distinguishing similar targets within a single video sequence. It introduces a training-free History-Aware Projection using Fisher Linear Discriminant to compute a per-sequence projection $W$, transforming features via $\boldsymbol{f}'=\boldsymbol{f}W$ and selecting a reduced dimension $D'$. It further enhances robustness with a Temporal-Shifted Trajectory Centroid that emphasizes recent observations, and combines the transformed and original feature spaces through Knowledge Integration using $\cos(i,j)=\alpha\cos(\boldsymbol{f}_i',\hat{\boldsymbol{f}}_j')+(1-\alpha)\cos(\boldsymbol{f}_i,\hat{\boldsymbol{f}}_j)$. Extensive experiments on DanceTrack, MOT17/16, SportsMOT, and TAO show significant performance gains, including strong zero-shot transfer, demonstrating the value of sequence-tailored ReID representations for MOT.
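The Knowledge Integration formula above can be illustrated with a minimal NumPy sketch. It assumes the projection $W$ is already available, that detection features and trajectory centroids are stored as row vectors, and that the projected centroid is obtained by projecting the original centroid. The function names and the default `alpha=0.5` are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def cosine_matrix(a, b, eps=1e-12):
    """Pairwise cosine similarity between the rows of a (N, D) and b (M, D)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T


def integrated_similarity(dets, centroids, W, alpha=0.5):
    """Blend similarities from the projected and the original feature space.

    dets:      (N, D) features f_i of current detections
    centroids: (M, D) trajectory centroids f_hat_j
    W:         (D, D') projection, applied as f' = f W
    alpha:     weight of the projected space (hypothetical default)
    """
    sim_projected = cosine_matrix(dets @ W, centroids @ W)
    sim_original = cosine_matrix(dets, centroids)
    return alpha * sim_projected + (1.0 - alpha) * sim_original
```

With `alpha=0` the score reduces to plain cosine similarity in the original ReID space, so the original space acts as the general-purpose fallback and the projected space as the sequence-specific refinement.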

Abstract

The aim of multiple object tracking (MOT) is to detect all objects in a video and bind them into multiple trajectories. Generally, this process is carried out in two steps: detecting objects and associating them across frames based on various cues and metrics. Many studies and applications adopt object appearance, also known as re-identification (ReID) features, for target matching through straightforward similarity calculation. However, we argue that this practice is overly naive and overlooks the unique characteristics of MOT tasks. Unlike regular re-identification tasks, which strive to distinguish all potential targets in a general representation, multi-object tracking mainly needs to differentiate similar targets within the same video sequence. Therefore, we believe that seeking a more suitable feature representation space based on the sample distribution of each sequence will enhance tracking performance. In this paper, we propose using history-aware transformations on ReID features to achieve more discriminative appearance representations. Specifically, we treat historical trajectory features as conditions and employ a tailored Fisher Linear Discriminant (FLD) to find a spatial projection matrix that maximizes the differentiation between different trajectories. Our extensive experiments reveal that this training-free projection can significantly boost feature-only trackers, achieving tracking performance that is competitive with, and in some cases superior to, state-of-the-art methods, while also demonstrating impressive zero-shot transfer capabilities. This demonstrates the effectiveness of our proposal and encourages further investigation into the importance and customization of ReID models in multiple object tracking. The code will be released at https://github.com/HELLORPG/HATReID-MOT.
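To make the idea of a discriminant projection concrete, here is a minimal sketch of the textbook Fisher Linear Discriminant, treating each trajectory ID as a class. It is not the tailored variant the abstract refers to, whose details are not given here. The function name `fld_projection` and the ridge term `reg` are assumptions added for illustration and numerical stability.

```python
import numpy as np


def fld_projection(features, track_ids, out_dim, reg=1e-3):
    """Textbook FLD: find W (D, out_dim) maximizing between-trajectory
    scatter relative to within-trajectory scatter.

    features:  (N, D) historical ReID features, one row per observation
    track_ids: (N,) trajectory ID of each row
    reg:       ridge added to the within-class scatter (assumption)
    """
    features = np.asarray(features, dtype=np.float64)
    track_ids = np.asarray(track_ids)
    dim = features.shape[1]
    global_mean = features.mean(axis=0)

    s_w = np.zeros((dim, dim))  # within-trajectory scatter
    s_b = np.zeros((dim, dim))  # between-trajectory scatter
    for tid in np.unique(track_ids):
        x = features[track_ids == tid]
        mu = x.mean(axis=0)
        centered = x - mu
        s_w += centered.T @ centered
        diff = (mu - global_mean)[:, None]
        s_b += len(x) * (diff @ diff.T)
    s_w += reg * np.eye(dim)

    # Whiten with the Cholesky factor of S_w so the generalized problem
    # S_b w = lambda * S_w w becomes an ordinary symmetric eigenproblem.
    chol = np.linalg.cholesky(s_w)               # S_w = L L^T
    tmp = np.linalg.solve(chol, s_b)             # L^{-1} S_b
    m = np.linalg.solve(chol, tmp.T).T           # L^{-1} S_b L^{-T}
    m = 0.5 * (m + m.T)                          # remove round-off asymmetry
    _, eigvecs = np.linalg.eigh(m)               # eigenvalues ascending
    top = eigvecs[:, ::-1][:, :out_dim]          # largest eigenvalues first
    return np.linalg.solve(chol.T, top)          # W = L^{-T} V
```

Features would then be transformed as `features @ W`, matching the row-vector convention $\boldsymbol{f}'=\boldsymbol{f}W$. Because the between-class scatter of $C$ trajectories has rank at most $C-1$, directions beyond that carry no discriminative signal in this standard formulation.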

Paper Structure

This paper contains 22 sections, 13 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Visualization of re-identification features in the original representation space of FastReID.
  • Figure 2: Overview of our pipeline. We use different colors to indicate different identities (trajectories). In the original space, some overly similar targets cannot be well distinguished, leading to issues in the current frame's matching process. Therefore, we treat the trajectory features as conditions and apply a modified Fisher Linear Discriminant to seek a more optimal space for distinguishing different trajectories. Finally, both original and transformed features are used to calculate the similarity matrix, balancing generalization and specialization.
  • Figure 3: Visualization of ReID features in the original space and the transformed space. $\bullet$ represents the historical features, and a second marker indicates the current features. Compared to the other two spaces, the FLD-projected space shows better differentiation of trajectories.
  • Figure 4: More visualizations of ReID features. $\bullet$ represents the historical features, and a second marker indicates the current features. Compared to the other two spaces, the FLD-projected space shows better differentiation of trajectories.