FACT: Feature Adaptive Continual-learning Tracker for Multiple Object Tracking

Rongzihan Song, Zhenyu Weng, Huiping Zhuang, Jinchang Ren, Yongming Chen, Zhiping Lin

TL;DR

The paper tackles long-term occlusion in multiple object tracking by introducing FACT, a framework that learns appearance features online from all past tracking information. Central to FACT is the FAC module, which is trained online with an analytic continual-learning methodology that recursively updates feature representations. It is paired with a two-stage association module that robustly initializes new tracks. Experiments on MOT17 and MOT20 show state-of-the-art online performance: trackers integrated with FACT achieve higher IDF1 and HOTA while keeping real-time speed on capable GPUs. By using complete historical tracking data during online operation, the approach offers a scalable way to improve robustness in occlusion-heavy MOT scenarios.

Abstract

Multiple object tracking (MOT) involves identifying multiple targets and assigning them corresponding IDs within a video sequence, where occlusions are often encountered. Recent methods address occlusions using appearance cues, either through online learning techniques to improve adaptivity or through offline learning techniques to exploit temporal information from videos. However, most existing online learning-based MOT methods cannot learn from all past tracking information to improve adaptivity to long-term occlusions while maintaining real-time tracking speed. Temporal information-based offline learning methods, on the other hand, maintain a long-term memory to store past tracking information, but this approach restricts them to using only local past information during tracking. To address these challenges, we propose a new MOT framework called the Feature Adaptive Continual-learning Tracker (FACT), which enables real-time tracking and feature learning for targets by utilizing all past tracking information. We demonstrate that the framework can be integrated with various state-of-the-art feature-based trackers, improving their tracking ability. Specifically, we develop the feature adaptive continual-learning (FAC) module, a neural network that can be trained online to learn features adaptively from all past tracking information during tracking. We also introduce a two-stage association module specifically designed for the proposed continual learning-based tracking. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art online tracking performance on the MOT17 and MOT20 benchmarks. The code will be released upon acceptance.
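The abstract's central claim, that the FAC module is trained online on all past tracking information without storing it, can be illustrated with a generic recursive least-squares (analytic learning) update. This is a sketch under stated assumptions, not the paper's equations: it treats the FCN layer as a linear map from ReID embeddings to one-hot track-ID targets, omits the ET layer, and keeps a matrix R, the inverse of the regularized feature autocorrelation, standing in for what the Figure 2 caption calls the "feature autocorrelation unit". The class `AnalyticLinearHead`, its methods, and the regularizer `gamma` are hypothetical names.

```python
import numpy as np


class AnalyticLinearHead:
    """Hypothetical linear (FCN-style) head trained by recursive least squares.

    After each update, W equals the ridge-regression solution over every
    sample seen so far, but only the current batch and the dim x dim matrix
    R = (sum of X^T X + gamma * I)^{-1} are kept in memory.
    """

    def __init__(self, dim: int, gamma: float = 1.0):
        self.R = np.eye(dim) / gamma      # inverse regularized autocorrelation
        self.W = np.zeros((dim, 0))       # one output column per track ID

    def _grow(self, num_ids: int) -> None:
        # Newly seen track IDs get zero columns: no past sample belongs to them,
        # so zero is exactly their ridge solution on the old data.
        extra = num_ids - self.W.shape[1]
        if extra > 0:
            self.W = np.hstack([self.W, np.zeros((self.W.shape[0], extra))])

    def update(self, X: np.ndarray, ids: np.ndarray) -> None:
        """X: (n, dim) embeddings of associated detections; ids: (n,) track IDs."""
        self._grow(int(ids.max()) + 1)
        Y = np.eye(self.W.shape[1])[ids]  # one-hot targets from association results
        # Woodbury identity: fold the new batch into R with an n x n inverse only.
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R
        # Correct W using the residual of the new batch under the old weights.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def affinity(self, X: np.ndarray) -> np.ndarray:
        """(n, num_ids) scores of each embedding against each known track."""
        return X @ self.W
```

Usage: call `head.update(X, ids)` once per frame with the embeddings of the detections that association matched (with `ids` as their track indices), then score the next frame's detections with `head.affinity(X_next)`. The per-update cost depends on the embedding dimension and batch size, not on the video length, which is one way a closed-form recursion can use all past information while keeping per-frame cost bounded.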

Paper Structure (21 sections, 14 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Proposed FACT framework. The FACT framework consists of three modules: the detection and feature extraction module, the FAC module, and the association module. Given any frame $k$ of a video, the detection and feature extraction module first identifies targets and converts their visual appearances into feature embeddings. These embeddings are then processed by the FAC module, whose output is passed to the association module to associate the detected targets with their respective tracks, producing the association result. Finally, the FAC module performs online training using the embeddings, the association results, and the module parameters inherited from the last frame.
  • Figure 2: The FAC module and online training process. The FAC module consists of an embedding transformation (ET) layer and a fully-connected network (FCN) layer. Using the feature embeddings $\mathbf{X}_{k}^{(\textup{ReID})}$ and the corresponding association results from frame $k$, we update the FCN layer through online training. This training uses the feature embeddings, the association results, the FCN layer weights inherited from the last frame, and the feature autocorrelation unit, which encodes both current and past feature information.
  • Figure 3: Comparison of occlusion adaptation: FAC module-based affinity estimation vs. cosine distance. Top left (large image): tracking results for frame 623 with the FAC module, with tracked objects shown in different colors. Top right (small image sequence): the occluded target in the subsequent frames. Bottom table: distances in the subsequent frames for selected identities, computed with the FAC module (continual learning-based) and with the popular cosine distance association (without continual learning). The focus is on ID 6, the individual dressed in pink who is partially occluded behind others. Our method successfully tracks the partially occluded target, whereas the cosine distance does not.
  • Figure 4: Distance distribution of the associated and unassociated pairs on two videos from MOT17.
  • Figure 5: Visualization of sample tracking results on 3 sequences selected from the MOT17 validation set. The yellow value under each box, and its corresponding result, come from cosine distance association; the blue value, and its corresponding result, come from affinity estimation.
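The Figure 3 and Figure 5 captions contrast two ways of scoring a detection against a track: the FAC module's affinity estimate and plain cosine distance between ReID embeddings. This summary does not spell out the two-stage association rule, so the sketch below is a hypothetical design, not the authors' module. Stage one matches detections to tracks the head has already learned using affinity scores. Stage two falls back to cosine distance for the remaining detections and tracks (for example, tracks created in the previous frame that the head has not yet been trained on). Leftover detections would initialize new tracks. Greedy matching stands in for whatever assignment solver the paper uses, and the names `two_stage_associate`, `greedy_match`, `tau_aff`, `tau_cos` and their threshold values are illustrative assumptions.

```python
import numpy as np


def cosine_distance(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise 1 - cosine similarity between rows of A and rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T


def greedy_match(score, rows, cols, threshold, higher_is_better):
    """Pair rows with cols best-score-first; each row/col is used at most once."""
    candidates = sorted(
        ((score[r, c], r, c) for r in rows for c in cols),
        reverse=higher_is_better,
    )
    pairs, used_r, used_c = [], set(), set()
    for s, r, c in candidates:
        passes = s >= threshold if higher_is_better else s <= threshold
        if passes and r not in used_r and c not in used_c:
            pairs.append((r, c))
            used_r.add(r)
            used_c.add(c)
    return pairs


def two_stage_associate(affinity, det_emb, trk_emb, learned, tau_aff=0.5, tau_cos=0.3):
    """Hypothetical two-stage association.

    affinity: (n_det, n_trk) FAC-style scores, valid only in columns `learned`.
    det_emb, trk_emb: ReID embeddings of detections and tracks.
    Returns (matches, detections_that_start_new_tracks).
    """
    dets = list(range(det_emb.shape[0]))
    # Stage 1: trust the learned affinity for tracks the head has been trained on.
    stage1 = greedy_match(affinity, dets, learned, tau_aff, higher_is_better=True)
    done_d = {d for d, _ in stage1}
    done_t = {t for _, t in stage1}
    left_d = [d for d in dets if d not in done_d]
    left_t = [t for t in range(trk_emb.shape[0]) if t not in done_t]
    # Stage 2: fall back to cosine distance for everything still unmatched.
    dist = cosine_distance(det_emb, trk_emb)
    stage2 = greedy_match(dist, left_d, left_t, tau_cos, higher_is_better=False)
    matched = done_d | {d for d, _ in stage2}
    # Detections left over after both stages would start new tracks.
    return stage1 + stage2, [d for d in dets if d not in matched]
```

With `learned=[0]`, a detection whose affinity to track 0 exceeds `tau_aff` is matched in stage one, and the remaining detections compete for the unlearned track by cosine distance. Greedy matching keeps the sketch dependency-free; a Hungarian solver such as SciPy's `linear_sum_assignment` would be the usual replacement.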