Dual-level Adaptation for Multi-Object Tracking: Building Test-Time Calibration from Experience and Intuition

Wen Guo, Pengfei Zhao, Zongmeng Wang, Yufan Hu, Junyu Gao

Abstract

Multiple Object Tracking (MOT) has long been a fundamental task in computer vision, with broad applications in real-world scenarios. However, distribution shifts in appearance, motion patterns, and object categories between training and testing data cause model performance to degrade considerably during online inference. Test-Time Adaptation (TTA) has emerged as a promising paradigm for alleviating such shifts. Yet existing TTA methods often deliver unsatisfactory results in MOT because they focus solely on frame-level adaptation, neglecting temporal consistency and identity association across frames and videos. Inspired by the human decision-making process, this paper proposes a Test-time Calibration from Experience and Intuition (TCEI) framework. In this framework, the Intuitive system uses transient memory to recall recently observed objects for rapid predictions, while the Experiential system leverages experience accumulated from prior test videos to reassess and calibrate these intuitive predictions. Furthermore, confident and uncertain objects encountered during online testing are exploited as historical priors and reflective cases, respectively, enabling the model to adapt to the testing environment and alleviate performance degradation. Extensive experiments demonstrate that TCEI consistently achieves superior performance across multiple benchmark datasets and significantly enhances the model's adaptability under distribution shifts. The code will be released at https://github.com/1941Zpf/TCEI.
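The Intuitive system's idea of recalling recently observed objects from transient memory can be illustrated with a small sliding-window store. This is a minimal sketch under illustrative assumptions, not the authors' implementation: the class name `TransientMemory`, the window length, cosine-similarity matching, and the `threshold` parameter are all hypothetical, and the Experiential calibration step is not modeled.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TransientMemory:
    """Hypothetical short-term memory: the last `window` embeddings of each track ID."""

    def __init__(self, window=3):
        self.window = window
        self.tracks = {}  # track ID -> deque of recent embeddings

    def update(self, track_id, embedding):
        # deque(maxlen=...) silently drops the oldest embedding once the window is full.
        buf = self.tracks.setdefault(track_id, deque(maxlen=self.window))
        buf.append(embedding)

    def recall(self, embedding, threshold=0.5):
        """Return the track ID whose recent embeddings best match `embedding`, or None."""
        best_id, best_sim = None, threshold
        for track_id, buf in self.tracks.items():
            sim = max(cosine(embedding, e) for e in buf)
            if sim > best_sim:
                best_id, best_sim = track_id, sim
        return best_id


# Usage: two tracks with toy 2-D embeddings.
mem = TransientMemory(window=2)
mem.update(1, [1.0, 0.0])
mem.update(2, [0.0, 1.0])
print(mem.recall([0.9, 0.1]))  # 1
```

Because only a short window is kept per track, recall is fast but forgets older appearances; that limitation is what a longer-term, experience-based calibration would be meant to address.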

Paper Structure (12 sections, 8 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Illustration of the proposed TCEI framework. (a) The upper part illustrates MOT under distribution shift. Due to significant discrepancies between the training and testing domains, the baseline model produces incorrect ID predictions. (b) The lower part presents our TCEI framework. The Intuitive System exploits transient memory from recently observed objects to provide rapid test-time guidance, and the Experiential System utilizes accumulated historical experience to calibrate these intuitive predictions.
  • Figure 2: Overview of the proposed Test-time Calibration from Experience and Intuition (TCEI) framework. The Intuitive system performs rapid inference using transient memory, while the Experiential system refines predictions with historical test experience. Confident and uncertain objects are stored in caches to provide temporal priors and reflective cues. “Exp. Embeds” denotes the experience embeddings, while “Query” represents the query embeddings of the Transformer decoder. The experience embeddings evolve along with the query embeddings to capture object-specific characteristics.
  • Figure 3: Analysis of the maximum capacity of the confident and uncertain object caches on the DanceTrack dataset.
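Figures 2 and 3 indicate that confident and uncertain objects are kept in caches whose maximum capacity is a tunable setting. The sketch below shows one plausible way to do that bookkeeping; the single confidence threshold `tau`, the first-in-first-out eviction policy, and the names `DualCache` and `TrackedObject` are hypothetical assumptions and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical record for one object observed during online testing."""
    track_id: int
    embedding: list
    confidence: float


class DualCache:
    """Hypothetical confident/uncertain caches, each with a fixed maximum capacity.

    `capacity` plays the role of the maximum cache size analysed in Figure 3;
    `tau` is an illustrative confidence threshold, not a value from the paper.
    """

    def __init__(self, capacity=4, tau=0.6):
        self.tau = tau
        # deque(maxlen=...) evicts the oldest entry once the cache is full (FIFO, assumed).
        self.confident = deque(maxlen=capacity)  # source of historical priors
        self.uncertain = deque(maxlen=capacity)  # source of reflective cases

    def add(self, obj):
        """Route an object to one cache according to its confidence score."""
        if obj.confidence >= self.tau:
            self.confident.append(obj)
        else:
            self.uncertain.append(obj)


# Usage: capacity 2, five objects with illustrative confidences.
cache = DualCache(capacity=2, tau=0.6)
for i, conf in enumerate([0.9, 0.2, 0.8, 0.95, 0.4]):
    cache.add(TrackedObject(i, [0.0], conf))
print([o.track_id for o in cache.confident])  # [2, 3]
print([o.track_id for o in cache.uncertain])  # [1, 4]
```

FIFO eviction keeps the most recently seen objects, which suits an online setting; other policies, such as evicting the least confident entry, are equally plausible, and this sketch does not reflect the paper's actual choice.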