
MotionTrack: Learning Motion Predictor for Multiple Object Tracking

Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao

TL;DR

This work tackles multi-object tracking under challenging nonlinear motion and similar object appearance by learning a motion predictor that relies only on object trajectories. It introduces MotionTrack, which uses a Transformer encoder to model long-range trajectory dynamics and a Dynamic MLP to fuse channel-level information, together forming a Dual-granularity Information Fusion. The predictor is trained with motion-focused data augmentations and a smooth L1 loss. MotionTrack achieves state-of-the-art results on DanceTrack and strong performance on SportsMOT, highlighting the value of trajectory-driven motion modeling for robust association. It is a simple yet effective online tracker that improves motion-based data association in complex scenes, with practical implications for surveillance, sports analytics, and autonomous systems.
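The predictor is optimized with a smooth L1 loss. As a minimal sketch, assuming the regression targets are per-coordinate box offsets such as (dx, dy, dw, dh) and a transition point `beta = 1.0` (neither is specified in this summary), the loss is quadratic for small errors and linear for large ones, which makes it less sensitive to outlier trajectories than a plain L2 loss. The piecewise form below follows the common definition (for example, PyTorch's `SmoothL1Loss`); `smooth_l1` is a hypothetical helper name.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss between two equal-length sequences of floats.

    Per element, with d = |pred - target|:
        0.5 * d**2 / beta   if d < beta
        d - 0.5 * beta      otherwise
    beta = 1.0 is an assumed default, not a value taken from the paper.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


# Hypothetical predicted offsets vs. ground-truth offsets for one box.
print(smooth_l1([0.2, -0.1, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

Here the small errors (0.2, 0.1) contribute quadratically while the large error (1.5) contributes only linearly.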

Abstract

Significant progress has been achieved in multi-object tracking (MOT) through advances in detection and re-identification (ReID) techniques. Despite these advances, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains challenging. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant use of linear motion models in MOT. To address it, we introduce MotionTrack, a novel motion-based tracker centered on a learnable motion predictor that relies solely on object trajectory information. The predictor integrates motion features at two levels of granularity to better model temporal dynamics and to predict the future motion of each object precisely. Specifically, it adopts a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple, online tracking approach. Experimental results show that MotionTrack achieves state-of-the-art performance on DanceTrack and SportsMOT, two datasets characterized by highly complex object motion.
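To make the two granularities concrete, here is an illustrative pure-Python sketch, not the authors' architecture. A trajectory is a list of T tokens (time steps), each with C channels. The token-level branch mixes information across time steps with single-head scaled dot-product self-attention (identity query/key/value projections are an assumption made to keep it small). The channel-level branch is a hypothetical "dynamic FC": a C x C channel-mixing matrix is generated from the mean-pooled input through hypothetical generator weights `gen`, normalized row-wise with softmax, then applied to every token. The residual-sum fusion in `dual_granularity` is also an assumption.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def token_attention(x):
    """Token-level branch: single-head scaled dot-product self-attention
    across the T tokens of x (a T x C list of lists). Identity Q/K/V
    projections are an assumption of this sketch."""
    c = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in x]
        w = softmax(scores)
        out.append([sum(w[t] * x[t][ch] for t in range(len(x))) for ch in range(c)])
    return out


def dynamic_fc(x, gen):
    """Channel-level branch (hypothetical dynamic FC). `gen` holds C*C rows of
    length C; they map the mean-pooled token to a C x C mixing matrix whose
    rows are softmax-normalized, so the weights depend on the input itself."""
    t, c = len(x), len(x[0])
    pooled = [sum(tok[ch] for tok in x) / t for ch in range(c)]
    flat = [sum(g * p for g, p in zip(row, pooled)) for row in gen]
    weight = [softmax(flat[i * c:(i + 1) * c]) for i in range(c)]
    return [[sum(w * v for w, v in zip(weight[i], tok)) for i in range(c)] for tok in x]


def dual_granularity(x, gen):
    """Hypothetical fusion: residual sum of token-level and channel-level outputs."""
    att = token_attention(x)
    mix = dynamic_fc(att, gen)
    return [[a + m for a, m in zip(ra, rm)] for ra, rm in zip(att, mix)]


# Three trajectory tokens with two channels each; generator weights are made up.
trajectory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(dual_granularity(trajectory, [[0.1, -0.2]] * 4))
```

Attention weights each output token by its similarity to every other time step, while the dynamic FC recombines channels within a token using weights that change with the input. That input dependence is what distinguishes it from an ordinary fixed linear layer.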

Paper Structure (18 sections, 10 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: A qualitative comparison between the proposed tracker and OC_SORT in a typical nonlinear motion scene. Samples are taken from frames 87, 126, 128, and 132 of the video Dancetrack0058. As the black-clad dancer turns around and crosses paths with the red-haired dancer, OC_SORT suffers an ID switch ($4 \rightarrow 3$), while our tracker continues tracking correctly.
  • Figure 2: An overview of the proposed method. When predicting an object's position, the proposed motion predictor $\mathcal{MP}$ considers at most $n_{past}$ historical observations of its trajectory. Given the predicted bounding boxes $\hat{\mathbf{D}}_t$, data association is solved with the Hungarian algorithm, a linear assignment solver, based solely on their spatial similarity to the current-frame detections $\mathbf{D}_t$. Blank boxes represent missing observations and dashed boxes represent predicted bounding boxes. Different colors represent different objects.
  • Figure 3: The network structure of the Dynamic MLP is shown in (a), and the dynamic FC operation is illustrated in (b).
  • Figure 4: The architecture of the proposed motion predictor.
  • Figure 5: Qualitative results of our method on SportsMOT. Bounding boxes of different colors indicate different identities. Best viewed in color and zoomed in.
  • ...and 2 more figures
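The association step described in Figure 2 can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the paper's code: spatial similarity is taken to be IoU, boxes are (x1, y1, x2, y2), and matches below a hypothetical gating threshold of 0.3 are discarded. The predicted boxes would come from the learned motion predictor, which is not reproduced here. `hungarian` is a textbook Kuhn-Munkres implementation with dual potentials; in practice `scipy.optimize.linear_sum_assignment` solves the same problem.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.
    Returns, for each row, the index of its assigned column."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 means free)
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow the shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def associate(predicted, detections, iou_threshold=0.3):
    """Match predicted track boxes to detections by maximizing total IoU.
    Returns sorted (track_index, detection_index) pairs; iou_threshold is an
    assumed gating value, not one taken from the paper."""
    if not predicted or not detections:
        return []
    transpose = len(predicted) > len(detections)  # hungarian needs rows <= cols
    rows, cols = (detections, predicted) if transpose else (predicted, detections)
    cost = [[1.0 - iou(r, c) for c in cols] for r in rows]
    matches = []
    for r, c in enumerate(hungarian(cost)):
        t, d = (c, r) if transpose else (r, c)
        if iou(predicted[t], detections[d]) >= iou_threshold:
            matches.append((t, d))
    return sorted(matches)


preds = [(0, 0, 10, 10), (20, 20, 30, 30)]  # hypothetical predicted boxes
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(preds, dets))
```

Unmatched tracks and detections (here, the detection at (100, 100)) would be handled by the tracker's track-management logic, which this sketch leaves out.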