MotionTrack: Learning Motion Predictor for Multiple Object Tracking
Changcheng Xiao, Qiong Cao, Yujie Zhong, Long Lan, Xiang Zhang, Zhigang Luo, Dacheng Tao
TL;DR
This work tackles multi-object tracking under challenging nonlinear motion and appearance similarity by learning a motion predictor that relies only on object trajectories. It introduces MotionTrack, which uses a Transformer encoder to model long-range trajectory dynamics and a Dynamic MLP to fuse channel-level information; together these form a Dual-granularity Information Fusion. The predictor is trained with motion-focused data augmentations and a smooth L1 loss. MotionTrack achieves state-of-the-art results on DanceTrack and strong performance on SportsMOT, highlighting the value of trajectory-driven motion modeling for robust association. The approach yields a simple yet effective online tracker that improves motion-based data association in complex scenes, with practical implications for surveillance, sports analytics, and autonomous systems.
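The smooth L1 loss mentioned above is quadratic for small errors and linear for large ones, so a few abrupt, hard-to-predict motions do not dominate training the way they would under a squared loss. A minimal sketch follows, assuming the predictor regresses per-frame box offsets (dx, dy, dw, dh); that target representation and the `beta` threshold of 1.0 are illustrative assumptions, not details confirmed by the summary.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Elementwise smooth L1 loss, averaged over all entries.

    Quadratic (0.5 * d^2 / beta) when |d| < beta, linear (|d| - 0.5 * beta)
    otherwise, so large residuals grow only linearly.
    """
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())

# Hypothetical example: one predicted box offset vs. a zero ground-truth offset.
pred = np.array([[0.5, -2.0, 0.0, 3.0]])
target = np.zeros((1, 4))
print(smooth_l1(pred, target))  # per-entry losses 0.125, 1.5, 0.0, 2.5 -> mean 1.03125
```

The 3.0 error contributes 2.5 here rather than the 4.5 a squared loss (0.5 * d^2) would give, which is the robustness property that motivates this choice for noisy motion targets.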
Abstract
Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advances, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant use of linear motion models in MOT. In this context, we introduce a novel motion-based tracker, MotionTrack, centered on a learnable motion predictor that relies solely on object trajectory information. This predictor integrates motion features at two levels of granularity to better model temporal dynamics and to predict the future motion of each object precisely. Specifically, the proposed approach adopts a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple online tracker. Our experimental results demonstrate that MotionTrack achieves state-of-the-art performance on datasets characterized by highly complex object motion, such as DanceTrack and SportsMOT.
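The two granularities described above can be illustrated with a toy forward pass: each past frame of a trajectory becomes a token, self-attention mixes information across tokens (time steps), and a channel-mixing layer mixes features within each token. This is a minimal sketch under several assumptions. It uses a single attention head with no layer norm or feed-forward block instead of a full Transformer encoder. Its `dynamic_channel_mlp` is a hypothetical stand-in that generates its weight matrix from the pooled trajectory features; it is not the paper's Dynamic MLP. The input format (per-frame box offsets), the residual-sum fusion, and all names and random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_attention(h, Wq, Wk, Wv):
    """Single-head self-attention across time steps (token-level mixing)."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv                # each (T, d)
    attn = softmax(q @ k.T / np.sqrt(h.shape[1]))   # (T, T), each row sums to 1
    return attn @ v, attn

def dynamic_channel_mlp(h, W_gen):
    """Hypothetical channel-level mixing with input-conditioned weights."""
    T, d = h.shape
    pooled = h.mean(axis=0)                         # (d,) summary of the trajectory
    W_dyn = np.tanh(pooled @ W_gen).reshape(d, d)   # weights generated from the input
    return h @ W_dyn                                # mixes channels within each token

def predict_next_offset(traj, params):
    """traj: (T, 4) past box offsets -> (4,) predicted next offset."""
    h = traj @ params["W_in"]                       # embed each frame: (T, d)
    h_tok, attn = token_attention(h, params["Wq"], params["Wk"], params["Wv"])
    h_ch = dynamic_channel_mlp(h, params["W_gen"])
    fused = h + h_tok + h_ch                        # assumed residual-sum fusion
    return fused[-1] @ params["W_out"], attn        # read out from the latest frame

T, d = 8, 16
params = {
    "W_in": rng.normal(size=(4, d)) * 0.1,
    "Wq": rng.normal(size=(d, d)) * 0.1,
    "Wk": rng.normal(size=(d, d)) * 0.1,
    "Wv": rng.normal(size=(d, d)) * 0.1,
    "W_gen": rng.normal(size=(d, d * d)) * 0.1,
    "W_out": rng.normal(size=(d, 4)) * 0.1,
}
traj = rng.normal(size=(T, 4))  # toy trajectory: (dx, dy, dw, dh) per frame
delta, attn = predict_next_offset(traj, params)
print(delta.shape, attn.shape)
```

In a motion-based tracker of this kind, the predicted offset would typically be added to each track's last box, and the resulting predicted boxes matched to new detections (for example by IoU). That association step is omitted here, and the weights above are untrained, so the output values themselves are meaningless.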
