MambaMOT: State-Space Model as Motion Predictor for Multi-Object Tracking
Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
TL;DR
This work addresses the inadequacy of Kalman-filter-based motion models for multi-object tracking in nonlinear, occlusion-rich scenarios by introducing MambaMOT, an online MOT approach built on the efficient state-space model Mamba. MambaMOT predicts next tracklet locations with a Mamba-based motion block and a dedicated prediction head, and extends to MambaMOT+ by extracting trajectory embeddings to enable tracklet merging with reduced computational cost. Across DanceTrack and SportsMOT, MambaMOT and especially MambaMOT+ achieve substantial gains in HOTA and IDF1, while maintaining real-time speeds (~28.8 FPS) on a single RTX 4080, demonstrating practical viability in complex motion regimes. The results establish that learning-based motion modeling with trajectory-aware merging can surpass Kalman-filter-based approaches in robustness and efficiency for MOT in dynamic environments.
Abstract
In multi-object tracking (MOT), traditional methods often rely on the Kalman filter for motion prediction, leveraging its strengths in linear motion scenarios. However, the limitations of these methods become evident when they are confronted with the complex, nonlinear motions and occlusions prevalent in dynamic environments such as sports and dance. This paper explores replacing the Kalman filter with a learning-based motion model that improves tracking accuracy and adaptability beyond the constraints of Kalman filter-based trackers. Our proposed methods, MambaMOT and MambaMOT+, demonstrate advanced performance on challenging MOT datasets such as DanceTrack and SportsMOT, showcasing their ability to handle intricate, nonlinear motion patterns and frequent occlusions more effectively than traditional methods.
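
To make the linear-motion limitation concrete, the following is a minimal sketch of the predict step of a constant-velocity motion model, the kind of linear model commonly paired with a Kalman filter in MOT. It is an illustration only, not the authors' code and not a full Kalman filter (there is no covariance or update step). The function names, the state layout `[cx, cy, vx, vy]`, and the two toy trajectories are assumptions chosen for this example.

```python
import numpy as np


def constant_velocity_predict(state, dt=1.0):
    """Predict step of a constant-velocity model: x' = F @ x.

    state is [cx, cy, vx, vy] (box center and per-frame velocity);
    this layout is an assumption made for the illustration.
    """
    F = np.array(
        [
            [1.0, 0.0, dt, 0.0],
            [0.0, 1.0, 0.0, dt],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ]
    )
    return F @ state


def prediction_error(positions):
    """Estimate velocity from the first two positions, predict the third,
    and return the Euclidean distance to the true third position."""
    p_prev, p_curr, p_next = positions
    state = np.concatenate([p_curr, p_curr - p_prev])
    predicted = constant_velocity_predict(state)[:2]
    return float(np.linalg.norm(predicted - p_next))


# Toy trajectory 1: straight line at constant speed.
line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

# Toy trajectory 2: arc of radius 10, turning 30 degrees per frame.
angles = np.deg2rad([0.0, 30.0, 60.0])
circle = 10.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

line_error = prediction_error(line)
circle_error = prediction_error(circle)
print(f"line error: {line_error:.3f}, circle error: {circle_error:.3f}")
```

On the straight line the linear extrapolation lands exactly on the next position, while on the arc it overshoots along the tangent. This is the kind of error a learned motion predictor is meant to reduce on nonlinear motion.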
