StableTrack: Stabilizing Multi-Object Tracking on Low-Frequency Detections
Matvei Shelukhan, Timur Mamedov, Karina Kvanchiani
TL;DR
StableTrack tackles multi-object tracking under low-frequency detections by decoupling detection from association and introducing a robust two-stage cross-frame matching strategy. This strategy leverages a Bbox-Based Distance (BBD) and intermediate-frame visual tracking (VT) to refine Kalman Filter (KF) predictions. The method combines Forward VT and Backward VT to predict object positions in an intermediate frame, extends the KF state, and employs a two-stage Hungarian-based association: the first stage relies on BBD and appearance cues, and the second on IoU with stricter spatial constraints. The key contributions are the BBD formulation, the two-stage matching framework, and the integration of visual tracking to stabilize predictions. Together they yield an $11.6\%$ HOTA improvement at $1$ Hz on MOT17-val, while remaining competitive on MOT17, MOT20, and DanceTrack with full-frequency detections. These results demonstrate improved resilience to temporal gaps between detections, with practical implications for real-time MOT in resource-constrained environments.
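The two-stage association described above can be sketched as a cascade of two Hungarian assignments. This is a minimal illustration under stated assumptions, not the authors' implementation: the BBD formula is not given in this section, so the stage-1 cost matrix (standing in for the combined BBD and appearance cost) is passed in precomputed. The function names `two_stage_match` and `solve` and the thresholds `stage1_max` and `iou_min` are hypothetical. Boxes are assumed to be `(x1, y1, x2, y2)`, and the Hungarian step uses SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def solve(cost, max_cost):
    """Hungarian assignment on `cost`; keep only pairs with cost <= max_cost."""
    if cost.size == 0:
        return []
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]


def two_stage_match(tracks, dets, stage1_cost, stage1_max=0.5, iou_min=0.5):
    """Hypothetical two-stage cascade: stage 1 on a precomputed cost
    (assumed to combine BBD and appearance), stage 2 on IoU for leftovers."""
    # Stage 1: all tracks vs. all detections using the supplied cost matrix.
    matches = solve(np.asarray(stage1_cost, dtype=float), stage1_max)
    used_t = {t for t, _ in matches}
    used_d = {d for _, d in matches}
    rem_t = [i for i in range(len(tracks)) if i not in used_t]
    rem_d = [j for j in range(len(dets)) if j not in used_d]

    # Stage 2: unmatched tracks vs. unmatched detections, cost = 1 - IoU.
    # Requiring IoU >= iou_min acts as the stricter spatial constraint.
    iou_cost = np.array(
        [[1.0 - iou(tracks[i], dets[j]) for j in rem_d] for i in rem_t],
        dtype=float,
    ).reshape(len(rem_t), len(rem_d))
    for r, c in solve(iou_cost, 1.0 - iou_min):
        matches.append((rem_t[r], rem_d[c]))
    return sorted(matches)
```

For example, with `tracks = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]`, `dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]`, and a stage-1 cost that is confident only about track 0 and detection 1, stage 1 matches `(0, 1)` and stage 2 recovers `(1, 0)` by overlap. Track 2 stays unmatched. Running the spatially strict IoU stage only on leftovers lets appearance-driven matches take priority.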
Abstract
Multi-object tracking (MOT) is one of the most challenging tasks in computer vision: objects must be detected correctly, and these detections must be associated across frames. Current approaches mainly focus on tracking objects in every frame of a video stream, which makes them almost impossible to run when computing resources are limited. To address this issue, we propose StableTrack, a novel approach that stabilizes tracking quality on low-frequency detections. Our method introduces a new two-stage matching strategy to improve cross-frame association between low-frequency detections. We propose a novel Bbox-Based Distance to replace the conventional Mahalanobis distance, which allows us to match objects effectively using a Re-ID model. Furthermore, we integrate visual tracking into the Kalman Filter and the overall tracking pipeline. Our method outperforms current state-of-the-art trackers on low-frequency detections, achieving an $11.6\%$ HOTA improvement at $1$ Hz on MOT17-val, while keeping pace with the best approaches on the standard MOT17, MOT20, and DanceTrack benchmarks with full-frequency detections.
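For reference, the conventional Mahalanobis distance mentioned above is, in standard Kalman-filter-based trackers, computed between a detection measurement $z$ and a track's predicted state $\hat{x}$ with covariance $P$. The computation uses the measurement matrix $H$ and measurement noise covariance $R$. This is the textbook form, not taken from this paper, and the symbols are introduced here only for illustration:

$$
d_M^2(z, \hat{x}) = (z - H\hat{x})^\top S^{-1} (z - H\hat{x}), \qquad S = H P H^\top + R.
$$

Because each prediction step without a measurement update adds process noise to $P$, long gaps between detections enlarge $S$. This shrinks $d_M$ and loosens any gate based on it.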
