SFSORT: Scene Features-based Simple Online Real-Time Tracker
M. M. Morsali, Z. Sharifi, F. Fallah, S. Hashembeiki, H. Mohammadzade, S. Bagheri Shouraki
TL;DR
SFSORT addresses the need for real-time, accurate multi-object tracking in tracking-by-detection frameworks. It replaces traditional motion models with the Bounding Box Similarity Index ($BBSI$) and introduces scene-feature-driven hyperparameter adaptation, a two-stage association, and a camera-motion- and depth-aware post-processing pipeline. The key contributions are $BBSI$, which handles associations between both overlapping and non-overlapping boxes; adaptive hyperparameters linked to frame rate and scene depth; and a depth-score-based post-processing scheme that preserves identities while maintaining speed. Empirically, SFSORT achieves state-of-the-art real-time performance on MOT17 and MOT20, with a $HOTA$ of $61.7\%$ at $2242$ Hz on MOT17 and $60.9\%$ at $304$ Hz on MOT20 on a 2.2 GHz Intel Xeon CPU, demonstrating that high accuracy can coexist with extreme speed and privacy-preserving operation.
Abstract
This paper introduces SFSORT, which experiments on the MOT Challenge datasets show to be the world's fastest multi-object tracking system. To achieve an accurate and computationally efficient tracker, this paper employs a tracking-by-detection method, following the online real-time tracking approach established in prior literature. By introducing a novel cost function called the Bounding Box Similarity Index, this work eliminates the Kalman filter and thereby reduces computational requirements. Additionally, this paper demonstrates how scene features enhance object-track association and improve track post-processing. Using a 2.2 GHz Intel Xeon CPU, the proposed method achieves a HOTA of 61.7\% with a processing speed of 2242 Hz on the MOT17 dataset and a HOTA of 60.9\% with a processing speed of 304 Hz on the MOT20 dataset. The tracker's source code, fine-tuned object detection model, and tutorials are available at \url{https://github.com/gitmehrdad/SFSORT}.
