SFSORT: Scene Features-based Simple Online Real-Time Tracker

M. M. Morsali, Z. Sharifi, F. Fallah, S. Hashembeiki, H. Mohammadzade, S. Bagheri Shouraki

TL;DR

SFSORT addresses the need for real-time, accurate multi-object tracking in tracking-by-detection frameworks. It replaces traditional motion models with the Bounding Box Similarity Index ($BBSI$) and introduces scene-feature-driven hyperparameter adaptation, a two-stage association, and a camera-motion/depth-aware post-processing pipeline. The key contributions are $BBSI$, which associates both overlapping and non-overlapping boxes; adaptive hyperparameters linked to frame rate and scene depth; and a depth-score-based post-processing scheme that preserves identities while maintaining speed. Empirically, SFSORT achieves state-of-the-art real-time performance on MOT17 and MOT20, with a $HOTA$ of $61.7\%$ at $2242$ Hz on MOT17 and $60.9\%$ at $304$ Hz on MOT20, demonstrating that high accuracy can coexist with extreme speed and privacy-preserving operation.

Abstract

This paper introduces SFSORT, the world's fastest multi-object tracking system based on experiments conducted on MOT Challenge datasets. To achieve an accurate and computationally efficient tracker, this paper employs a tracking-by-detection method, following the online real-time tracking approach established in prior literature. By introducing a novel cost function called the Bounding Box Similarity Index, this work eliminates the Kalman Filter, leading to reduced computational requirements. Additionally, this paper demonstrates the impact of scene features on enhancing object-track association and improving track post-processing. Using a 2.2 GHz Intel Xeon CPU, the proposed method achieves a HOTA of 61.7% with a processing speed of 2242 Hz on the MOT17 dataset and a HOTA of 60.9% with a processing speed of 304 Hz on the MOT20 dataset. The tracker's source code, fine-tuned object detection model, and tutorials are available at https://github.com/gitmehrdad/SFSORT.

Paper Structure

This paper contains 22 sections, 18 equations, 8 figures, 4 tables, and 1 algorithm.

Figures (8)

  • Figure 1: The Proposed Multi-object Tracking System.
  • Figure 2: Comparison of Various Similarity Descriptors in the Association Problem. (a) IoU vs. GIoU. (b) GIoU vs. DIoU. (c) DIoU vs. EIoU.
  • Figure 3: Comparison of the BBSI with Other Similarity Descriptors. (a) BBSI vs. EIoU. (b) BBSI vs. DIoU.
  • Figure 4: The Visualization of Calculation Details in the BBSI.
  • Figure 5: The Visualization of Keypoints with Small Displacement as a Key Factor in the Proposed Camera Motion Detection. (a) The scarcity of keypoints when the camera is moving. (b) The abundance of keypoints when the camera is fixed.
  • ...and 3 more figures
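
The similarity descriptors compared in Figures 2 and 3 have standard definitions, and a small example may help show why plain IoU is insufficient when a track and a detection do not overlap. The following is a minimal sketch of IoU, GIoU, and DIoU only; it does not implement BBSI, whose formula is not given in this section. The function names, the `(x1, y1, x2, y2)` box format, and the sample coordinates are assumptions made for illustration.

```python
# Illustrative sketch: standard IoU, GIoU, and DIoU for axis-aligned boxes.
# Boxes are assumed to be (x1, y1, x2, y2) tuples with x2 > x1 and y2 > y1.

def _area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def _intersection(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def _enclosing(a, b):
    # Smallest axis-aligned box that contains both a and b.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def iou(a, b):
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def giou(a, b):
    # GIoU = IoU - (enclosing area - union) / enclosing area
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    enclosing_area = _area(_enclosing(a, b))
    return iou(a, b) - (enclosing_area - union) / enclosing_area


def diou(a, b):
    # DIoU = IoU - (squared center distance) / (squared enclosing diagonal)
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    center_dist_sq = (ax - bx) ** 2 + (ay - by) ** 2
    c = _enclosing(a, b)
    diag_sq = (c[2] - c[0]) ** 2 + (c[3] - c[1]) ** 2
    return iou(a, b) - center_dist_sq / diag_sq


# Hypothetical track box and two non-overlapping candidate detections.
track = (0.0, 0.0, 10.0, 10.0)
near = (12.0, 0.0, 22.0, 10.0)
far = (40.0, 0.0, 50.0, 10.0)

for name, fn in (("IoU", iou), ("GIoU", giou), ("DIoU", diou)):
    print(f"{name}: near={fn(track, near):.3f} far={fn(track, far):.3f}")
```

In this example IoU is zero for both candidates, so it cannot rank them, whereas GIoU and DIoU both assign the nearer box a higher score. This is the kind of behavior a similarity measure needs if it is to associate non-overlapping boxes without a motion model.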