Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking

Mingzhan Yang, Guangxin Han, Bin Yan, Wenhua Zhang, Jinqing Qi, Huchuan Lu, Dong Wang

TL;DR

Hybrid-SORT tackles multi-object tracking under occlusion and clustering by augmenting traditional strong cues (spatial and appearance) with weak cues: confidence state, height state, and velocity direction. It introduces Tracklet Confidence Modeling (TCM) and Height Modulated IoU (HMIoU) and strengthens motion cues with Robust Observation-Centric Momentum (ROCM), all designed to preserve online, real-time performance. The method is plug-and-play and training-free, and it generalizes across multiple trackers and benchmarks, with notable gains on DanceTrack, MOT17, and MOT20 that are amplified when it is combined with an appearance model (Hybrid-SORT-ReID). The findings highlight the practical value of weak cues for robust association in challenging MOT scenarios, offering a scalable, efficient path to improved tracking in occluded and clustered environments.

Abstract

Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames. Most methods accomplish the task by explicitly or implicitly leveraging strong cues (i.e., spatial and appearance information), which exhibit powerful instance-level discrimination. However, when object occlusion and clustering occur, spatial and appearance information will become ambiguous simultaneously due to the high overlap among objects. In this paper, we demonstrate this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues to compensate for strong cues. Along with velocity direction, we introduce the confidence and height state as potential weak cues. With superior performance, our method still maintains Simple, Online and Real-Time (SORT) characteristics. Also, our method shows strong generalization for diverse trackers and scenarios in a plug-and-play and training-free manner. Significant and consistent improvements are observed when applying our method to 5 different representative trackers. Further, with both strong and weak cues, our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack where interaction and severe occlusion frequently happen with complex motions. The code and models are available at https://github.com/ymzis69/HybridSORT.

Paper Structure (31 sections, 10 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: The discrimination capacity of strong and weak cues. Green solid arrows represent reliable discrimination between pairwise objects, while red dashed arrows indicate unreliable discrimination. The higher the value of the arrow, the more reliable the discrimination is.
  • Figure 2: Pipeline of Hybrid-SORT and Hybrid-SORT-ReID. For strong cues, we utilize IoU as the metric for spatial information, and utilize cosine distance for appearance features. For weak cues, we incorporate the confidence state, height state, and velocity direction. Velocity direction is illustrated by centers instead of corners for better clarity.
  • Figure 3: The confidence curve of an object. Kalman Filter estimation lags behind the actual confidence during occlusion while Linear Prediction performs effectively.
  • Figure 4: Velocity direction of the center and corners. While the velocity direction of some corners maintains high similarity, the direction of the center is completely opposite.
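
The observation in the Figure 4 caption can be illustrated with a small numeric sketch. This is not the authors' ROCM implementation. The boxes, the (x1, y1, x2, y2) format, and the helper names `direction`, `cosine`, `center`, and `top_left` are all assumptions made for this example. It compares the motion direction of the box center with that of one corner for two hypothetical box changes that share the same starting box.

```python
import math

def direction(p_prev, p_curr):
    """Unit vector pointing from p_prev to p_curr (zero vector if no motion)."""
    dx, dy = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)

def cosine(u, v):
    """Cosine similarity of two unit vectors (their dot product)."""
    return u[0] * v[0] + u[1] * v[1]

def center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def top_left(box):
    """Top-left corner of a box given as (x1, y1, x2, y2)."""
    return (box[0], box[1])

# Hypothetical boxes: both motions start from the same box.
start = (0, 0, 10, 10)
widened = (1, 0, 14, 10)   # left edge moves right, box gets wider
narrowed = (1, 0, 8, 10)   # left edge moves right, box gets narrower

# Direction similarity measured at the center.
center_sim = cosine(
    direction(center(start), center(widened)),
    direction(center(start), center(narrowed)),
)

# Direction similarity measured at the top-left corner.
corner_sim = cosine(
    direction(top_left(start), top_left(widened)),
    direction(top_left(start), top_left(narrowed)),
)

print("center similarity:", center_sim)
print("top-left similarity:", corner_sim)
```

In this toy case the top-left corner moves the same way in both motions (similarity 1.0), while the center moves in opposite directions (similarity -1.0) because the box width changes. That is the kind of disagreement between center and corner directions that the caption describes.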