PD-SORT: Occlusion-Robust Multi-Object Tracking Using Pseudo-Depth Cues

Yanchao Wang, Dawei Zhang, Run Li, Zhonglong Zheng, Minglu Li

TL;DR

PD-SORT addresses occlusion-induced identity errors in 2D MOT by adding pseudo-depth to the Kalman filter state and by designing a Depth Volume IoU (DVIoU) and a Quantized Pseudo-Depth Measurement (QPDM) to guide association. Building on OC-SORT, it also uses Camera Motion Compensation (CMC) and an observation-centric recovery strategy. Experiments on DanceTrack, MOT17, and MOT20 show consistent gains over strong baselines, with the largest improvements on occlusion-heavy data such as DanceTrack, while the tracker remains online and real-time. The approach shows that pseudo-depth is a practical motion-state cue, enabling more robust tracking on consumer devices.
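
Adding pseudo-depth to the Kalman filter state means the filter predicts it with the same constant-velocity model as the box geometry. The following is a minimal sketch in plain Python, assuming an OC-SORT-style state [u, v, s, r, u̇, v̇, ṡ] extended with a pseudo-depth entry pd and its velocity. The layout and all names (`POS_TO_VEL`, `transition_matrix`, `kf_predict`) are hypothetical, and PD-SORT's actual state and noise settings may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical state layout (an assumption, not confirmed by the paper):
# [u, v, s, r, pd, du, dv, ds, dpd]
#   u, v : box centre        s  : box area
#   r    : aspect ratio (no velocity term, as in SORT/OC-SORT)
#   pd   : pseudo-depth      d* : per-frame velocities
STATE_DIM = 9
# Maps each position-like entry to the velocity entry that drives it.
POS_TO_VEL = {0: 5, 1: 6, 2: 7, 4: 8}


def transition_matrix(dt: float = 1.0) -> Matrix:
    """Constant-velocity F: identity plus dt on each position/velocity coupling."""
    F = [[1.0 if i == j else 0.0 for j in range(STATE_DIM)] for i in range(STATE_DIM)]
    for pos, vel in POS_TO_VEL.items():
        F[pos][vel] = dt
    return F


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def kf_predict(x: List[float], P: Matrix, Q: Matrix, dt: float = 1.0) -> Tuple[List[float], Matrix]:
    """Standard Kalman predict step: x' = F x, P' = F P F^T + Q."""
    F = transition_matrix(dt)
    x_pred = [sum(F[i][j] * x[j] for j in range(STATE_DIM)) for i in range(STATE_DIM)]
    FPFt = matmul(matmul(F, P), transpose(F))
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(STATE_DIM)] for i in range(STATE_DIM)]
    return x_pred, P_pred


# Example: identity covariance, zero process noise.
identity = [[1.0 if i == j else 0.0 for j in range(STATE_DIM)] for i in range(STATE_DIM)]
zeros = [[0.0] * STATE_DIM for _ in range(STATE_DIM)]
x = [100.0, 200.0, 5000.0, 0.5, 300.0, 2.0, -1.0, 10.0, 3.0]
x_pred, P_pred = kf_predict(x, identity, zeros)
print(x_pred)  # [102.0, 199.0, 5010.0, 0.5, 303.0, 2.0, -1.0, 10.0, 3.0]
```

With a unit time step, each position-like entry (including pd) advances by its velocity while the aspect ratio r stays fixed, mirroring how SORT-family trackers treat r.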

Abstract

Multi-object tracking (MOT) is an increasingly active topic in video processing and has important applications in consumer electronics. Tracking-by-detection (TBD), which performs target detection and association frame by frame, is currently the dominant MOT paradigm. However, the association performance of TBD methods degrades in complex scenes with heavy occlusion, which hinders their use in real-world scenarios. To this end, we incorporate pseudo-depth cues to improve association and propose Pseudo-Depth SORT (PD-SORT). First, we extend the Kalman filter state vector with pseudo-depth states. Second, we introduce a novel depth volume IoU (DVIoU) that combines the conventional 2D IoU with pseudo-depth. Third, we develop a quantized pseudo-depth measurement (QPDM) strategy for more robust data association. In addition, we integrate camera motion compensation (CMC) to handle dynamic camera situations. With these designs, PD-SORT significantly alleviates occlusion-induced ambiguous associations and achieves leading performance on DanceTrack, MOT17, and MOT20. The improvement is especially pronounced on DanceTrack, where objects show complex motions, similar appearances, and frequent occlusions. The code is available at https://github.com/Wangyc2000/PD_SORT.
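
The abstract describes DVIoU as the conventional 2D IoU combined with pseudo-depth, and Figure 5 describes it as a volume-based extension of area-based IoU. A minimal sketch of that idea follows, assuming each object occupies a depth slab of fixed, hypothetical `thickness` centred on its pseudo-depth. The class and function names are illustrative, and the paper's actual depth extent and normalization may differ.

```python
from dataclasses import dataclass


@dataclass
class DepthBox:
    """Axis-aligned box (x1, y1, x2, y2) in pixels plus a pseudo-depth value."""
    x1: float
    y1: float
    x2: float
    y2: float
    pd: float


def _overlap(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """Length of the intersection of 1D intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def _area(b: DepthBox) -> float:
    return max(0.0, b.x2 - b.x1) * max(0.0, b.y2 - b.y1)


def iou_2d(a: DepthBox, b: DepthBox) -> float:
    """Conventional area-based IoU, ignoring pseudo-depth."""
    inter = _overlap(a.x1, a.x2, b.x1, b.x2) * _overlap(a.y1, a.y2, b.y1, b.y2)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def dv_iou(a: DepthBox, b: DepthBox, thickness: float = 1.0) -> float:
    """Volume IoU of two cuboids: the 2D box footprint times a depth slab.

    ASSUMPTION: each object spans [pd - thickness/2, pd + thickness/2] in depth;
    `thickness` is a hypothetical parameter, not taken from the paper.
    """
    half = thickness / 2.0
    inter_area = _overlap(a.x1, a.x2, b.x1, b.x2) * _overlap(a.y1, a.y2, b.y1, b.y2)
    inter_depth = _overlap(a.pd - half, a.pd + half, b.pd - half, b.pd + half)
    inter_vol = inter_area * inter_depth
    union_vol = (_area(a) + _area(b)) * thickness - inter_vol
    return inter_vol / union_vol if union_vol > 0 else 0.0


a = DepthBox(0, 0, 10, 10, pd=5.0)
b = DepthBox(0, 0, 10, 10, pd=5.5)  # same 2D box, slightly different depth
c = DepthBox(0, 0, 10, 10, pd=8.0)  # same 2D box, clearly different depth
print(iou_2d(a, b))            # 1.0
print(round(dv_iou(a, b), 4))  # 0.3333
print(dv_iou(a, c))            # 0.0
```

Because the example boxes overlap perfectly in 2D, plain IoU cannot tell them apart, while the volume term drops sharply once their pseudo-depths diverge.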

Paper Structure (31 sections, 11 equations, 7 figures, 8 tables, 2 algorithms)

Figures (7)

  • Figure 1: Three examples of occlusion-induced tracking failures. The samples are OC-SORT results on DanceTrack, where objects have diverse motions and similar appearances.
  • Figure 2: A comparison of association without and with depth information on DanceTrack [sun2022dancetrack]. Bounding boxes and dashed arrows of different colors represent the locations and depths of different objects. We observe, both intuitively and experimentally, that depth information can compensate for association failures after occlusion and reappearance.
  • Figure 3: Pipeline of PD-SORT. The preparation stage estimates pseudo-depth for new detections and uses CMC to correct both the KF motion states and the historical observations. In the motion-cue generation stage, pseudo-depth is incorporated into the motion states and bounding-box locations of both tracklets and detections. The association stage uses these motion cues to compute pseudo-depth-guided matching similarities (DVIoU and QPDM) and, together with the velocity consistency described by OCM, performs a two-stage association between tracklets and detections.
  • Figure 4: Illustration of our pseudo-depth. The orange double-arrow line represents the real depth on the ground plane ($depth$), the dashed orange double-arrow line represents the length that corresponds to the pseudo-depth in the complementary view on the ground plane ($depth_{complement}$), and the blue double-arrow line represents the pseudo-depth obtained by projecting the real depth onto the view plane with both the real image view and the complementary view ($pd$).
  • Figure 5: Illustration of IoU and DVIoU. By integrating pseudo-depth (the extra dimension represented by the dashed line in the figure), area-based standard 2D IoU is extended to volume-based DVIoU.
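
Figure 4 defines pseudo-depth geometrically, and Figure 3 uses it in QPDM. The sketch below assumes a SparseTrack-style definition (vertical distance from the box bottom edge to the image bottom) and a hypothetical quantization into `num_levels` equal-width bins scored by the level gap. It only illustrates why quantization damps small pseudo-depth jitter; the function names and scoring rule are assumptions, not the authors' formula.

```python
def pseudo_depth(y2: float, img_h: float) -> float:
    """ASSUMPTION: pseudo-depth is the distance from the box bottom edge (y2)
    to the image bottom (SparseTrack-style); smaller means nearer the camera."""
    return max(0.0, img_h - y2)


def quantize_pd(pd: float, img_h: float, num_levels: int = 8) -> int:
    """Map pseudo-depth in [0, img_h] to one of `num_levels` equal-width bins."""
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    level = int(pd / img_h * num_levels)
    return min(max(level, 0), num_levels - 1)


def qpd_similarity(y2_a: float, y2_b: float, img_h: float, num_levels: int = 8) -> float:
    """Hypothetical QPDM-style score: 1.0 for the same depth level, decreasing
    linearly with the level gap, 0.0 for the farthest-apart levels."""
    la = quantize_pd(pseudo_depth(y2_a, img_h), img_h, num_levels)
    lb = quantize_pd(pseudo_depth(y2_b, img_h), img_h, num_levels)
    return 1.0 - abs(la - lb) / (num_levels - 1)


H = 1080.0
print(qpd_similarity(1000.0, 990.0, H))            # 1.0 (10 px of jitter stays in one level)
print(round(qpd_similarity(1000.0, 400.0, H), 4))  # 0.2857
```

Equal-width bins are the simplest choice; a box whose pseudo-depth sits near a bin boundary can still flip levels under small jitter, so a real implementation might use a softer scoring rule.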