PD-SORT: Occlusion-Robust Multi-Object Tracking Using Pseudo-Depth Cues
Yanchao Wang, Dawei Zhang, Run Li, Zhonglong Zheng, Minglu Li
TL;DR
PD-SORT addresses occlusion-induced identity errors in 2D MOT by adding pseudo-depth to the Kalman filter state and by introducing Depth Volume IoU (DVIoU) and Quantized Pseudo-Depth Measurement (QPDM) to guide association. It builds on OC-SORT, retaining its observation-centric recovery strategy, and adds Camera Motion Compensation (CMC). Results on DanceTrack, MOT17, and MOT20 show consistent gains over strong baselines, with the largest improvements on the occlusion-heavy DanceTrack sequences, while the tracker remains online and real-time. The approach shows that pseudo-depth is a practical extra cue alongside 2D motion state, enabling more robust tracking on consumer devices.
Abstract
Multi-object tracking (MOT) is a rising topic in video processing and has important application value in consumer electronics. Currently, tracking-by-detection (TBD) is the dominant paradigm for MOT; it performs target detection and association frame by frame. However, the association performance of TBD methods degrades in complex scenes with heavy occlusions, which hinders their application in real-world scenarios. To this end, we incorporate pseudo-depth cues to enhance association performance and propose Pseudo-Depth SORT (PD-SORT). First, we extend the Kalman filter state vector with pseudo-depth states. Second, we introduce a novel depth volume IoU (DVIoU) by combining the conventional 2D IoU with pseudo-depth. Third, we develop a quantized pseudo-depth measurement (QPDM) strategy for more robust data association. Finally, we integrate camera motion compensation (CMC) to handle dynamic camera situations. With these designs, PD-SORT significantly alleviates occlusion-induced ambiguous associations and achieves leading performance on DanceTrack, MOT17, and MOT20. The improvement is especially pronounced on DanceTrack, where objects show complex motions, similar appearances, and frequent occlusions. The code is available at https://github.com/Wangyc2000/PD_SORT.
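To make the idea of combining 2D IoU with pseudo-depth concrete, the following is a minimal illustrative sketch and not the paper's definition of DVIoU. It rests on two assumptions that the abstract does not state: pseudo-depth is taken as the distance from a box's bottom edge to the bottom of the image, and each box is extruded into a cuboid of a fixed, hypothetical `thickness` along the pseudo-depth axis. The function names (`pseudo_depth`, `volume_iou_sketch`) are likewise hypothetical.

```python
def pseudo_depth(box, image_height):
    """Assumed depth proxy: distance from the box bottom to the image bottom.

    box is (x1, y1, x2, y2) in pixels with y growing downward.
    """
    return image_height - box[3]


def area(box):
    """Area of an axis-aligned box; zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a, b):
    """Area of the overlap between two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou_2d(a, b):
    """Conventional 2D intersection over union."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def volume_iou_sketch(a, b, image_height, thickness=50.0):
    """Illustrative depth-aware IoU (hypothetical, not the paper's DVIoU).

    Each box is extruded into a cuboid spanning [d - t/2, d + t/2] on the
    pseudo-depth axis, and the IoU is computed over the resulting volumes.
    """
    d_a = pseudo_depth(a, image_height)
    d_b = pseudo_depth(b, image_height)
    # Two intervals of equal length t overlap by t - |d_a - d_b|, floored at 0.
    depth_overlap = max(0.0, thickness - abs(d_a - d_b))
    inter_vol = intersection_area(a, b) * depth_overlap
    union_vol = (area(a) + area(b)) * thickness - inter_vol
    return inter_vol / union_vol if union_vol > 0 else 0.0


if __name__ == "__main__":
    track = (0.0, 0.0, 10.0, 20.0)
    detection = (0.0, 10.0, 10.0, 30.0)
    print(iou_2d(track, detection))
    print(volume_iou_sketch(track, detection, image_height=100.0))
```

Under these assumptions, two boxes that overlap in the image plane but sit at different pseudo-depths score lower than their plain 2D IoU, which is the kind of separation that helps disambiguate occluded targets during association.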
