Hierarchical IoU Tracking based on Interval
Yunhao Du, Zhicheng Zhao, Fei Su
TL;DR
The paper addresses multi-object tracking by removing reliance on heavy appearance models and learning-based association, proposing HIT, a unified hierarchical IoU tracking framework that uses tracklet intervals as priors. HIT merges tracklets across multiple hierarchies using IoU-based association with Kalman motion, and introduces three consistency designs to counter inconsistencies in target size, camera movement, and hierarchical cues. The approach yields competitive results on MOT17, KITTI, DanceTrack, and VisDrone, and demonstrates versatile integration as a post-processing refinement for other trackers. Overall, HIT provides a simple yet effective baseline for offline tracking and post-processing, with potential for further gains through future learning-based enhancements.
Abstract
Multi-Object Tracking (MOT) aims to detect and associate all targets of given classes across frames. Current dominant solutions, e.g., ByteTrack and StrongSORT++, follow a hybrid pipeline, which first accomplishes most of the associations in an online manner and then refines the results using offline tricks such as interpolation and global link. While this paradigm offers flexibility in application, the disjoint design of the two stages results in suboptimal performance. In this paper, we propose the Hierarchical IoU Tracking framework, dubbed HIT, which achieves unified hierarchical tracking by utilizing tracklet intervals as priors. To keep the method concise, only IoU is used for association, and heavy appearance models, tricky auxiliary cues, and learning-based association modules are discarded. We further identify three inconsistency issues regarding target size, camera movement, and hierarchical cues, and design corresponding solutions to guarantee the reliability of associations. Despite its simplicity, our method achieves promising performance on four datasets, i.e., MOT17, KITTI, DanceTrack, and VisDrone, providing a strong baseline for future tracking method design. Moreover, we experiment on seven trackers and show that HIT can be seamlessly integrated with other solutions, whether they are motion-based, appearance-based, or learning-based. Our code will be released at https://github.com/dyhBUPT/HIT.
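
To make the idea of using tracklet intervals as priors for IoU-only association more concrete, the following is a minimal sketch and not the released HIT implementation. It assumes that two tracklets may be merged only if their frame intervals do not overlap, the temporal gap between them is small, and the boundary boxes overlap sufficiently. The names `Tracklet`, `can_merge`, `max_gap`, and `min_iou`, as well as the threshold values, are hypothetical; the Kalman motion model and the three consistency designs are omitted.

```python
from dataclasses import dataclass
from typing import Tuple

# A box is (x1, y1, x2, y2) in pixels. Illustrative convention only.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tracklet:
    """Hypothetical tracklet summary: frame interval plus boundary boxes."""
    start: int       # first frame index (inclusive)
    end: int         # last frame index (inclusive)
    first_box: Box   # box at frame `start`
    last_box: Box    # box at frame `end`


def intervals_overlap(a: Tracklet, b: Tracklet) -> bool:
    """True if the two tracklets share at least one frame."""
    return a.start <= b.end and b.start <= a.end


def can_merge(a: Tracklet, b: Tracklet,
              max_gap: int = 30, min_iou: float = 0.3) -> bool:
    """Interval prior first, then an IoU test on the boundary boxes.

    Assumption: one target cannot appear in two tracklets in the same
    frame, so tracklets with overlapping intervals are never merged.
    """
    if intervals_overlap(a, b):
        return False
    earlier, later = (a, b) if a.end < b.start else (b, a)
    if later.start - earlier.end > max_gap:
        return False
    return iou(earlier.last_box, later.first_box) >= min_iou
```

In this sketch the interval check acts as a cheap filter that rules out impossible pairs before any IoU is computed. A hierarchical variant could repeat the merge step with a progressively larger `max_gap`, although the exact schedule used by HIT is not specified in this section.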
