MapTrack: Tracking in the Map
Fei Wang, Ruohui Zhang, Chenglin Chen, Min Yang, Yun Bai
TL;DR
MapTrack revisits the classic Tracking-by-Detection framework to address occlusion and crowding by introducing a probability map, a prediction map, and a covariance adaptive Kalman filter. By trusting predictions over detections when targets go undetected or their detections are deformed, and by leveraging global constraints through the maps, it achieves robust, online MOT performance on MOT17 and MOT20. The approach delivers state-of-the-art results on key association metrics (e.g., HOTA, IDF1) while maintaining real-time operation and compatibility with lightweight detectors and ReID models. Its design offers a strong, deployable baseline that can be integrated with other data-association strategies to improve reliability in challenging scenes.
Abstract
Multi-Object Tracking (MOT) aims to maintain stable and uninterrupted trajectories for each target. Most state-of-the-art approaches first detect objects in each frame and then perform data association between new detections and existing tracks using motion models and appearance similarities. Although these approaches achieve satisfactory results, occlusion and crowds can easily lead to missing and distorted detections, which in turn cause missing and false associations. In this paper, we first revisit the classic tracker DeepSORT and significantly enhance its robustness to crowds and occlusion by placing greater trust in predictions when detections are unavailable or of low quality. Specifically, we propose a new framework comprising three lightweight, plug-and-play algorithms: the probability map, the prediction map, and the covariance adaptive Kalman filter. The probability map identifies whether undetected objects have genuinely disappeared from view (e.g., by leaving the image or entering a building) or are only temporarily undetected due to occlusion or other reasons. Trajectories of undetected targets that are still within the probability map are extended directly using state estimates. The prediction map determines whether an object is in a crowd, and when observations are severely deformed we prioritize state estimates over observations by means of the covariance adaptive Kalman filter. The proposed method, named MapTrack, achieves state-of-the-art results on popular multi-object tracking benchmarks such as MOT17 and MOT20. Despite its superior performance, our method remains simple, online, and real-time. The code will be open-sourced later.
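To illustrate the idea of prioritizing state estimates over observations, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the covariance is adapted, so the sketch assumes one plausible reading: the measurement-noise covariance R of a standard Kalman update is inflated by a factor when a track is flagged as being in a crowd. The function name `kalman_update`, the parameter `crowd_scale`, and the toy one-dimensional constant-velocity state are all hypothetical.

```python
import numpy as np

def kalman_update(x, P, z, H, R, crowd_scale=1.0):
    """Standard Kalman measurement update with R scaled by crowd_scale.

    crowd_scale is a hypothetical knob: values > 1 inflate the measurement
    noise, so the filter leans on its prediction rather than the detection.
    """
    R_eff = R * crowd_scale                    # assumed adaptation rule
    S = H @ P @ H.T + R_eff                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    y = z - H @ x                              # innovation (residual)
    x_new = x + K @ y                          # corrected state
    P_new = (np.eye(len(x)) - K @ H) @ P       # corrected covariance
    return x_new, P_new

# Toy state: [position, velocity]; only position is observed.
x = np.array([0.0, 1.0])
P = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
z = np.array([10.0])  # a detection far from the predicted position

x_trust, _ = kalman_update(x, P, z, H, R, crowd_scale=1.0)
x_crowd, _ = kalman_update(x, P, z, H, R, crowd_scale=100.0)
print(x_trust[0], x_crowd[0])
```

With the default scale the corrected position moves halfway toward the detection, whereas with the inflated scale it stays close to the prediction, which is the qualitative behavior the abstract describes for deformed observations in crowds.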
