MOT20: A benchmark for multi object tracking in crowded scenes
Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixé
TL;DR
MOT20 extends the MOTChallenge benchmark with 8 densely crowded sequences across three scenes to stress-test multi-object tracking in challenging, realistic crowds. It establishes a standardized evaluation framework using CLEAR and track-quality metrics, with public Faster R-CNN detections and a consistent annotation/data-format protocol to separate target pedestrians from distractors. The paper details annotation rules, dataset characteristics, and the evaluation pipeline, including a Hungarian-based tracker-to-target assignment and IoU distance threshold, to enable fair cross-method comparisons. By emphasizing generalization to unseen scenes and crowded scenarios, MOT20 aims to push the development of more robust, crowd-capable tracking systems with practical impact in surveillance and analytics.
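The matching step summarized above can be sketched for a single frame as follows. This is a minimal illustration, not the benchmark's evaluation code. It assumes boxes are given as (left, top, width, height) and that a pair is valid only if its IoU is at least 0.5; the summary does not state the threshold value. For readability, the optimal one-to-one assignment is found by exhaustive search, which is practical only for a handful of boxes. The Hungarian algorithm reaches the same optimum efficiently. The function names `iou`, `match_frame`, `frame_errors`, and `mota` are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_frame(gt_boxes, hyp_boxes, min_iou=0.5):
    """Return (gt_index, hyp_index) pairs of the best one-to-one assignment.

    Pairs with IoU below min_iou (an assumed threshold) are forbidden.
    Among valid assignments, prefer more matches, then higher total IoU.
    Exhaustive search stands in for the Hungarian algorithm here.
    """
    n = len(gt_boxes)
    # Each ground-truth box gets one hypothesis index or None (unmatched).
    slots = list(range(len(hyp_boxes))) + [None] * n
    best_key, best_pairs = (-1, 0.0), []
    for perm in permutations(slots, n):
        pairs, total = [], 0.0
        for g, h in enumerate(perm):
            if h is None:
                continue
            overlap = iou(gt_boxes[g], hyp_boxes[h])
            if overlap < min_iou:
                break  # forbidden pair: discard this assignment
            pairs.append((g, h))
            total += overlap
        else:
            key = (len(pairs), total)
            if key > best_key:
                best_key, best_pairs = key, pairs
    return best_pairs


def frame_errors(gt_boxes, hyp_boxes, min_iou=0.5):
    """Count (true positives, false positives, false negatives) for one frame."""
    tp = len(match_frame(gt_boxes, hyp_boxes, min_iou))
    return tp, len(hyp_boxes) - tp, len(gt_boxes) - tp


def mota(fn, fp, idsw, num_gt):
    """CLEAR MOTA from error counts summed over all frames of a sequence."""
    return 1.0 - (fn + fp + idsw) / num_gt
```

Calling `frame_errors(gt, hyp)` on the boxes of one frame yields the per-frame counts. Summing these over a sequence, together with identity switches tracked across frames (not shown here), gives the inputs to `mota`.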
Abstract
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal of establishing a standardized evaluation of multiple object tracking methods. The challenge focuses on multiple people tracking, since pedestrians are well studied in the tracking community, and precise tracking and detection have high practical relevance. Since the first release, MOT15, MOT16, and MOT17 have contributed tremendously to the community by introducing a clean dataset and a precise framework to benchmark multi-object trackers. In this paper, we present our MOT20 benchmark, consisting of 8 new sequences depicting very crowded, challenging scenes. The benchmark was first presented at the 4th BMTT MOT Challenge Workshop at the Computer Vision and Pattern Recognition Conference (CVPR) 2019, and gives the chance to evaluate state-of-the-art methods for multiple object tracking when handling extremely crowded scenarios.
